ZFS isn't just a filesystem—it's an integrated storage system that combines volume management, snapshots, replication, and data integrity verification in ways that traditional filesystems can't match. Originally from Sun Microsystems and now living in OpenZFS, it provides the kind of data protection that enterprises rely on. Here's how to bring it into your homelab.
Understanding ZFS Pool Types
ZFS organizes storage into pools. The pool type determines how data is distributed and protected:
Stripe (single disk or multiple without redundancy): Maximum capacity, no redundancy. Don't use this for anything you care about.
Mirror: Data written to two or more disks identically. A two-disk mirror tolerates one disk failure; a three-disk mirror tolerates two. Efficient for read operations (reads can be split across disks) but capacity efficiency is 50% with two disks.
RAID-Z1: Single parity, tolerates one drive failure. Effective capacity is (n-1) disks. The practical minimum is three disks. Rebuilds (resilvers) can take a long time on large drives, and until one finishes the pool cannot survive a second failure.
RAID-Z2: Double parity, tolerates two drive failures. Effective capacity is (n-2) disks, and four disks is the usual practical minimum. The additional parity cost is worth it for larger arrays where rebuild times are long: the pool can still survive another drive failure while a rebuild is in progress.
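To make the capacity trade-offs above concrete, here is a rough sketch in shell. The function name usable_disks and the six 4 TB disks are illustrative assumptions, and the estimate ignores metadata, padding, and reserved space, so real usable capacity will be somewhat lower.

```shell
# usable_disks LAYOUT DISK_COUNT
# Prints how many disks' worth of space a single vdev of that layout provides.
# Hypothetical helper for illustration; it does not talk to ZFS at all.
usable_disks() {
    layout=$1
    n=$2
    case "$layout" in
        stripe) echo "$n" ;;              # no redundancy, all space usable
        mirror) echo 1 ;;                 # every disk holds the same data
        raidz1) echo $((n - 1)) ;;        # one disk's worth of parity
        raidz2) echo $((n - 2)) ;;        # two disks' worth of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

# Example: six 4 TB disks in RAID-Z2 (assumed sizes)
disks=6
size_tb=4
echo "raidz2: $(( $(usable_disks raidz2 "$disks") * size_tb )) TB usable (approx.)"
```

With these assumed numbers the same six disks would give roughly 20 TB as RAID-Z1 and 24 TB as a stripe, which is the price paid for each level of failure tolerance.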
Creating Your First Pool
On Ubuntu or Debian, install ZFS: apt install zfsutils-linux. Identify your drives: lsblk. Create a mirror pool:
zpool create -f tank mirror /dev/sdb /dev/sdc
zpool status tank
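The pool named "tank" is now available at /tank. The -f flag forces creation even if the drives appear to be in use or contain existing data, so ZFS will not stop you from overwriting them. Double-check your drive identifiers, because writing to the wrong drive destroys data. Names like /dev/sdb can change between reboots; the stable paths under /dev/disk/by-id/ are a safer choice for long-lived pools.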
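Because a wrong device name is so costly, one cautious approach is to build the command, read it, and only then run it. The sketch below assumes a hypothetical helper called build_mirror_cmd and placeholder disk IDs (ata-EXAMPLE_DISK_A and ata-EXAMPLE_DISK_B); substitute the IDs that ls -l /dev/disk/by-id/ shows on your system.

```shell
# build_mirror_cmd POOL DISK_ID...
# Prints (does not run) a zpool create command that uses stable by-id paths.
# -f is deliberately left out so ZFS can refuse disks that look in use.
build_mirror_cmd() {
    pool=$1
    shift
    cmd="zpool create $pool mirror"
    for id in "$@"; do
        cmd="$cmd /dev/disk/by-id/$id"
    done
    echo "$cmd"
}

# Placeholder IDs: replace with real entries from /dev/disk/by-id/
build_mirror_cmd tank ata-EXAMPLE_DISK_A ata-EXAMPLE_DISK_B

# After reviewing the printed command, run it as root.
# zpool create also accepts -n, which shows the resulting layout
# without actually creating the pool.
```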
Datasets: Organization and Quotas
Within a pool, ZFS datasets are like separate filesystems with their own properties. Create datasets for organization:
zfs create tank/backups
zfs create tank/media
zfs create tank/vm-storage
Datasets support quotas and reservations: zfs set quota=500G tank/media limits the media dataset, including its snapshots and child datasets, to 500G. Reservations guarantee space: zfs set reservation=100G tank/vm-storage ensures 100G is always available for VMs regardless of what else fills the pool.
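A quota is only useful if you notice when a dataset approaches it. As a minimal sketch, the hypothetical helper percent_of_quota below turns two byte counts into a percentage; the sample numbers are made up (250 GiB used against a 500 GiB quota), and the commented commands show where the values would come from on a live system.

```shell
# percent_of_quota USED_BYTES QUOTA_BYTES
# Prints the integer percentage of the quota already consumed.
# A quota of 0 is treated as "no quota set".
percent_of_quota() {
    used=$1
    quota=$2
    if [ "$quota" -eq 0 ]; then
        echo "no quota set" >&2
        return 1
    fi
    echo $(( used * 100 / quota ))
}

# On a live system, -p makes zfs print exact byte values:
#   used=$(zfs get -Hp -o value used tank/media)
#   quota=$(zfs get -Hp -o value quota tank/media)
#   percent_of_quota "$used" "$quota"

# Made-up sample: 250 GiB used of a 500 GiB quota
percent_of_quota 268435456000 536870912000
```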
Snapshots: Time Machine for Your Server
ZFS snapshots are the killer feature. A snapshot captures a dataset's state at a moment in time and consumes no additional space initially; as data changes afterward, the snapshot keeps the old blocks, so it grows only by what has been modified or deleted. Creating a snapshot:
zfs snapshot tank/media@2026-04-13
zfs list -t snapshot
Snapshots are read-only. You can roll back to a snapshot to undo changes, or clone a snapshot to create a writable copy. Blocks that only a snapshot still references are freed when that snapshot is destroyed.
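For a single lost file, a full rollback is usually more than you need: each mounted dataset exposes its snapshots under a hidden .zfs/snapshot directory. The sketch below assumes the tank/media dataset is mounted at /tank/media and uses a made-up file path (photos/trip.jpg); snapshot_file_path is a hypothetical helper that only builds the path.

```shell
# snapshot_file_path MOUNTPOINT SNAPNAME RELATIVE_PATH
# Prints where a file from a given snapshot can be read.
# SNAPNAME is the part after the @ in the snapshot's full name.
snapshot_file_path() {
    printf '%s/.zfs/snapshot/%s/%s\n' "$1" "$2" "$3"
}

# Made-up example file; prints the copy command rather than running it
src=$(snapshot_file_path /tank/media 2026-04-13 photos/trip.jpg)
echo "cp -a $src /tank/media/photos/trip.jpg"

# Whole-dataset operations (run as root):
#   zfs rollback tank/media@2026-04-13
#       discards everything written after the snapshot; by default this
#       only works for the most recent snapshot
#   zfs clone tank/media@2026-04-13 tank/media-restore
#       creates a writable copy that depends on the snapshot
#   zfs destroy tank/media@2026-04-13
#       deletes the snapshot and releases blocks only it referenced
```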
Automated snapshots via sanoid or zfs-auto-snapshot provide configurable retention: hourly snapshots kept for 48 hours, daily for a month, weekly for a year. Recovery from accidental deletion or corruption takes minutes.
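Tools like sanoid handle retention for you, but the core idea is simple enough to sketch. The following is not how sanoid or zfs-auto-snapshot are implemented; it is a minimal illustration with a hypothetical function, snapshots_to_prune, that keeps the newest N snapshots from a list and prints the rest as deletion candidates. The daily- snapshot names are invented for the example.

```shell
# snapshots_to_prune KEEP
# Reads snapshot names on stdin, oldest first, and prints every name
# except the newest KEEP. Prints nothing if there are KEEP or fewer.
snapshots_to_prune() {
    keep=$1
    awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Invented sample list, oldest first; keeping 2 prints the two oldest names
printf '%s\n' \
    tank/media@daily-2026-04-10 \
    tank/media@daily-2026-04-11 \
    tank/media@daily-2026-04-12 \
    tank/media@daily-2026-04-13 | snapshots_to_prune 2

# On a live system the list could come from ZFS, sorted by creation time:
#   zfs list -H -t snapshot -o name -s creation -d 1 tank/media
# Review the output before passing any name to zfs destroy.
```

Printing candidates instead of destroying them keeps the sketch safe to experiment with; a real retention tool also has to handle separate hourly, daily, and weekly tiers.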
Data Integrity: Checksums and Scrubs
Every block in ZFS has a checksum that is verified on read. When corruption occurs (bit rot, controller errors, cosmic rays), ZFS detects it and, in redundant configurations, repairs the block automatically from a mirror copy or RAID-Z parity. The zpool scrub tank command reads through all data and verifies checksums, fixing any errors it can repair.
Running a scrub monthly helps catch silent corruption while the pool still has good redundancy to repair it from. A scrub reads all allocated data on every disk, so on large pools it can take hours or longer; schedule it for a low-use period and expect elevated disk activity while it runs. Check progress and results with zpool status tank.
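A scheduled scrub is most useful when something also looks at the result. As a sketch, the hypothetical helper pool_is_healthy below interprets the short summary that zpool status -x prints; the commented lines show how it might be wired into a monthly cron job or systemd timer, which is an assumption about your setup rather than a requirement.

```shell
# pool_is_healthy STATUS_TEXT
# Returns 0 if the text looks like the healthy summary from
# `zpool status -x`, and 1 for anything else.
pool_is_healthy() {
    case "$1" in
        *"is healthy"*|*"all pools are healthy"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Live usage (as root), for example from a monthly scheduled job:
#   zpool scrub tank                 # starts the scrub and returns immediately
#   status=$(zpool status -x tank)   # run this check after the scrub finishes
#   pool_is_healthy "$status" || echo "tank needs attention: $status"

# Offline demonstration with the healthy summary line
if pool_is_healthy "pool 'tank' is healthy"; then
    echo "ok"
fi
```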
Compression: Free Performance
ZFS transparent compression typically improves both storage efficiency and performance. Text, documents, and many databases often compress 2-4x, while already-compressed media barely shrinks. Compression reduces I/O by storing fewer bytes, and on most hardware decompression costs less time than reading the extra bytes from disk would.
zfs set compression=lz4 tank
zfs get compression,compressratio tank
LZ4 offers a good compression ratio with minimal CPU overhead, which is why it can improve apparent performance rather than cost it. Turn it on; the only downside is slightly reduced maximum throughput for data that doesn't compress (already-compressed video, encrypted data). The setting applies only to data written after it is enabled; existing blocks stay as they were.
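The compressratio value is easier to reason about as space saved. As a small sketch, the hypothetical helper space_saved_percent converts a ratio such as 2.00x into a percentage using saved = 1 - 1/ratio; the sample ratio is made up, and the commented command shows where a real value would come from.

```shell
# space_saved_percent RATIO
# Converts a compressratio like "2.00x" (or "2.00") into the percentage
# of space saved, rounded to a whole number.
space_saved_percent() {
    ratio=${1%x}    # drop the trailing "x" that zfs prints
    awk -v r="$ratio" 'BEGIN { printf "%.0f\n", (1 - 1 / r) * 100 }'
}

# On a live system:
#   space_saved_percent "$(zfs get -H -o value compressratio tank)"

# Made-up sample: a ratio of 2.00x means half the space is saved
space_saved_percent 2.00x
```

A ratio close to 1.00x on a dataset full of video or encrypted files is expected and does not mean compression is misconfigured.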