xel.sh | ZFS

ZFS Basics & Terminology

Zpools

A ZFS pool (zpool for short) is the top-level virtual device (essentially a partition) that usually spans the entire space of a collection of drives. Throughout this post, I will assume you are giving your zpools the entire space of each drive. A zpool often combines multiple drives of the same size into a single logical partition on your system.

Datasets & Properties

Datasets are partitions that appear as normal folders inside of a zpool. Most zfs operations can be performed on datasets rather than pools, allowing for more selective customization and easier management of data. You can have datasets inside of datasets.

Datasets and pools have properties, which are options that control behavior such as encryption, file attributes, compression, etc. Many of the most powerful properties are applied on datasets rather than pools, which gives even more reason to make many specific datasets for unique sets of data.

Datasets have a system of inherited properties, meaning that any dataset created underneath another dataset will inherit all of the properties of its parent dataset.

You can get your current dataset's properties (including defaults) with this command:

sudo zfs get all pool/dataset

replacing pool and dataset with the name of your pool and the name of your dataset.
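Inheritance becomes easier to see when you ask zfs where each value comes from. Below is a minimal sketch; show_property_sources is a hypothetical helper name, and pool/dataset is the same placeholder as above. It relies on the source column of zfs get, which reports whether a value is local, default, or inherited from a parent dataset.

```shell
# Show where each listed property value comes from, for a dataset and
# everything underneath it (-r recurses into child datasets).
# $1: dataset name (placeholder, e.g. pool/dataset)
# $2: comma-separated property names (e.g. compression,encryption)
show_property_sources() {
    sudo zfs get -r -o name,property,value,source "$2" "$1"
}

# Usage (needs a real pool):
#   show_property_sources pool/dataset compression,encryption
```

A child dataset that has not overridden a property should show "inherited from" followed by the parent's name in the source column.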

More information on datasets and their properties can be found here

Layouts

Zpools can be laid out in a variety of different ways. If you are setting up ZFS on a single disk, then you can only choose a striped configuration.

Most of the layouts revolve around the core concept of data redundancy, which is not exclusive to ZFS. Essentially, you sacrifice a portion of the total space in your collection of drives to allow full data recovery in the event of a hard drive failure. There are three core layouts for zfs pools.

Striped

If you are only using a single disk in your zpool, this is your only option. It provides no redundancy but does not sacrifice any of your total drive space. A zpool with multiple drives in a striped layout will split writes across the drives, which can improve performance but is incredibly risky. Each drive you add to a striped zpool increases the chances of complete data loss, since a single drive failure will take all of the data in the pool with it.

Mirrored

This is the simplest redundant layout. Simply put, every time you write to the pool you simultaneously write all of that data onto all the disks in the mirror, creating a perfect copy on each one. A mirror of n disks survives up to n-1 drive failures but only provides the usable space of a single disk. With the common two-disk mirror, that means sacrificing half of the total drive space to survive one drive failure.

Raidz (raidz1)

This is similar to RAID-5. It provides redundancy for 1 drive while sacrificing one drive's worth of space. For example, if you have three 10TB drives in a raidz zpool, you will have 20TB of usable space and can recover from a single drive failure. The concept behind raidz and RAID-5 is essentially striping data across the disks while also storing parity information, which is spread across all of the disks rather than kept on a single one.
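The space trade-offs of the layouts above come down to simple arithmetic. Here is a rough sketch, assuming a pool made of a single group of equally sized disks; usable_space is a hypothetical helper, and it ignores metadata and padding overhead, so a real pool will report somewhat less. raidz2 and raidz3 are the two- and three-parity variants of raidz.

```shell
# Estimate usable space for a pool built from equally sized disks.
# $1: layout (striped|mirror|raidz1|raidz2|raidz3)
# $2: number of disks
# $3: size of each disk (any unit; the result is in the same unit)
usable_space() {
    layout=$1; disks=$2; size=$3
    case $layout in
        striped) echo $(( disks * size )) ;;        # no redundancy, all space usable
        mirror)  echo "$size" ;;                    # every disk holds a full copy
        raidz1)  echo $(( (disks - 1) * size )) ;;  # one disk's worth of parity
        raidz2)  echo $(( (disks - 2) * size )) ;;  # two disks' worth of parity
        raidz3)  echo $(( (disks - 3) * size )) ;;  # three disks' worth of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

usable_space raidz1 3 10   # three 10TB disks in raidz1: prints 20
usable_space mirror 2 10   # two 10TB disks mirrored: prints 10
```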

Creating a mirrored zpool with encrypted datasets

Now that you know the basics of ZFS we can begin creating our first pool.

Most (if not all) zfs commands require root permissions, so either use sudo or a root elevated shell. I will include sudo in the commands in this guide since the majority of readers will probably be using it instead of a root shell.

Creating the pool

To create the pool you first have to get the ids of the disks you are planning to use.

sudo fdisk -l

Grab the names of the disks, such as /dev/sda, /dev/sdf, etc., then run:

ls -lah /dev/disk/by-id/

Get the id for each disk you want. The id is the value before the "->", for example:

wwn-0x5002538e00117bb6 -> ../../sdc
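If you would rather not scan the listing by eye, the lookup can be scripted. This is only a sketch: disk_ids is a hypothetical helper name, and it assumes a readlink that supports -f (as GNU coreutils does). It prints every id that points at a given device name.

```shell
# Print every by-id name that resolves to the given device node.
# $1: device node, e.g. /dev/sdc
# $2: directory holding the id symlinks (defaults to /dev/disk/by-id)
disk_ids() {
    dev=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue      # skip anything that is not a symlink
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            basename "$link"
        fi
    done
}

# Usage:
#   disk_ids /dev/sdc
```

A disk often has more than one id (for example a wwn- name and an ata- name); any of them refers to the same disk.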

Once you have collected all of the ids of the disks you want to use, put them in a command like so, replacing "disk1", "disk2", etc. with the ids of your disks:

sudo zpool create -f -o ashift=12 -m /mnt/dir poolname mirror disk1 disk2

-f will force the use of the given disks (vdevs), even if they appear to be in use, so double-check the ids before running the command

-o ashift=12 will force 4096 byte (2^12) sectors (this prevents an issue where adding or replacing drives in the pool later may not work because the new drive's sector size doesn't match the pool's)

-m will set the mountpoint for the pool (the path where the directory representing your pool will live), followed by the name of the pool you want to use.
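Because -f skips the safety checks, it can help to assemble the command and read it over before running anything. The sketch below only prints the command and never touches a disk; mirror_create_cmd is a hypothetical helper, and prefixing each id with /dev/disk/by-id/ is my assumption about how you want to refer to the disks (a full path is always accepted).

```shell
# Assemble, but do not run, a mirrored zpool create command.
# $1: pool name, $2: mountpoint, remaining arguments: disk ids
mirror_create_cmd() {
    pool=$1; mnt=$2; shift 2
    if [ "$#" -lt 2 ]; then
        echo "a mirror needs at least two disks" >&2
        return 1
    fi
    cmd="zpool create -f -o ashift=12 -m $mnt $pool mirror"
    for id in "$@"; do
        cmd="$cmd /dev/disk/by-id/$id"
    done
    printf 'sudo %s\n' "$cmd"
}

# Usage: prints the command so it can be reviewed before being pasted into a shell.
#   mirror_create_cmd poolname /mnt/dir disk1 disk2
```

zpool create also accepts -n, which displays the configuration it would use without actually creating the pool.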

Creating your encrypted datasets

It's easiest to have one top-level dataset where you set the core properties you want all of your sub-datasets to inherit. In this guide those will be encryption, compression, and relatime.

First we have to generate a "wrapping key", which is used to decrypt the actual encryption keys of the dataset. We can easily accomplish this with dd. First make the directory /etc/keys, then run:

sudo dd if=/dev/random of=/etc/keys/zfs.key bs=512 count=1
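Anyone who can read the key file can unlock the dataset, so it is worth locking down its permissions when you create it. Below is a sketch of the same steps with the permissions tightened; make_zfs_key is a hypothetical helper, the directory is passed as an argument so you can try it somewhere harmless first, and the 700/600 modes are my suggestion rather than something ZFS requires.

```shell
# Create a key directory and a random key file readable only by its owner.
# $1: key directory (the guide uses /etc/keys; run as root for system paths)
make_zfs_key() {
    keydir=$1
    mkdir -p "$keydir" || return 1
    chmod 700 "$keydir"
    # The subshell keeps the tightened umask from leaking into the caller,
    # and the umask makes dd create the file with mode 600.
    (
        umask 077
        dd if=/dev/random of="$keydir/zfs.key" bs=512 count=1 2>/dev/null
    )
}

# Usage (as root):
#   make_zfs_key /etc/keys
```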

Then run the command to create the dataset using our newly created key. Replace "pool" and "tank" with your pool name and your desired dataset name respectively.

Encryption will encrypt the dataset. Relatime will update the access time attribute of files less often, which saves on writes. Compression will use zstd lossless compression, keeping the compressed form only for data that zfs determines is worth compressing, which is typically highly compressible data like text files.

sudo zfs create -o encryption=on -o relatime=on -o compression=zstd \
            -o keylocation=file:///etc/keys/zfs.key -o keyformat=passphrase pool/tank

From here on out you can create as many sub-datasets as you want, and they will all inherit the encryption, relatime, and compression properties. We can create a sub-dataset named "media" like so:

sudo zfs create pool/tank/media

These datasets will only be accessible by root at first, so you will probably want to make your user own them. Go inside the directory of your pool's mountpoint and run the command below, replacing "username" with the name of the user you want to use the zpool.

sudo chown -R username tank

Testing your keys & mounting

To ensure you properly set up your keys, you can use the -n flag for a dry run.

sudo zfs load-key -n -L file:///etc/keys/zfs.key pool/tank

If the key was successfully verified you can try and reboot and attempt to decrypt it. Upon boot the encrypted datasets will most likely not automatically mount, so you should run these two commands to decrypt and mount your datasets.

sudo zfs load-key -L file:///etc/keys/zfs.key pool/tank && sudo zfs mount -a

The -a flag on mount will mount all zfs datasets. You can name specific ones instead if you wish.

You can either manually type this on each startup, or create a script to do it for you.
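A startup script for this can be quite small. The sketch below is one way to do it, not the only one: unlock_and_mount is a hypothetical function name, it assumes it is run as root (so there is no sudo), and it checks the dataset's keystatus property first so that running it twice does not fail on an already loaded key.

```shell
# Load the key for an encrypted dataset if needed, then mount everything.
# $1: encrypted dataset (e.g. pool/tank), $2: absolute path to the key file
unlock_and_mount() {
    dataset=$1; keyfile=$2
    # keystatus reads "available" once the key has been loaded.
    keystatus=$(zfs get -H -o value keystatus "$dataset") || return 1
    if [ "$keystatus" != "available" ]; then
        zfs load-key -L "file://$keyfile" "$dataset" || return 1
    fi
    zfs mount -a
}

# Usage (as root, e.g. from a boot-time script):
#   unlock_and_mount pool/tank /etc/keys/zfs.key
```

How you hook this into startup (an init script, a systemd unit, cron's @reboot) depends on your system and is left open here.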

Creating other types of pools

Since you now know the general process of creating pools and datasets, using different layout types should be fairly straightforward. The only thing that needs to change is the zpool create command from earlier, which creates the initial pool.

Striped is the default, so you don't have to add any specifier and can just list the disks:

sudo zpool create ... pool disk1 disk2

Raidz will have options "raidz", "raidz2", and "raidz3" for raidz1, 2, and 3 respectively.

sudo zpool create ... pool raidz disk1 disk2 disk3

Other Resources

In case you couldn't find what you needed in this guide, or you just want to read more about ZFS, here are two amazing resources:

FreeBSD handbook

Oracle ZFS administration guide