Following on from: Simple ZFS mirror on Qubes 4.3 - #2 by Euwiiwueir
I went ahead and organized my notes for posterity. This isn’t a replacement for @Rudd-O’s guide, which is super helpful, but rather an annotation/overlay. If you’re using these notes to make system changes, you should consult his guide first.
Here’s what I have differently than the scenario in the guide:
- BIOS boot (coreboot+Heads) instead of EFI
- NVMe disks instead of SATA (so: /dev/nvme0n1 paths instead of /dev/sda, otherwise no different)
- Goal of a pool consisting of a mirror vdev rather than a pool of two single-device vdevs. Only matters at the end. The process is also the same for raidz1 etc.
- Qubes 4.3 (ZFS software available through qubes-dom0-update) instead of early 4.2 (manual build). Simplifies installation quite a lot.
Fair warning: the process, especially the ZFS migration, is not trivial. If you try it yourself, issues will likely pop up that you will have to troubleshoot.
Warning 2: probably best to do this on a fresh install, before restoring a backup. The qvm-migrate script is fast enough when there are few installed VMs but quite slow if there are many.
Part 0:
I installed Qubes 4.3 onto one of my two SSDs. Normal process.
For clarity I will call the installation SSD “Foo” and the other drive “Bar”. I can’t refer to these as /dev/nvme0n1 and /dev/nvme1n1 because the dev assignments can switch across boots.
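To tell Foo from Bar at any given moment, matching on model/serial is more reliable than the device names; something like:
lsblk -o NAME,MODEL,SERIAL,SIZE
ls -l /dev/disk/by-id/ | grep nvme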
Part 1: How to install ZFS on Qubes OS — Rudd-O.com
You can read through the link for past context but in 4.3 you only need to run the commands below.
sudo qubes-dom0-update zfs # Note: also installs zfs-dkms as a dependency
# Warning: installs a LOT of packages to dom0.
# For /boot, and later maybe also dom0 root:
sudo qubes-dom0-update rsync
# If you do plan to migrate dom0 to ZFS, then:
sudo qubes-dom0-update zfs-dracut
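Optional sanity check that the kernel module built and loads (zfs version prints both the userland and kernel module versions):
sudo modprobe zfs
sudo zfs version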
Part 2: How to store your Qubes OS qubes in a ZFS pool — Rudd-O.com
I used blivet-gui to partition Bar so that there are /dev/nvme1n1p1 and /dev/nvme1n1p2, matching in size and format the partitions the Qubes installer created on Foo: /dev/nvme0n1p1 (1 MB BIOS boot) and /dev/nvme0n1p2 (the /boot mountpoint). The rest of Bar became one large regular partition (not LVM2, unlike Foo), with Encrypt checked to LUKS-ify it. Use the same encryption password you did during Part 0.
(I skip the cryptsetup luksFormat and cryptsetup luksOpen commands from the guide, as blivet-gui already did them for me.)
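For reference, if you’d rather do this by hand than via blivet-gui, the LUKS part is roughly the following (a sketch; the partition number matches the layout above, and luks-bar is just a placeholder mapper name):
sudo cryptsetup luksFormat /dev/nvme1n1p3
sudo cryptsetup luksOpen /dev/nvme1n1p3 luks-bar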
Run the following commands as root. (Ditto for the rest of the post, I think)
Synchronize Bar’s boot to Foo’s boot:
mkdir -p /tmp/x
mount /dev/nvme1n1p2 /tmp/x
rsync -vaxHAXSP /boot/ /tmp/x
umount /tmp/x
- Note: skipping the /boot/efi/ stuff from the guide because it’s not required for a BIOS boot system
Set Bar’s new, empty encrypted partition to be decrypted at boot:
dev=`blkid -s UUID -o value /dev/nvme1n1p3`
echo luks-$dev UUID=$dev none discard >> /etc/crypttab
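Quick check that the entry and the mapping look right (blivet-gui should have left the LUKS container open already):
tail -n 1 /etc/crypttab
ls /dev/mapper/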
Create the ZFS pool on Bar’s new, empty encrypted partition:
zpool create -O atime=off -O relatime=off laptop /dev/disk/by-id/dm-uuid-*-luks-`blkid -s UUID -o value /dev/nvme1n1p3`
- I am using a pool name of “laptop”, as in Rudd-O’s guide. In other documentation online you’ll often see the pool be named “tank”.
- I’m disabling atime and relatime as a minor optimization. Not required.
zfs set mountpoint=none laptop
zpool status
See that everything is fine so far: the pool exists.
Create the new Qubes pool, backed by the ZFS pool:
qvm-pool add -o container=laptop/vm-pool vm-pool-zfs zfs
- This creates a toplevel ZFS dataset named vm-pool on the ZFS pool laptop. Verify with zfs list.
- Note: the Qubes pool is named vm-pool-zfs while the backing ZFS dataset is named vm-pool. This makes sense to me, but they can be named as you like.
Check your work:
qvm-pool list
qvm-pool info vm-pool-zfs
Set this pool to be the default pool for new qubes:
qubes-prefs default_pool vm-pool-zfs
# Note: also sets:
# default_pool_private D vm-pool-zfs
# default_pool_root D vm-pool-zfs
# default_pool_volatile D vm-pool-zfs
Use Rudd-O’s qvm-migrate script to migrate all installed VMs (the underlying volumes) from the old default vm-pool to vm-pool-zfs. The script still works on 4.3 and it helps a lot, but this is still error-prone, manual work. Lessons:
- Before you start, shut down everything except sys-usb (and dom0). If you don’t need sys-usb for keyboard input, then shut it down too.
- Everything that touches sys-usb is dicey (if you need it for input); it needs to be shut down while you migrate 1) its template, 2) its default_dispvm, and 3) sys-usb itself.
- When you’re done, check qvm-pool list to make sure all volumes were actually migrated. For me, sys-usb’s volatile volume did not transfer to the new pool on my first attempt; not sure why, but probably user error.
I did something like this when migrating sys-usb:
# qvm-shutdown --wait --force sys-usb ; ./m.bash sys-usb ; qvm-start sys-usb ; qvm-start sys-usb-zfs
- (The qvm-start sys-usb-zfs at the end is just in case something goes wrong halfway)
Here was my m.bash:
#!/bin/bash
set -e
set -x
vm="$1"
# Ensure the qube exists and is not running
qvm-check -q "$vm"
qvm-check -q --running "$vm" && exit 1
# Migrate into the new (default) pool under a temporary name,
# then migrate again to restore the original name
~/qvm-migrate.bash $vm ${vm}-zfs
~/qvm-migrate.bash ${vm}-zfs $vm
# Clean up the intermediate dataset, if one was left behind
(sudo zfs destroy -r laptop/vm-pool/${vm}-zfs || true)
# Also disable in-VM update checks for this qube
qvm-service --disable $vm qubes-update-check
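With m.bash in place you can loop over the remaining qubes; a sketch (everything should be shut down first, and mind the sys-usb caveats above):
for vm in $(qvm-ls --raw-list | grep -vE '^(dom0|sys-usb)$'); do
  ./m.bash "$vm"
done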
- Probably best not to include the qvm-run -a -p --nogui $1 'sudo fstrim -v / ; sudo fstrim -v /rw' logic from the Optional section of the guide, as starting a qube can cause a chain of networking VMs to start up, and then you’re playing whack-a-mole.
Start a few qubes and/or reboot now just to see if the system is currently sound, which it should be.
Then, delete the old Qubes pool:
qvm-pool remove vm-pool
As things stand:
- Foo: used for booting and for dom0
- Bar: used to host the ZFS pool backing the Qubes pool for all VMs except dom0
Part 3: How to pivot your Qubes OS system entirely to a root-on-ZFS setup — Rudd-O.com
Install zfs-dracut if you didn’t install it before:
sudo qubes-dom0-update zfs-dracut
Probably should install Rudd-O’s grub-zfs-fixer package too. There are many versions of the grub-zfs-fixer package listed on Rudd-O’s index: Index of /q4.1/packages/ . I downloaded the one with the most recent version, grub-zfs-fixer-0.0.7-23.fc32.noarch.rpm, and the one with the most recent timestamp, grub-zfs-fixer-0.0.7-19.fc32.noarch.rpm. The extracted RPMs are functionally identical. This package adds small innocuous patches to three files:
- /usr/sbin/grub2-mkconfig
- /etc/grub.d/10_linux
- /etc/grub.d/20_linux_xen
Without the patch to grub2-mkconfig, I got an error later on that I will point out.
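To get the RPM into dom0 I’d use the usual pass-io trick (a sketch; untrusted is a placeholder for whichever qube you downloaded it into):
qvm-run --pass-io untrusted 'cat /home/user/Downloads/grub-zfs-fixer-0.0.7-23.fc32.noarch.rpm' > grub-zfs-fixer.rpm
sudo rpm -ivh grub-zfs-fixer.rpm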
Create a toplevel dataset for dom0 root in the ZFS pool on Bar:
zfs create -o mountpoint=/laptop -p laptop/ROOT/os
- The name “ROOT” seems to be the convention for the parent dataset of OS root filesystems
- “/os” could instead be “/qubes-os”, or whatever you like, as long as you’re consistent
Create swap:
zfs create -o primarycache=metadata -o compression=zle \
-o logbias=throughput -o sync=always \
-o secondarycache=none \
-o com.sun:auto-snapshot=false -V 8G laptop/dom0-swap
mkswap /dev/zvol/laptop/dom0-swap
- Warning: swap on ZFS is known to be risky under memory contention. Caveat emptor.
Sync the live root filesystem over to the new dataset (!):
rsync -vaxHAXSP / /laptop/ ; rmdir /laptop/laptop
- (The rmdir cleans up the empty copy of the /laptop mountpoint that rsync created inside the destination)
Make fstab edits for the new /boot, /, and swap on Bar, as in the guide:
vim /laptop/etc/fstab
- Note: I skip the /boot/efi/ stuff because it’s not required for a BIOS boot system
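For reference, my entries ended up looking roughly like this (a sketch; UUIDs are placeholders, ext4 for /boot is an assumption based on what the installer created, and the guide is authoritative on the exact form):
# dom0 root on the new ZFS dataset:
laptop/ROOT/os /boot-sibling-root-placeholder-see-below zfs defaults 0 0
laptop/ROOT/os / zfs defaults 0 0
# /boot on Bar's boot partition:
UUID=<uuid-of-Bar's-boot-partition> /boot ext4 defaults 1 2
# swap on the zvol created earlier:
/dev/zvol/laptop/dom0-swap none swap defaults 0 0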
Mounting and GRUB stuff:
mount --bind /dev /laptop/dev
mount --bind /proc /laptop/proc
mount --bind /sys /laptop/sys
mount --bind /run /laptop/run
mount /dev/nvme0n1p2 /laptop/boot
chroot /laptop grub2-mkconfig -o /boot/grub2/grub.cfg
- Note: skipping the EFI stuff
Here I got an error:
/usr/sbin/grub2-probe: error: cannot find a device for / (is /dev mounted?).
This is what grub-zfs-fixer rectifies. If you get this error, install the package and rerun chroot /laptop grub2-mkconfig -o /boot/grub2/grub.cfg.
Tweak grub.cfg per the guide:
vim /laptop/boot/grub2/grub.cfg
- Note: the guide says “Ensure the root option says root=laptop/ROOT/os”. However, for me it showed root=ZFS=laptop/ROOT/os. I kept the ZFS= and that worked, so maybe there is a typo in the guide here, or maybe it works either way.
Remove or comment out the old luks-* entry in crypttab that was created during installation (Part 0).
vim /laptop/etc/crypttab
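The result should look something like this, with only Bar’s entry from Part 2 left active (UUIDs are placeholders):
# luks-<old-Foo-uuid> UUID=<old-Foo-uuid> none discard
luks-<Bar-uuid> UUID=<Bar-uuid> none discard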
Recreate the RAM disk:
chroot /laptop dracut -fv --regenerate-all
For me there was error output here:
...
dracut[I]: *** Including module: zfs ***
dracut-install: Failed to find module 'zfs'
dracut[E]: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.SPCl6s/initramfs --kerneldir /lib/modules/6.12.59-1.qubes.fc41.x86_64/ -m zfs
...
dracut[I]: *** Including module: zfs ***
dracut-install: Failed to find module 'zfs'
dracut[E]: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.CTIOsr/initramfs --kerneldir /lib/modules/6.12.63-1.qubes.fc41.x86_64/ -m zfs
...
However, it did find the zfs module for my running kernel and produced /boot/6.17.9-1.qubes.fc41.x86_64.img.
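You can see which kernels DKMS actually built the module for, and where the running kernel’s module lives:
dkms status
modinfo -n zfs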
I skipped the efibootmgr stuff, because BIOS boot.
Now, reboot. Hopefully the computer will choose to boot from Bar instead of Foo. You will know if it chose the ‘wrong’ disk because after logging in you will have no qubes except dom0. If so, reboot again and maybe choose the boot drive manually.
Follow the export/import commands in ‘Unlock the system and set the root file system mount point’. Then double check the right things are mounted:
mount | grep zfs
mount | grep /boot
Create the GRUB config once again:
grub2-mkconfig -o /boot/grub2/grub.cfg
And reboot, and hopefully everything happens normally.
For me it was normal. At this point I used blivet-gui to delete the /boot partition on Foo, to avoid confusion later, and created a new empty/filler partition of the same size/extent in its place.
The next and nearly last step is to repurpose the rest of Foo to create the mirror for the ZFS pool currently hosted only on Bar.
I used blivet-gui to delete the old LUKS+LVM2 partition on Foo, and created a new partition in the same space, unformatted, Encrypted, same password as before. This partition should have the same (or greater) size/extent as the LUKS partition on Bar. (As before, I skip the cryptsetup luksFormat and cryptsetup luksOpen commands from the guide, as blivet-gui already did them for me.)
Add a new entry to crypttab for this encrypted disk:
vim /etc/crypttab
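The same pattern from Part 2 works here; a sketch, assuming Foo’s new LUKS partition ended up as /dev/nvme0n1p3:
dev=`blkid -s UUID -o value /dev/nvme0n1p3`
echo luks-$dev UUID=$dev none discard >> /etc/crypttab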
Before creating the mirror, check the status of the pool:
user@dom0:~$ zpool status
  pool: laptop
 state: ONLINE
...
config:

        NAME                               STATE     READ WRITE CKSUM
        laptop                             ONLINE       0     0     0
          dm-uuid-CRYPT-LUKS2-blahBARblah  ONLINE       0     0     0

errors: No known data errors
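The two by-id paths needed for the attach command below can be listed with:
ls -l /dev/disk/by-id/ | grep dm-uuid-CRYPT-LUKS2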
Magically create the mirror by attaching the new LUKS partition to the single device vdev:
zpool attach laptop /dev/disk/by-id/<device path for Bar's LUKS partition> /dev/disk/by-id/<device path for Foo's LUKS partition>
See that laptop consists of a mirror vdev now, and resilvering is in process:
user@dom0:~$ zpool status
  pool: laptop
 state: ONLINE
...
config:

        NAME                                 STATE     READ WRITE CKSUM
        laptop                               ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            dm-uuid-CRYPT-LUKS2-blahBARblah  ONLINE       0     0     0
            dm-uuid-CRYPT-LUKS2-blahFOOblah  ONLINE       0     0     0  (resilvering)

errors: No known data errors
Recreate the RAM disk:
dracut -fv --regenerate-all
Add the new Foo LUKS partition to GRUB_CMDLINE_LINUX, as in the guide:
vim /etc/default/grub
Create the GRUB config a last time:
grub2-mkconfig -o /boot/grub2/grub.cfg
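You can watch the resilver progress with something like:
watch -n 10 zpool status laptop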
Once the resilvering is done, reboot to see all is well.
Epilogue
Do you see something like this (I did)?
user@dom0:~$ zpool status
pool: laptop
state: ONLINE
status: Mismatch between pool hostid and system hostid on imported pool.
This pool was previously imported into a system with a different hostid,
and then was verbatim imported into this system.
action: Export this pool on all systems on which it is imported.
Then import it to correct the mismatch.
...
Fix it by doing this:
root# zgenhostid "$(hostid)"