ZFS/Qubes Networking incompatibility?

A few months ago in the qubes zfs thread (ZFS in Qubes OS - #9 by SteveC), I reported:

I didn’t get much of a response but that was my fault for burying it as part of a list of side-issues in a post about using zfs on qubes.

Well, now I know a LITTLE bit more about the zfs/qubes-networking incompatibility and it’s still puzzling.

If a qube that has both of these packages installed has no network vm set, it starts up fine. You can then connect a network vm and things will work properly.
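In dom0, that sequence looks roughly like this; the qube name here is just an example:

qvm-prefs my-new-qube netvm ''             # start with no network vm
qvm-start my-new-qube
qvm-prefs my-new-qube netvm sys-firewall   # attach networking afterwards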

But if the network vm is already set when the qube starts, startup hangs for about sixty seconds and then dies, with an error saying it could not connect to the qrexec agent within 60 seconds. The log file referenced in the error popup is effectively identical in both cases (only timestamps, data rates, and the UUID differ), except that the one that succeeds in starting up ends with an announcement of the qube name and a login prompt. There’s no error message in that log file explaining that it didn’t connect to qrexec, or why.

So I have a very unsatisfactory workaround: don’t connect the sys-firewall qube as a network vm to any qube that has zfs capability until after the zfs-capable qube has started. At this point I’d rather just not have zfs capability on those qubes. (I can still loop zfs blocks on those qubes and mount them on a zfs-capable qube with no networking and read them, which is what I normally do anyway.)
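For reference, that loop-and-mount workaround looks roughly like this, run in the zfs-capable, non-networked qube; the image path and pool name are hypothetical:

sudo losetup /dev/loop0 /home/user/tank.img   # attach the hypothetical file-backed pool image
sudo zpool import -d /dev/loop0 tank          # import the hypothetical pool 'tank' from that device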

Packages: for zfs, zfs-zed and zfsutils-linux; for qubes-networking, qubes-core-agent-networking.


You can increase the qube’s qrexec_timeout and connect to its console using qvm-console-dispvm, maybe you’ll be able to access the console to see what failed to load.
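In dom0, something like this (substituting your qube’s name):

qvm-prefs my-new-qube qrexec_timeout 300
qvm-console-dispvm my-new-qube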

I guess I don’t have any idea how to use qvm-console-dispvm. It opens a completely blank window I can type in, but nothing I type gets a reaction.

However, upping the qrexec and shutdown delays to 300 seconds revealed that the qube WILL start, after a bit over two minutes.

I have no idea what to look at to try to figure out what went on other than /var/log/xen/console.

/var/log/xen/console/guest-my-new-qube.log has a message that apparently says (it’s hard to read because someone stuffed the file full of color escape sequences) systemd-udev-settle didn’t complete device initialization. The next lines show errors installing the zfs kernel module and importing zfs pools; these messages are output after the two-minute delay.

Those three messages do not show up in the log for a qube that has zfs but no qubes-core-agent-networking installed.

Since your qube can start with an increased qrexec_timeout, you should be able to check the properly formatted log inside the qube.

You can also use systemd-analyze to check which service was impeding the boot:

sudo systemd-analyze critical-chain

or

sudo systemd-analyze plot > startup_order.svg

Then open startup_order.svg in a browser.

I’m sorry. I’m ignorant of how to check “the properly formatted log”.

What file is it?

However, running systemd-analyze critical-chain in the qube shows that the delay seems to have been in local-fs-pre.target: it started at 321ms, and the next unit, run-credentials-systemd\x2dtmpfiles\x2dsetup.service.mount, seems to have started at 2min 676ms.
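For what it’s worth, critical-chain can also be pointed at a specific unit to narrow things down, e.g.:

sudo systemd-analyze critical-chain local-fs-pre.target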

This is ultimately based on a minimal template, by the way.

Run this command in the qube’s terminal to view the current boot log:

sudo journalctl -b
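To cut that down to just the errors, you can add a priority filter:

sudo journalctl -b -p err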

Maybe it can’t mount swap?
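Two quick checks from inside the qube:

sudo systemctl --failed   # list units that failed this boot
swapon --show             # see whether any swap is active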

Steps to Reproduce:

Start with debian-12-minimal and clone it to d12min.

d12min must have a Debian kernel; otherwise zfs won’t install at all.

qvm-run --pass-io --user=root d12min 'apt install linux-image-amd64 linux-headers-amd64 grub2 qubes-kernel-vm-support'
qvm-run --pass-io --user=root d12min 'grub-install /dev/xvda'
qvm-prefs d12min kernel pvgrub2-pvh
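Running qvm-prefs without a value prints the current setting, so you can confirm the kernel change took effect:

qvm-prefs d12min kernel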

Then install the Qubes networking agent and zfs on the new template.

qvm-run -u root --pass-io d12min 'apt install qubes-core-agent-networking zfs-zed zfsutils-linux'

Use qvm-prefs to set qrexec_timeout and shutdown_timeout to 300 seconds.
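Assuming those property names, that’s the following in dom0:

qvm-prefs d12min qrexec_timeout 300
qvm-prefs d12min shutdown_timeout 300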

Create an AppVM based on this template; call it zfs-network-test.

Start zfs-network-test.
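In dom0, those two steps are roughly (the label choice is arbitrary):

qvm-create --class AppVM --template d12min --label red zfs-network-test
qvm-start zfs-network-test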

Checking the log file (thanks for the command!), one minute in the following is reported (note: I’m typing this in whilst reading it in the terminal; typos more than likely):

localhost (udev-worker)[320]: eth0: Spawned process '/usr/bin/systemctl restart --job-mode=replace qubes-network-uplink@eth0.service' [629] is taking longer than 59s to complete
systemd-udevd[276]: eth0: Worker [320] processing SEQNUM=1958 is taking a long time

a minute later:

localhost udevadm[267]: Timed out for waiting the udev queue being empty.
localhost systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
localhost systemd[1]: systemd-udev-settle.service: Failed with result 'exit-code'.
localhost systemd[1]: Failed to start systemd-udev-settle.service - Wait for udev To Complete Device Initialization.
localhost systemd[1]: Dependency failed for zfs-load-module.service - Install ZFS kernel module.
localhost systemd[1]: Dependency failed for zfs-import-cache.service - Import ZFS pools by cache file.
localhost systemd[1]: zfs-import-cache.service: Job zfs-import-cache.service/start failed with result 'dependency'.
localhost systemd[1]: zfs-load-module.service: Job zfs-load-module.service/start failed with result 'dependency'.
localhost systemd[1]: Reached target zfs-import.target - ZFS pool import target.
localhost systemd[1]: zfs-mount.service - Mount ZFS filesystems was skipped because of an unmet condition check (ConditionPathIsDirectory=/sys/module/zfs).

After this it looks routine, save for further complaints from ZFS.

You can copy the text from xterm like this:

  • Ctrl+Middle mouse button on the xterm window → Select to Clipboard
  • Select the text you want to copy with mouse
  • Press Ctrl+Shift+C to copy to global clipboard

Check the output of this command:

sudo systemctl status qubes-network-uplink@eth0.service | cat

SMDH… I didn’t realize it was a domU-to-domU copy. For some reason I was thinking dom0 to domU.

● qubes-network-uplink@eth0.service - Qubes network uplink (eth0) setup
     Loaded: loaded (/lib/systemd/system/qubes-network-uplink@.service; static)
     Active: active (exited) since Tue 2025-09-09 22:16:08 MDT; 1h 25min ago
    Process: 816 ExecStart=/usr/lib/qubes/setup-ip add eth0 (code=exited, status=0/SUCCESS)
   Main PID: 816 (code=exited, status=0/SUCCESS)
        CPU: 20ms

Sep 09 22:16:08 my-new-qube systemd[1]: Starting qubes-network-uplink@eth0.service - Qubes network uplink (eth0) setup...
Sep 09 22:16:08 my-new-qube systemd[1]: Finished qubes-network-uplink@eth0.service - Qubes network uplink (eth0) setup.

Those timestamps, by the way, are a bit more than two minutes after the initial startup at 22:14:04.

It seems to be a problem with the zfs systemd services requiring systemd-udev-settle.service:

qubes-network-uplink.service requires network-pre.target to be reached, which requires local-fs.target to be reached.
The zfs service zfs-mount is set to run before local-fs.target.
But zfs-mount needs the other zfs services (zfs-import-cache, zfs-import-scan, zfs-load-module) to load first, and those require systemd-udev-settle.service, which can’t finish before qubes-network-uplink.service is done (that’s the udev worker stuck running systemctl restart in your log above).
So it’s a dependency mess.
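You can confirm those links from inside the qube, e.g.:

systemctl show -p Requires,After zfs-import-cache.service          # what the import service pulls in
systemctl list-dependencies --reverse systemd-udev-settle.service  # what depends on udev-settle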

No idea how to solve it properly, except for overriding the zfs systemd units and removing the Requires=systemd-udev-settle.service from them:

find /lib/systemd/system/zfs* -type f -exec grep -q '^Requires=systemd-udev-settle.service' {} \; -exec cp -t /etc/systemd/system {} +
find /etc/systemd/system/zfs* -type f -exec sed -i "/Requires=systemd-udev-settle.service/s/^#*/#/" {} \;
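After running those, systemctl cat shows which copy of a unit systemd will actually use; the /etc/systemd/system copy should be the one listed:

systemctl cat zfs-import-cache.service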

This appears to work… at least, when I ran those commands (with ‘sudo’) in the template, shut down the template, and restarted the AppVM, it came right up and the network was working (I was able to ping something).

Thanks! Systemd is an impenetrable black box to me.

EDIT: no, this is insufficient. Not sure why it “worked” the first time I tried it, but I can’t get it to work now.

I also tried commenting out the corresponding “After=” line.


Do you have the same errors?
Check that the zfs service files exist in /etc/systemd/system/ and that they have Requires=systemd-udev-settle.service commented out inside.
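For example:

grep -n 'systemd-udev-settle' /etc/systemd/system/zfs*

Every Requires=systemd-udev-settle.service line should come back with a leading #.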

AAARGH, how embarrassing.

Apparently, when I updated my salt file, both commands were pointed at /lib/systemd/system/zfs*.

The second one should have been pointed at /etc/systemd/system/zfs*. I’m thinking I must have manually entered the command properly the first time (which is why it “worked”), then botched the salt file.

Salt is now running (it will take time); I expect it will work this time, and if so I will re-check your comment as the solution.

EDIT: appears to work; thanks again.
