This is the second time I have experienced this. Last time it resolved itself after a reboot, and I really hope it will be the same this time (once I am able to reboot).
Here is the summary:
I created a loop device in a VM, attached it to dom0, then shut down that VM.
Result: the device still shows:
user@dom0:~ > qvm-device block list dom0
BACKEND:DEVID DESCRIPTION USED BY
dom0:dm-317 Block_Storage: hosted by dom0 foo
dom0:dm-5 Block_Storage: hosted by dom0 qubes dom0-swap
user@dom0:~ > qvm-device block detach dom0 dom0:dm-317
Error: "'block device dm-317 not attached to dom0'"
Even after sudo rm -f /dev/dm-317 it still shows. (AFAIK, this is a known bug, but I have no idea if it is related)
After this, I simply try to start another existing qube (named "storage").
Result: it does not start and error messages show:
root@dom0:~ # journalctl -f
May 02 12:03:17 dom0 virtxend[5583]: internal error: libxenlight failed to create new domain 'storage'
May 02 12:03:17 dom0 qubesd[4879]: ERROR: vm.storage: Start failed: internal error: libxenlight failed to create new domain 'storage'
May 02 12:03:17 dom0 dmeventd[3380]: No longer monitoring thin pool qubes_dom0-vm--pool-tpool.
May 02 12:03:17 dom0 dmeventd[3380]: Monitoring thin pool qubes_dom0-vm--pool-tpool.
May 02 12:03:18 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-239 has been removed.
May 02 12:03:18 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-239 has been removed.
May 02 12:03:18 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-251 has been removed.
May 02 12:03:18 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-251 has been removed.
May 02 12:03:18 dom0 dmeventd[3380]: No longer monitoring thin pool qubes_dom0-vm--pool-tpool.
May 02 12:03:18 dom0 dmeventd[3380]: Monitoring thin pool qubes_dom0-vm--pool-tpool.
May 02 12:03:19 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-257 has been removed.
May 02 12:03:19 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-257 has been removed.
root@dom0:~ # tail -f /var/log/libvirt/libxl/libxl-driver.log
2026-05-02 12:03:07.421+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to add device with path /local/domain/0/backend/vbd/26/51712 - rc -9
2026-05-02 12:03:07.426+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to add device with path /local/domain/0/backend/vbd/26/51728 - rc -9
2026-05-02 12:03:07.430+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to add device with path /local/domain/0/backend/vbd/26/51744 - rc -9
2026-05-02 12:03:07.436+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to add device with path /local/domain/0/backend/vbd/26/51760 - rc -9
2026-05-02 12:03:07.436+0000: libxl: libxl_create.c:1785:domcreate_launch_dm: Domain 26:unable to add disk devices
2026-05-02 12:03:17.443+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to remove device with path /local/domain/0/backend/vbd/26/51712 - rc -9
2026-05-02 12:03:17.452+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to remove device with path /local/domain/0/backend/vbd/26/51728 - rc -9
2026-05-02 12:03:17.455+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to remove device with path /local/domain/0/backend/vbd/26/51744 - rc -9
2026-05-02 12:03:17.459+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to remove device with path /local/domain/0/backend/vbd/26/51760 - rc -9
2026-05-02 12:03:17.466+0000: libxl: libxl_domain.c:1585:devices_destroy_cb: Domain 26:libxl__devices_destroy failed
This happens on each attempt to start any qube that is not already running. The ones that are already running work fine.
This doesn't show a loop device attached to dom0. It shows a device-mapper device originating in dom0 and hence available for attaching to another VM, but not currently attached anywhere. I'm assuming that you created this device-mapper device on top of a loop device attached to dom0. Still, this doesn't mean the device-mapper device is "attached". So there's nothing to detach in that sense.
Let's say you
1. qvm-block attached a loop device backed by a VM to dom0, then
2. mapped a device-mapper device on top of the frontend (e.g. using cryptsetup open /dev/xvdi foo in dom0, so /dev/mapper/foo now symlinks to /dev/dm-317), then
3. mounted a filesystem from the device-mapper device.
The correct way to unwind this is in reverse order:
1. umount the filesystem, then
2. unmap the device-mapper device (e.g. using cryptsetup close foo), and only then
3. detach the loop device from dom0 (whether explicitly using qvm-block detach, or implicitly by shutting down the backend VM hosting the loop device).
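The reverse-order teardown above could be wrapped in a small helper along these lines. This is only a sketch: the mount point, the mapping name and the BACKEND:DEVID value are placeholders, not taken from the actual setup, and the function stops at the first step that fails so that a lower layer is never removed while a higher one is still in use.

```shell
# Tear the stack down in reverse order of setup.
# All three arguments are hypothetical placeholders.
unwind_stack() {
    local mountpoint="$1"   # where the filesystem is mounted in dom0
    local mapping="$2"      # device-mapper name, e.g. "foo"
    local backend_dev="$3"  # BACKEND:DEVID as listed by qvm-block

    # 1. Unmount the filesystem first.
    umount "$mountpoint" || return 1
    # 2. Remove the device-mapper mapping.
    cryptsetup close "$mapping" || return 2
    # 3. Only now detach the block device from dom0.
    qvm-block detach dom0 "$backend_dev" || return 3
}
```

It would be called as, for example, `unwind_stack /mnt/ram foo sys-ramdrive:loop0` (all three values made up for illustration). The distinct return codes show which layer refused to go away.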
But if you just rip out the loop device attachment without unwinding the higher layers first (and especially if a storage pool is stored on top of those higher layers), it will cause problems.
This doesn't show a loop device attached to dom0.
This is what shows after shutting down the VM containing the loop device. I am showing it to illustrate that even though there is indeed no loop device, "something" still remains.
The correct way to unwind this is in reverse order:
That's right, but I need to make it somewhat foolproof, as it is part of the new RAM qubes. If the user shuts down their sys-ramdrive, or if it gets shut down during e.g. a reboot, that should not result in problems.
But if you just rip out the loop device attachment without unwinding the higher layers first (and especially if a storage pool is stored on top of those higher layers), it will cause problems.
That is what I am trying to understand - why does it cause problems and how to avoid them?
Well, qvm-block detach is kind of like unplugging a cable. If Qubes OS tries to access a storage pool hosted on a filesystem mounted from a device-mapper device based on something that has vanished into thin air, probably lots of stuff will fail.
You could try to still unwind the higher layers - e.g. umount, then cryptsetup close. Just like you would do with an accidentally unplugged drive.
Anyway a reboot should fix it, because mounts or device-mapper mappings are not persistent.
Although even if you manage to unwind those higher layers, if VMs are running that were effectively backed by this sys-ramdrive, I'd guess Qubes OS would still behave weirdly e.g. if you now try to shut down or kill those VMs.
Same as if you had physically plugged in a drive (into a port managed by dom0), created a device-mapper mapping on it, and unplugged the drive. The device-mapper mapping would continue to exist (manually removing the device node in /dev/ does not remove the kernel device!), so it would continue to be listed.
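To see this for yourself, you can ask sysfs rather than /dev whether the kernel still knows the device. The helpers below are a rough sketch; the function names are made up, and SYSFS_ROOT exists only so they can be tried against a fake directory tree.

```shell
# SYSFS_ROOT is overridable for dry runs; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Succeeds if the kernel still has the block device (e.g. "dm-317"),
# regardless of whether a node for it exists under /dev.
kernel_has_blockdev() {
    [ -d "$SYSFS_ROOT/block/$1" ]
}

# Prints the device-mapper name of a dm-N device (e.g. "foo").
dm_name() {
    cat "$SYSFS_ROOT/block/$1/dm/name" 2>/dev/null
}

# Prints the devices this one is stacked on, one per line.
blockdev_slaves() {
    ls "$SYSFS_ROOT/block/$1/slaves" 2>/dev/null
}
```

If `kernel_has_blockdev dm-317` still succeeds after the rm, the mapping is alive, and `dm_name dm-317` gives the name to pass to cryptsetup close (or dmsetup remove).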
It feels un-qubesy to attach bits of other vms to dom0 - and I guess shooting them out from its hands is not well tested.
I didn't go back to read about the RAM qubes again, but it sounds like there should be a way to do it without attachment...
Could a custom pool driver do what is wanted? It seems like the API is all set up for volumes on storage outside dom0, and I am guessing there is no need for any actual blocks to be present there. (I'm going on this)
...or...
Would it be possible to distribute the block devices straight to the client qubes? If this is not exactly what the pool API is for...
Why even use RAM- or VM-backed storage though, let alone both combined? If you just want anti-forensics, then even without #10827 (i.e. the "proper" approach) it would be so much simpler to create an ephemerally encrypted disk image hosting a throwaway pool in dom0.
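A rough sketch of what such a throwaway pool might look like follows. Everything in it is an assumption for illustration: the function name, the arguments, the choice of Btrfs, and the cipher settings. The exact qvm-pool syntax and the dir_path option should be checked against the Qubes release in use.

```shell
# Sketch only: image path, size, mapping, mount point and pool name are
# all hypothetical and supplied by the caller.
make_throwaway_pool() {
    local image="$1" size="$2" mapping="$3" mountpoint="$4" pool="$5"

    # Sparse backing file for the encrypted volume.
    truncate -s "$size" "$image" || return 1
    # Plain dm-crypt with a random key that is never stored anywhere,
    # so the data is unreadable once the mapping is closed.
    cryptsetup open --type plain --cipher aes-xts-plain64 --key-size 512 \
        --key-file /dev/urandom "$image" "$mapping" || return 1
    # file-reflink needs a reflink-capable filesystem; Btrfs is assumed.
    mkfs.btrfs -q "/dev/mapper/$mapping" || return 1
    mkdir -p "$mountpoint" || return 1
    mount "/dev/mapper/$mapping" "$mountpoint" || return 1
    # Register the mounted directory as a Qubes storage pool.
    qvm-pool add "$pool" file-reflink -o dir_path="$mountpoint" || return 1
}
```

For example, `make_throwaway_pool /var/tmp/eph.img 20G eph /mnt/eph ephemeral` (names made up). Teardown would again be the reverse: remove the pool, umount, then cryptsetup close.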
Interesting thought. I think it would indeed be possible to write a storage driver like that. But it's not a trivial amount of work, and that includes maintenance (to keep up with storage API changes).
It feels un-qubesy to attach bits of other vms to dom0 - and I guess shooting them out from its hands is not well tested.
Other than that, the sys-ramdrive qube (minimal and offline) is blind to the contents of its own loop device, as the latter is encrypted by dom0. So, the worst thing that VM can do is to stop working (which will result in what I describe in the OP).
Why even use RAM- or VM-backed storage though, let alone both combined?
I asked about that but received no replies, so I just decided to proceed with it.
If you just want anti-forensics, then even without #10827 (i.e. the proper approach) it would be so much simpler to create an ephemerally encrypted disk image hosting a throwaway pool in dom0.
As mentioned earlier, the goal of RAM qubes has never been anti-forensics. The latter is just a positive side effect, as it turned out, welcomed by everyone. So, considering the main goal, RAM-backed storage is a requirement. There are 2 options:
1. consume dom0's RAM (as done currently), potentially risking e.g. a dom0 deadlock due to improper tmpfs sizing, or any other possible vulnerability, e.g. a malicious VM leaving dirty stuff in dom0's RAM and a dom0 process then using that RAM
2. use domU's RAM
I think the second could be better because it provides more flexibility and isolation. One can simply change the RAM size of sys-ramdrive and does not need to meddle in GRUB to increase dom0's max memory.
If you have a better idea, I am very open to know about it.
Back on topic:
The observed behaviour seems buggy to me. Removing a device which is not unmounted and detached should not result in such overall system problems, especially considering it is completely unrelated to any qube that won't start. Would you agree?
What pool driver is hosted by the ripped out device? If it's lvm_thin, I could imagine that causing problems globally. But with file-reflink I don't know why it would happen.
People worry about that way too much IMO. Nowadays any SSDs that aren't total crap (which would be likely to cause more serious issues, e.g. data corruption on power loss) have enormous write endurance listed in their datasheets. Qubes OS could be even more wasteful by orders of magnitude without any problems.
Hmm, I doubt it's this pool that's affecting the rest of the system then. Maybe the broken device is messing with LVM tooling which is scanning for physical volumes, or something like that? The repeated log messages about the LVM pool look suspicious. (I don't know how to debug LVM stuff.)
Hmm, I doubt it's the pool that's affecting the rest of the system then. Maybe the broken device is messing with LVM tooling which is scanning for physical volumes, or something like that? The repeated log messages about the LVM pool look suspicious. (I don't know how to debug LVM stuff.)
I have no idea how or why it may be related to LVM at all.
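One low-risk way to test the LVM guess is to check whether any LVM-related process is stuck. Processes waiting on I/O to a device that has vanished often sit in uninterruptible sleep (state D in ps). The filter below is a minimal sketch with a made-up function name; it reads "STATE PID COMMAND" lines and proves nothing by itself, but an lvm or dmeventd entry in its output would at least point in that direction.

```shell
# Reads "STATE PID COMMAND" lines on stdin and prints the PID and
# command of every process in uninterruptible sleep (state D).
blocked_processes() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Usage in dom0 (not run here):
#   ps -eo state=,pid=,comm= | blocked_processes
```

An empty result while a qube start is failing would suggest looking elsewhere.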