Can't start any qube. virtxend[5583]: internal error: libxenlight failed to create new domain

qubist · May 2, 2026, 12:29pm

Hi,

This is the second time I have run into this. Last time it resolved itself after a reboot, and I hope the same will happen this time (once I am able to reboot).

Here is the summary:

I created a loop device in a VM, attached it to dom0, and then shut down that VM.

Result: the device is still listed:

user@dom0:~ > qvm-device block list dom0
BACKEND:DEVID  DESCRIPTION                                    USED BY
dom0:dm-317    Block_Storage: hosted by dom0 foo         
dom0:dm-5      Block_Storage: hosted by dom0 qubes dom0-swap  

user@dom0:~ > qvm-device block detach dom0 dom0:dm-317
Error: "'block device dm-317 not attached to dom0'"

Even after sudo rm -f /dev/dm-317 it is still listed. (AFAIK this is a known bug, but I have no idea whether it is related.)

After this, I simply try to start another existing qube (named ‘storage’).
Result: it does not start, and the following errors appear:

root@dom0:~ # journalctl -f

May 02 12:03:17 dom0 virtxend[5583]: internal error: libxenlight failed to create new domain 'storage'
May 02 12:03:17 dom0 qubesd[4879]: ERROR: vm.storage: Start failed: internal error: libxenlight failed to create new domain 'storage'
May 02 12:03:17 dom0 dmeventd[3380]: No longer monitoring thin pool qubes_dom0-vm--pool-tpool.
May 02 12:03:17 dom0 dmeventd[3380]: Monitoring thin pool qubes_dom0-vm--pool-tpool.
May 02 12:03:18 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-239 has been removed.
May 02 12:03:18 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-239 has been removed.
May 02 12:03:18 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-251 has been removed.
May 02 12:03:18 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-251 has been removed.
May 02 12:03:18 dom0 dmeventd[3380]: No longer monitoring thin pool qubes_dom0-vm--pool-tpool.
May 02 12:03:18 dom0 dmeventd[3380]: Monitoring thin pool qubes_dom0-vm--pool-tpool.
May 02 12:03:19 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-257 has been removed.
May 02 12:03:19 dom0 systemd-homed[4567]: block device /sys/devices/virtual/block/dm-257 has been removed.

root@dom0:~ # tail -f /var/log/libvirt/libxl/libxl-driver.log

2026-05-02 12:03:07.421+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to add device with path /local/domain/0/backend/vbd/26/51712 - rc -9
2026-05-02 12:03:07.426+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to add device with path /local/domain/0/backend/vbd/26/51728 - rc -9
2026-05-02 12:03:07.430+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to add device with path /local/domain/0/backend/vbd/26/51744 - rc -9
2026-05-02 12:03:07.436+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to add device with path /local/domain/0/backend/vbd/26/51760 - rc -9
2026-05-02 12:03:07.436+0000: libxl: libxl_create.c:1785:domcreate_launch_dm: Domain 26:unable to add disk devices
2026-05-02 12:03:17.443+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to remove device with path /local/domain/0/backend/vbd/26/51712 - rc -9
2026-05-02 12:03:17.452+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to remove device with path /local/domain/0/backend/vbd/26/51728 - rc -9
2026-05-02 12:03:17.455+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to remove device with path /local/domain/0/backend/vbd/26/51744 - rc -9
2026-05-02 12:03:17.459+0000: libxl: libxl_device.c:1224:device_backend_callback: Domain 26:unable to remove device with path /local/domain/0/backend/vbd/26/51760 - rc -9
2026-05-02 12:03:17.466+0000: libxl: libxl_domain.c:1585:devices_destroy_cb: Domain 26:libxl__devices_destroy failed

This happens on each attempt to start any qube that is not already running. The ones that are already running work fine.

Why is this happening, and how can I fix it?

rustybird · May 2, 2026, 1:49pm

qubist:

user@dom0:~ > qvm-device block list dom0
BACKEND:DEVID  DESCRIPTION                                    USED BY
dom0:dm-317    Block_Storage: hosted by dom0 foo         
dom0:dm-5      Block_Storage: hosted by dom0 qubes dom0-swap  

user@dom0:~ > qvm-device block detach dom0 dom0:dm-317
Error: "'block device dm-317 not attached to dom0'"

This doesn’t show a loop device attached to dom0. It shows a device-mapper device originating in dom0 and hence available for attaching to another VM, but not currently attached anywhere. I’m assuming that you created this device-mapper device on top of a loop device attached to dom0. Still, this doesn’t mean the device-mapper device is “attached”. So there’s nothing to detach in that sense.

Let’s say you

qvm-block attached a loop device backed by a VM to dom0, then
mapped a device-mapper device on top of the frontend (e.g. using cryptsetup open /dev/xvdi foo in dom0, so /dev/mapper/foo now symlinks to /dev/dm-317), then
mounted a filesystem from the device-mapper device.

The correct way to unwind this is in reverse order:

umount the filesystem, then
unmap the device-mapper device (e.g. using cryptsetup close foo), and only then
detach the loop device from dom0 (whether explicitly using qvm-block detach, or implicitly by shutting down the backend VM hosting the loop device).
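For illustration, that reverse-order unwind could be wrapped in a small dom0 shell function. This is only a sketch: the function name teardown_stack, the mount point /mnt/ram, the backend device sys-ramdrive:loop0 and the RUN=echo dry-run switch are all hypothetical; only the mapping name foo comes from the example above. The && chain stops at the first failing step, so the loop device is never detached from under a filesystem or mapping that is still in use.

```shell
#!/bin/sh
# Sketch: unwind the storage stack in reverse order of creation.
# Arguments (all hypothetical example values):
#   $1 = mount point, $2 = dm-crypt mapping name,
#   $3 = backend VM,  $4 = backend device id (e.g. loop0)
# Set RUN=echo (hypothetical dry-run switch) to print the commands instead of running them.
teardown_stack() {
  mnt=$1 map=$2 vm=$3 dev=$4
  ${RUN:-} umount "$mnt" &&                    # 1. unmount the filesystem
    ${RUN:-} cryptsetup close "$map" &&        # 2. remove the dm-crypt mapping
    ${RUN:-} qvm-block detach dom0 "$vm:$dev"  # 3. only then detach from dom0
}
```

Usage: RUN=echo teardown_stack /mnt/ram foo sys-ramdrive loop0 previews the three commands; without RUN=echo they are executed, and the umount and cryptsetup steps need root.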

But if you just rip out the loop device attachment without unwinding the higher layers first (and especially if a storage pool is stored on top of those higher layers), it will cause problems.

qubist · May 2, 2026, 2:39pm

@rustybird

Thanks for the quick response.

This doesn’t show a loop device attached to dom0.

This is what is listed after shutting down the VM containing the loop device. I am showing it to illustrate that even though the loop device is indeed gone, “something” still remains.

The correct way to unwind this is in reverse order:

That’s right, but I need to make it somewhat fool-proof, as it is part of the new RAM qubes. If the user shuts down sys-ramdrive, or it gets shut down during e.g. a reboot, that should not cause problems.
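One possible building block for that kind of fool-proofing is a check, run before sys-ramdrive is shut down or its device is detached, that refuses to proceed while anything is still stacked on the frontend device. The sketch below reads the kernel's sysfs holders directory for a device such as xvdi (the frontend in the cryptsetup open /dev/xvdi foo example); the function name check_no_holders and the SYSFS override are hypothetical. It only helps if the shutdown actually goes through a wrapper that calls it; it cannot intercept a shutdown started some other way.

```shell
#!/bin/sh
# Hypothetical guard: report whatever is still stacked on a block device
# (e.g. xvdi) by reading /sys/block/<dev>/holders. Returns 0 if nothing holds it.
# SYSFS (hypothetical) can point at a fake tree for testing.
check_no_holders() {
  sysfs=${SYSFS:-/sys}
  busy=0
  for h in "$sysfs/block/$1/holders"/*; do
    # An empty directory leaves the glob unexpanded; skip that literal pattern
    [ -e "$h" ] || [ -L "$h" ] || continue
    echo "$1 is still held by ${h##*/}" >&2
    busy=1
  done
  return "$busy"
}
```

Usage: check_no_holders xvdi && qvm-shutdown sys-ramdrive would only shut the VM down if no dm-crypt (or other) mapping still sits on xvdi. Note that a device that is not present at all is also reported as free.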

But if you just rip out the loop device attachment without unwinding the higher layers first (and especially if a storage pool is stored on top of those higher layers), it will cause problems.

That is what I am trying to understand: why does it cause problems, and how can they be avoided?

rustybird · May 2, 2026, 2:48pm

Well, qvm-block detach is kind of like unplugging a cable. If Qubes OS tries to access a storage pool hosted on a filesystem mounted from a device-mapper device based on something that has vanished into thin air, probably lots of stuff will fail.

You could still try to unwind the higher layers, e.g. umount, then cryptsetup close, just as you would with an accidentally unplugged drive.
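A best-effort variant of that cleanup might look like the sketch below. Unlike an orderly teardown, it attempts every step even if an earlier one fails, because errors are expected once the backing device has vanished. The function name unwind_after_unplug and the RUN=echo dry-run switch are hypothetical; the umount and cryptsetup close steps are the ones suggested above.

```shell
#!/bin/sh
# Hypothetical best-effort cleanup after the lower layer has already vanished.
# Every step is attempted even if an earlier one fails; failures are reported
# on stderr and reflected in the return status.
# Set RUN=echo to print the commands instead of running them.
unwind_after_unplug() {
  mnt=$1 map=$2
  status=0
  # 1. Unmount the filesystem (may fail, e.g. if it is still busy)
  ${RUN:-} umount "$mnt" || { echo "warning: umount $mnt failed" >&2; status=1; }
  # 2. Try to close the dm-crypt mapping regardless of step 1
  ${RUN:-} cryptsetup close "$map" || { echo "warning: cryptsetup close $map failed" >&2; status=1; }
  return "$status"
}
```

If umount fails because some process still has files open on the filesystem, cryptsetup close will most likely fail too, since the mapping is still in use; those processes would have to be stopped first.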

Anyway, a reboot should fix it, because mounts and device-mapper mappings are not persistent.

rustybird · May 2, 2026, 3:06pm

Although even if you manage to unwind those higher layers, if VMs are running that were effectively backed by this sys-ramdrive, I’d guess Qubes OS would still behave weirdly e.g. if you now try to shut down or kill those VMs.

rustybird · May 2, 2026, 3:18pm

Same as if you had physically plugged in a drive (into a port managed by dom0), created a device-mapper mapping on it, and unplugged the drive. The device-mapper mapping would continue to exist (manually removing the device node in /dev/ does not remove the kernel device!), so it would continue to be listed.
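To see which device-mapper devices the kernel still knows about, independent of what is left in /dev, one can look in sysfs (each /sys/block/dm-N/dm/name file holds the mapping name) or run sudo dmsetup ls. The sketch below uses sysfs and also shows whether the /dev node still exists; the function name list_dm_devices and the SYSFS/DEVDIR overrides are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list dm devices known to the kernel via sysfs and
# report whether their /dev node still exists. Deleting a /dev node does not
# remove the kernel device, so it still shows up here.
# SYSFS and DEVDIR can point at fake trees for testing.
list_dm_devices() {
  sysfs=${SYSFS:-/sys}
  devdir=${DEVDIR:-/dev}
  for dir in "$sysfs"/block/dm-*; do
    # Skip the literal pattern when there are no dm-* entries
    [ -r "$dir/dm/name" ] || continue
    dev=${dir##*/}
    name=$(cat "$dir/dm/name")
    if [ -e "$devdir/$dev" ]; then node=present; else node=missing; fi
    printf '%s\t%s\t/dev node %s\n' "$dev" "$name" "$node"
  done
}
```

In the situation from the first post, one would expect dm-317 to keep appearing here with its /dev node reported as missing after the rm -f.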

phceac · May 2, 2026, 3:43pm

It feels un-qubesy to attach bits of other VMs to dom0, and I guess yanking them out of its hands is not well tested.

I haven’t reread the material on the RAM qubes, but it sounds like there should be a way to do this without attaching anything to dom0…

Could a custom pool driver do what is wanted? It seems like the API is all set up for volumes on storage outside Dom0, and I am guessing there is no need for any actual blocks to be present there. (I’m going on this)

…or…

Would it be possible to distribute the block devices straight to the client qubes? If this is not exactly what the pool API is for…

qubist · May 2, 2026, 6:01pm

@rustybird

Thanks for the feedback. I guess this needs to come with a warning to the user. At least I don’t see another way.

rustybird · May 2, 2026, 6:35pm

Why even use RAM- or VM-backed storage though, let alone both combined? If you just want anti-forensics, then even without #10827 (i.e. the “proper” approach) it would be so much simpler to create an ephemerally encrypted disk image hosting a throwaway pool in dom0.

Interesting thought. I think it would indeed be possible to write a storage driver like that. But it’s not a trivial amount of work, including the ongoing maintenance needed to keep up with storage API changes.

qubist · May 2, 2026, 6:37pm

@phceac

It feels un-qubesy to attach bits of other VMs to dom0, and I guess yanking them out of its hands is not well tested.

github.com/QubesOS/qubes-issues

Make it more difficult to attach domU devices to dom0

opened 08:15AM - 03 Apr 26 UTC

emanruse

C: core ux security good first issue P: default pr submitted

Qubes OS release 4.3.0

Brief summary
Although it is not possible to attach a device to dom0 from qui-devices, it is possible to do it from command line.

Steps to reproduce
root@disp624:/tmp/foo # truncate -s 2G file.img
root@disp624:/tmp/foo # mkfs.ext4 file.img >/dev/null 2>&1
root@disp624:/tmp/foo # losetup -f file.img
root@disp624:/tmp/foo #

user@dom0:~ > qvm-device block list
BACKEND:DEVID  DESCRIPTION                                        USED BY
dom0:dm-5      Block_Storage: hosted by dom0 qubes dom0-swap
disp624:loop0  Block_Storage: hosted by disp624 /tmp/foo/file.img
user@dom0:~ > qvm-device block attach dom0 disp624:loop0
user@dom0:~ > qvm-device block list
BACKEND:DEVID  DESCRIPTION                                        USED BY
dom0:dm-5      Block_Storage: hosted by dom0 qubes dom0-swap
disp624:loop0  Block_Storage: hosted by disp624 /tmp/foo/file.img dom0 (attached: read-only=no, frontend-dev=xvdi)

Expected behavior
Attaching a device this way should be impossible by default, just like it is not possible through gui-devices.

Actual behavior
In STR.

Other than that, the sys-ramdrive qube (minimal and offline) cannot see the contents of its own loop device, as the latter is encrypted by dom0. So the worst that VM can do is stop working (which results in what I describe in the OP).

qubist · May 3, 2026, 7:22am

Why even use RAM- or VM-backed storage though, let alone both combined?

I asked about that but received no replies, so I just decided to proceed with it.

If you just want anti-forensics, then even without #10827 (i.e. the proper approach) it would be so much simpler to create an ephemerally encrypted disk image hosting a throwaway pool in dom0.

As mentioned earlier, the goal of RAM qubes has never been anti-forensics. That just turned out to be a positive side effect, welcomed by everyone. So, given the main goal, RAM-backed storage is a requirement. There are two options:

consume dom0’s RAM (as is done currently), potentially risking e.g. a dom0 deadlock due to improper tmpfs sizing, or other possible vulnerabilities, e.g. a malicious VM leaving dirty data in dom0’s RAM that a dom0 process then uses
use domU’s RAM

I think the second could be better because it provides more flexibility and isolation. One can simply change the RAM size of sys-ramdrive, with no need to meddle with GRUB to increase dom0’s max memory.

If you have a better idea, I would be glad to hear it.

Back on topic:

The observed behaviour seems buggy to me. Removing a device that has not been unmounted and detached should not cause system-wide problems like these, especially since it is completely unrelated to the qubes that won’t start. Would you agree?

rustybird · May 3, 2026, 8:31am

Oh. Performance, then?

What pool driver is hosted by the ripped out device? If it’s lvm_thin, I could imagine that causing problems globally. But with file-reflink I don’t know why it would happen.

qubist · May 3, 2026, 8:58am

@rustybird

Oh. Performance, then?

The main goal is to reduce SSD writes, as disposables in particular are quite wasteful in that regard.

What pool driver is hosted by the ripped out device?

file-reflink

rustybird · May 3, 2026, 9:35am

People worry about that way too much IMO. Nowadays any SSDs that aren’t total crap (which would be likely to cause more serious issues, e.g. data corruption on power loss) have enormous write endurance listed in their datasheets. Qubes OS could be even more wasteful by orders of magnitude without any problems.

Hmm, I doubt it’s this pool that’s affecting the rest of the system then. Maybe the broken device is messing with LVM tooling which is scanning for physical volumes, or something like that? The repeated log messages about the LVM pool look suspicious. (I don’t know how to debug LVM stuff.)

qubist · May 3, 2026, 11:22am

Hmm, I doubt it’s the pool that’s affecting the rest of the system then. Maybe the broken device is messing with LVM tooling which is scanning for physical volumes, or something like that? The repeated log messages about the LVM pool look suspicious. (I don’t know how to debug LVM stuff.)

I have no idea how or why it may be related to LVM at all.