One of my SSDs in RAID-1 (mirror) has died and I’m trying to boot.
The dead drive was the primary one (with the EFI and boot partitions). However, I had its image, so I copied the EFI and boot partitions to the same place on the second (and now only remaining) drive, including all files, labels, UUIDs, flags, positions on the drive, and sizes.
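For reference, the copy can be done roughly like this from the saved image (the device names, partition numbers, and image path below are only an illustration of the idea, not my exact commands, and it assumes both drives have identical layouts):
losetup -fP dead-drive.img                # exposes the image's partitions as /dev/loop0p1, /dev/loop0p2, ...
sgdisk -R=/dev/nvme1n1 /dev/loop0         # replicate the GPT: same GUIDs, offsets and sizes
dd if=/dev/loop0p1 of=/dev/nvme1n1p1 bs=4M status=progress    # EFI partition
dd if=/dev/loop0p2 of=/dev/nvme1n1p2 bs=4M status=progress    # boot partition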
Now my boot process (initramfs) looks like:
md0 is built (degraded though)
crypt container has been decrypted
BUT at this point the system can’t boot, because the “vg0” group it expects has been renamed to “vg01”!
There are no conflicts; I don’t have any other LVM devices/groups, so “vg01” is the only LVM group present now.
“lvm vgchange -ay” successfully activates all the volumes, and they are then accessible from the emergency console. But I can’t boot even if I rename “vg01” back to “vg0” (it’s too late at this stage).
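For reference, the VG name reported by the on-disk metadata can be listed from the dracut emergency shell with the bare lvm binary, something like:
lvm pvs -o pv_name,vg_name,vg_uuid
lvm vgs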
I don’t understand why this is happening. It should make no difference to LVM, because LVM knows nothing about the RAID state, and the boot process has already opened the crypto container!
A kernel option to force the LVM volume group name:
rd.lvm.vg=vg0
doesn’t make sense here, since the only VG that exists is “vg01”.
I’m overwhelmed by all this. In the past I had an Ubuntu system that booted fine from a degraded RAID under the same conditions.
Please help with advice on how I can boot my Qubes OS, ideally without regenerating the initramfs.
I’ve rewritten the whole drive with the earlier dd image of my primary drive, and now I see no problems with the vg0 naming.
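The restore itself was just a plain dd of the image back onto the drive; the image name and device below are illustrative:
dd if=primary-drive.img of=/dev/nvme0n1 bs=4M status=progress conv=fsync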
However, after loading, the system disconnects the display completely, and only “nomodeset” helps (I used it just to check that the system hadn’t fallen into a kernel panic). In both cases, pressing the “Power” button triggers the system to shut down.
Is it normal behavior that Qubes OS doesn’t work on a degraded RAID-1?
Who is to blame for that: Qubes OS, md-tools, LVM, or something else?
Three issues occurred at the same time, forcing me to spend 10+ hours fixing them:
1. The buggy “nouveau” driver decided it was time to fully disable both displays. Fixed after disconnecting one of them. For some reason I hadn’t seen this issue before my NVMe died, so without the nomodeset flag I saw nothing on the screen after loading.
2. The VG renaming. When loading, I needed to add this parameter to the kernel:
systemd.unit=rescue.target
then, in the emergency console, rename the VG back and continue booting.
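Something like this should do it (written from memory; whether you need the leading “lvm” prefix depends on whether you end up in the dracut emergency shell or the systemd rescue shell):
vgrename vg01 vg0
vgchange -ay vg0
systemctl default    # or simply exit the rescue shell to continue booting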
3. When the system starts, I see the login dialog, but the keyboard is completely dead. That’s why, before fixing (1), I thought the PC had halted: the screen was turned off and the keyboard didn’t react even to a NumLock key press.
Investigation led me to the fact that “sys-usb” hadn’t started.
Then I booted with the kernel argument:
qubes.skip_auto_start=1
and the keyboard was kept in Dom0.
I inspected the “sys-usb” qube’s settings and noticed a strange device in the “Devices” tab, in the “Devices always connected to this qube” field:
08:00.3 Unknown device (unknown)
I removed this device and now Qubes OS loads fine.
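For the record, the same thing can apparently be done from a dom0 terminal with qvm-pci instead of the GUI (the BDF is the one from my case, and I haven’t double-checked the exact syntax):
qvm-pci                                  # list PCI devices and their assignments
qvm-pci detach sys-usb dom0:08_00.3      # drop the stale “always connected” device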
So the reason was that I had simply disconnected the USB front panel of my PC case, and this minor hardware change broke the whole OS!
This is a critical bug in Qubes OS, and I’ll definitely post an issue on GitHub!
My proposal is that “sys-usb” should start even when optional USB devices have been removed, AND it should return the chipset USB controller (with the keyboard) to Dom0 if it is unable to start for any reason. This is a blocker.
Hi. The software used for the forum (Discourse) has a Trust Level concept. New users with Trust Level 0 (Basic) are not able to edit their own posts after 24 hours. More information on the Discourse trust levels and how higher trust levels are granted here. If you need to edit the topic title, please let moderators know and specify the new title in the reply.
Thanks for the very complete analysis. I can never remember how to get the rescue console!
Did you find any reason for the renaming of the VG?
(I once tried and failed to set up an installation on a degraded array like you describe. I never imagined a missing disk would have any effect on device names. Maybe that is the explanation!)
(If you indicate your preferred title, then maybe someone here can change it - even me, maybe)
Well, I’m not an expert in LVM, but it looks like it was just named “vg01” in the LVM metadata.
I don’t know why, on RAID-1 disks, one mirror’s LVM has the name “vg0” and the other “vg01”. It’s weird; it could mean the RAID-1 disks don’t contain the same data.
Probably it was renamed during the first load… but that still doesn’t explain why, after I renamed it back, it didn’t get renamed again.
OK, the first load from a degraded RAID may be special, and the OS may refuse to load at all, indicating that one disk is missing. But why this would affect LVM, which is inside the RAID and read-only on the first load, is a puzzle.
I will just write down this case in my personal QubesOS/Linux FAQ to be ready for similar incidents in the future, so I have no time gaps in my work when (not if, because disks die from time to time) it happens again.
This week I upgraded my PC and found that Qubes OS wouldn’t start again. It wasn’t a surprise to me, but the affected qube was different.
This time I had to remove a device (probably the old network card) from the sys-net qube:
05:00.0 Unknown device (unknown)
I loaded with the qubes.skip_auto_start=1 option passed to the kernel.
The result was the same: Qubes loaded fine after this simple action.
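Same recipe as with sys-usb, just a different qube and a different BDF (again, the exact qvm-pci syntax is from memory):
qvm-pci detach sys-net dom0:05_00.0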
So the issue is not sys-usb-specific; it affects any qube that tries to use removed hardware.
I can understand my own qube not starting (I have a custom storage VM that stopped autostarting because it also used the USB ports of the old motherboard), but it’s a lack of error handling when system qubes can’t continue without a device that isn’t even required.
Also, another question arises: should I manually pass through new hardware (say, a new network card) to the sys-net qube?
There might have been a reordering of PCIe devices. This is a known bug:
Glad that you already found the solution.
I believe the answer is yes. The reason is simple: having new PCIe hardware automatically passed to a ServiceVM (based on its class) could be a security hazard. And in many cases, the user might want an individual ServiceVM for the new hardware for better control.
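A minimal sketch of what that manual passthrough could look like from dom0, assuming the new card shows up at the address mentioned above (check with qvm-pci first):
qvm-pci                                            # find the new card's BDF
qvm-pci attach --persistent sys-net dom0:05_00.0   # make it an always-connected device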