RAID failure, but only on Dom0 boot

Hi all,

I've run into a strange issue: during boot, and in my “journalctl” logs, I see my RAID arrays failing to initialize.

I have two SATA RAIDs (data only, mirrored) and one NVMe RAID (system, with Qubes OS, mirrored).

The issue affects only the two SATA RAIDs, which play no part in loading the OS.

The most surprising part is that later, in my “vm-storage” qube, to which the SATA controllers are passed through directly, I have no such problems: I can assemble the arrays and use their partitions there, or in any other VM after attaching them.

This is not caused by Dom0 and the AppVM accessing the disks in parallel; “vm-storage” starts much later.

I don’t actually need Dom0 to use these devices at all, but since it does touch them, I would like to be sure it won’t corrupt the data on them.

Any idea why Dom0 has such issues with these devices?

Dec 25 12:12:54 dom0 kernel: md/raid1:md127: Disk failure on sda, disabling device.
                             md/raid1:md127: Operation continuing on 1 devices.
Dec 25 12:12:54 dom0 udisksd[6537]: Unable to resolve /sys/devices/virtual/block/md127/md/dev-sda/block symlink
Dec 25 12:12:54 dom0 udisksd[6537]: Unable to resolve /sys/devices/virtual/block/md127/md/dev-sda/block symlink
Dec 25 12:12:54 dom0 kernel: sd 6:0:0:0: [sda] Synchronizing SCSI cache
Dec 25 12:12:54 dom0 kernel: sd 6:0:0:0: [sda] Stopping disk
Dec 25 12:12:55 dom0 kernel: mdadm: attempt to access beyond end of device
                             sdb: rw=432129, sector=16, nr_sectors = 8 limit=0
Dec 25 12:12:55 dom0 kernel: md: super_written gets error=-5
Dec 25 12:12:55 dom0 kernel: sd 7:0:0:0: [sdb] Synchronizing SCSI cache
Dec 25 12:12:55 dom0 kernel: md127: detected capacity change from 31251494912 to 0
Dec 25 12:12:55 dom0 kernel: md: md127 stopped.
Dec 25 12:12:55 dom0 kernel: sd 7:0:0:0: [sdb] Stopping disk
Dec 25 12:12:55 dom0 udisksd[6537]: Error reading sysfs attr `/sys/devices/virtual/block/md127/md/degraded': Failed to open file “/sys/devices/virtual/block/md127/md/degraded”: No such file or directory (g-file-error-quark, 4)
Dec 25 12:12:55 dom0 udisksd[6537]: Error reading sysfs attr `/sys/devices/virtual/block/md127/md/sync_action': Failed to open file “/sys/devices/virtual/block/md127/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Dec 25 12:12:55 dom0 udisksd[6537]: Error reading sysfs attr `/sys/devices/virtual/block/md127/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md127/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Dec 25 12:12:55 dom0 udisksd[6537]: Error reading sysfs attr `/sys/devices/virtual/block/md127/md/bitmap/location': Failed to open file “/sys/devices/virtual/block/md127/md/bitmap/location”: No such file or directory (g-file-error-quark, 4)
Dec 25 12:12:55 dom0 mdadm[6490]: DeviceDisappeared event detected on md device /dev/md127
Dec 25 12:12:55 dom0 udisksd[6537]: Unable to resolve /sys/devices/virtual/block/md126/md/dev-sdc3/block symlink
Dec 25 12:12:55 dom0 kernel: md/raid1:md126: Disk failure on sdc3, disabling device.
                             md/raid1:md126: Operation continuing on 1 devices.
Dec 25 12:12:55 dom0 udisksd[6537]: Unable to resolve /sys/devices/virtual/block/md126/md/dev-sdc3/block symlink
Dec 25 12:12:55 dom0 kernel: sd 8:0:0:0: [sdc] Synchronizing SCSI cache
Dec 25 12:12:55 dom0 kernel: sd 8:0:0:0: [sdc] Stopping disk
Dec 25 12:12:55 dom0 udisksd[6537]: Unable to resolve /sys/devices/virtual/block/md126/md/dev-sdd3/block symlink
Dec 25 12:12:55 dom0 kernel: md: super_written gets error=-5
Dec 25 12:12:55 dom0 udisksd[6537]: Unable to resolve /sys/devices/virtual/block/md126/md/dev-sdd3/block symlink
Dec 25 12:12:55 dom0 kernel: sd 9:0:0:0: [sdd] Synchronizing SCSI cache
Dec 25 12:12:55 dom0 kernel: md126: detected capacity change from 1950109696 to 0
Dec 25 12:12:55 dom0 kernel: md: md126 stopped.
Dec 25 12:12:55 dom0 kernel: sd 9:0:0:0: [sdd] Stopping disk
Dec 25 12:12:55 dom0 udisksd[6537]: Error reading sysfs attr `/sys/devices/virtual/block/md126/md/degraded': Failed to open file “/sys/devices/virtual/block/md126/md/degraded”: No such file or directory (g-file-error-quark, 4)
Dec 25 12:12:55 dom0 udisksd[6537]: Error reading sysfs attr `/sys/devices/virtual/block/md126/md/sync_action': Failed to open file “/sys/devices/virtual/block/md126/md/sync_action”: No such file or directory (g-file-error-quark, 4)
Dec 25 12:12:55 dom0 udisksd[6537]: Error reading sysfs attr `/sys/devices/virtual/block/md126/md/sync_completed': Failed to open file “/sys/devices/virtual/block/md126/md/sync_completed”: No such file or directory (g-file-error-quark, 4)
Dec 25 12:12:55 dom0 udisksd[6537]: Error reading sysfs attr `/sys/devices/virtual/block/md126/md/bitmap/location': Failed to open file “/sys/devices/virtual/block/md126/md/bitmap/location”: No such file or directory (g-file-error-quark, 4)
Dec 25 12:12:56 dom0 kernel: pciback 0000:14:00.0: xen_pciback: seizing device
Dec 25 12:12:56 dom0 kernel: xen: registering gsi 47 triggering 0 polarity 1
Dec 25 12:12:56 dom0 kernel: Already setup the GSI :47
Dec 25 12:12:56 dom0 kernel: pciback 0000:05:00.0: xen_pciback: seizing device
Dec 25 12:12:56 dom0 kernel: xen: registering gsi 46 triggering 0 polarity 1
Dec 25 12:12:56 dom0 kernel: Already setup the GSI :46
Dec 25 12:12:56 dom0 mdadm[6490]: DeviceDisappeared event detected on md device /dev/md126
Dec 25 12:12:57 dom0 kernel: loop0: detected capacity change from 0 to 1400904
Dec 25 12:12:57 dom0 kernel: loop1: detected capacity change from 0 to 1400904

The “xen_pciback: seizing device” lines are most likely the explanation. Your vm-storage takes the SATA devices away from dom0 while dom0 is still trying to access the RAID arrays. I would recommend hiding this SATA controller from dom0 by adding the kernel parameter rd.qubes.hide_pci=0000:05:00.0 (adjust the value to match your SATA controller's BDF; I'm just guessing based on the logs).
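If you want to pull the candidate BDFs out of the log yourself rather than guess, a minimal Python sketch is below. It assumes the kernel message format shown in the log above (the “md/raidN:mdX: Disk failure on ...” and “pciback ...: xen_pciback: seizing device” lines). The script name scan_boot_log.py and its helper functions are hypothetical, and joining several BDFs with commas assumes rd.qubes.hide_pci accepts a comma-separated list. The log alone does not say which seized device is the SATA controller (both 0000:14:00.0 and 0000:05:00.0 appear), so check each one, e.g. with lspci -s 05:00.0, before hiding it.

```python
import re
import sys

# Kernel lines like: "pciback 0000:05:00.0: xen_pciback: seizing device"
SEIZE_RE = re.compile(
    r"pciback ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): xen_pciback: seizing device"
)
# Kernel lines like: "md/raid1:md127: Disk failure on sda, disabling device."
FAIL_RE = re.compile(r"md/raid\d+:(md\d+): Disk failure on (\S+),")


def seized_bdfs(log_text):
    """Return PCI BDFs handed to pciback, in log order, without duplicates."""
    seen = []
    for bdf in SEIZE_RE.findall(log_text):
        if bdf not in seen:
            seen.append(bdf)
    return seen


def failed_members(log_text):
    """Return (array, member) pairs that md reported as failed."""
    return FAIL_RE.findall(log_text)


def hide_pci_param(bdfs):
    """Build the dom0 kernel parameter.

    ASSUMPTION: several BDFs are joined with commas.
    """
    return "rd.qubes.hide_pci=" + ",".join(bdfs)


def report(log_text):
    for array, member in failed_members(log_text):
        print(f"{array}: md reported failure of {member}")
    candidates = seized_bdfs(log_text)
    print("Seized by pciback:", ", ".join(candidates) or "none")
    # Not every seized device is necessarily a SATA controller;
    # keep only the ones lspci identifies as SATA before using this.
    if candidates:
        print("Candidate parameter:", hide_pci_param(candidates))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        report(fh.read())
```

Usage, assuming the script is saved in dom0: journalctl -b -k > boot.log, then python3 scan_boot_log.py boot.log. With only the SATA controller confirmed, hide_pci_param(["0000:05:00.0"]) gives rd.qubes.hide_pci=0000:05:00.0, the same value suggested above.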