[qubes-users] 4.1.0 installation failure

Hi!

So after my USB stick failed, my power supply failed, too 8-(
Once I had a new stick and a new power supply, I tried to install Qubes OS 4.1.0 on an external disk that still had an old Qubes OS 4.0 installation on it.
I chose a custom setup, creating the partitions, LUKS, PV, VG, and LVs as instructed, then assigning the partitions and LVs.

Installation went smoothly (mostly thanks to a really fast USB stick), and I was asked to reboot.

Unfortunately (I think it's an old bug), boot failed because dracut wanted to open a LUKS UUID that wasn't found.
My guess was that the installer had cached the old LUKS UUID that was on the disk before I recreated the structure.
On the next attempt, I edited the GRUB command line to have the correct LUKS UUID. Things looked better, but after unlocking the LUKS volume successfully, nothing else seemed to happen. So I aborted it.
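To make the UUID check concrete, here is a minimal Python sketch of how a stale UUID on the kernel command line could be spotted. It assumes the UUID is passed via dracut's `rd.luks.uuid=` option; the command line shown is hypothetical (the all-zero UUID is a placeholder for the stale one), the helper names are made up for illustration, and the "actual" UUID is the one that appears in the journal excerpts further down.

```python
def luks_uuids_from_cmdline(cmdline):
    """Collect UUIDs from rd.luks.uuid= options, dropping an optional 'luks-' prefix."""
    uuids = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "rd.luks.uuid" and sep:
            if value.startswith("luks-"):
                value = value[len("luks-"):]
            uuids.append(value.lower())
    return uuids


def stale_uuids(cmdline, actual_uuid):
    """Return the command-line UUIDs that do not match the container on disk."""
    wanted = actual_uuid.lower()
    return [u for u in luks_uuids_from_cmdline(cmdline) if u != wanted]


# UUID of the LUKS container as it shows up in the journal.
ACTUAL = "a10e21f9-2581-47f7-819a-ec06fde599a1"

# Hypothetical command line; the all-zero UUID stands in for the stale one.
CMDLINE = (
    "root=/dev/mapper/qubes_dom0-root "
    "rd.luks.uuid=luks-00000000-0000-0000-0000-000000000000 "
    "rd.lvm.lv=qubes_dom0/root"
)

print(stale_uuids(CMDLINE, ACTUAL))
```

On a real system the actual UUID would come from something like `cryptsetup luksUUID <device>` and the command line from `/proc/cmdline` or the GRUB configuration; both are left out here to keep the sketch self-contained.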

Examining the journal of the failed boots, I found this:
Feb 24 23:52:19 dom0 lvm[1326]: Device open /dev/sdd1 8:49 failed errno 2
Feb 24 23:52:20 dom0 kernel: md124: p1
Feb 24 23:52:20 dom0 lvm[1326]: WARNING: Scan ignoring device 8:1 with no paths.
Feb 24 23:52:20 dom0 lvm[1326]: WARNING: Scan ignoring device 8:17 with no paths.
Feb 24 23:52:20 dom0 lvm[1326]: WARNING: Scan ignoring device 8:33 with no paths.
Feb 24 23:52:20 dom0 lvm[1326]: WARNING: Scan ignoring device 8:49 with no paths.
Feb 24 23:52:20 dom0 dmeventd[3705]: dmeventd ready for processing.
Feb 24 23:52:20 dom0 kernel: lvm[1326]: segfault at 801 ip 0000777003fcfdde sp 00007ffd4db1c028 error 4 in libc-2.31.so[777003e91000+150000]
Feb 24 23:52:20 dom0 kernel: Code: fd d7 c9 0f bc d1 c5 fe 7f 27 c5 fe 7f 6f 20 c5 fe 7f 77 40 c5 fe 7f 7f 60 49 83 c0 1f 49 29 d0 48 8d 7c 17 61 e9 c2 04 00 00 <c5> fe 6f 1e c5 fe 6f 56 20 c5 fd 74 cb c5 fd d7 d1 49 83 f8 21>
Feb 24 23:52:20 dom0 kernel: audit: type=1701 audit(1645743140.034:101): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=1326 comm="lvm" exe="/usr/sbin/lvm" sig=11 res=1
Feb 24 23:52:20 dom0 audit[1326]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=1326 comm="lvm" exe="/usr/sbin/lvm" sig=11 res=1
Feb 24 23:52:20 dom0 lvm[3705]: Monitoring thin pool qubes_dom0-pool00-tpool.
Feb 24 23:52:20 dom0 lvm[2561]: 3 logical volume(s) in volume group "qubes_dom0" now active
Feb 24 23:52:20 dom0 systemd[1]: Finished LVM event activation on device 253:0.

That segfault doesn't look good!
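As an aside on the "Scan ignoring device 8:N" warnings above: the numbers are major:minor device numbers. A rough Python sketch of how they map to names, assuming the conventional Linux SCSI disk numbering (major 8, 16 minors per disk); the function name `sd_name` is made up for illustration:

```python
def sd_name(major, minor):
    """Map a major:minor pair in the first SCSI disk range to an sdX[N] name."""
    if major != 8 or not 0 <= minor <= 255:
        raise ValueError("only major 8 with minors 0..255 is handled here")
    disk, partition = divmod(minor, 16)      # 16 minors are reserved per disk
    name = "sd" + chr(ord("a") + disk)       # disk 0 -> sda, disk 3 -> sdd
    return name + (str(partition) if partition else "")


# The four devices LVM complained about in the journal excerpt.
for minor in (1, 17, 33, 49):
    print("8:%d -> /dev/%s" % (minor, sd_name(8, minor)))
```

This agrees with the one pair the log spells out itself (`/dev/sdd1 8:49`), so the ignored devices would be the first partitions of sda through sdd.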

The last things that seem to happen on boot are:
Feb 24 23:52:22 dom0 systemd[1]: Finished udev Wait for Complete Device Initialization.
Feb 24 23:52:22 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 24 23:52:22 dom0 kernel: audit: type=1130 audit(1645743142.001:103): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=>
Feb 24 23:52:22 dom0 systemd[1]: Starting Activation of DM RAID sets...
Feb 24 23:52:22 dom0 systemd[1]: dmraid-activation.service: Succeeded.
Feb 24 23:52:22 dom0 systemd[1]: Finished Activation of DM RAID sets.
Feb 24 23:52:22 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 24 23:52:22 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 24 23:52:22 dom0 kernel: audit: type=1130 audit(1645743142.797:104): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=su>
Feb 24 23:52:22 dom0 kernel: audit: type=1131 audit(1645743142.797:105): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=su>

(Those MD RAID sets are my built-in disks, two RAID1 arrays.)
Eventually I tried a reboot:

Feb 25 00:01:26 dom0 systemd[1]: Received SIGINT.
Feb 25 00:01:26 dom0 systemd[1]: Removed slice system-getty.slice.
Feb 25 00:01:26 dom0 systemd[1]: Removed slice system-modprobe.slice.
Feb 25 00:01:26 dom0 systemd[1]: Stopped target Block Device Preparation for /dev/mapper/luks-a10e21f9-2581-47f7-819a-ec06fde599a1.
Feb 25 00:01:26 dom0 systemd[1]: Stopped target Remote Encrypted Volumes.
Feb 25 00:01:26 dom0 systemd[1]: mdmon@md125.service: Succeeded.
...
Feb 25 00:01:26 dom0 systemd[1]: Removed slice system-lvm2\x2dpvscan.slice.
Feb 25 00:01:26 dom0 systemd[1]: tmp.mount: Succeeded.
Feb 25 00:01:26 dom0 systemd[1]: Unmounted Temporary Directory (/tmp).
Feb 25 00:01:26 dom0 systemd[1]: Stopped target Swap.
Feb 25 00:01:26 dom0 systemd[1]: Deactivating swap /dev/disk/by-id/dm-name-qubes_dom0-swap...
Feb 25 00:01:27 dom0 systemd-cryptsetup[5225]: Device luks-a10e21f9-2581-47f7-819a-ec06fde599a1 is still in use.
Feb 25 00:01:27 dom0 systemd-cryptsetup[5225]: Failed to deactivate: Device or resource busy
Feb 25 00:01:27 dom0 systemd[1]: systemd-cryptsetup@luks\x2da10e21f9\x2d2581\x2d47f7\x2d819a\x2dec06fde599a1.service: Control process exited, code=exited, status=1/FAILURE
...
However, the reboot did not complete.

On the next boot, even the segfault repeated:
Feb 25 00:04:08 dom0 lvm[1319]: Device open /dev/sdd1 8:49 failed errno 2
Feb 25 00:04:08 dom0 lvm[1319]: Device open /dev/sdd1 8:49 failed errno 2
Feb 25 00:04:08 dom0 lvm[1319]: WARNING: Scan ignoring device 8:1 with no paths.
Feb 25 00:04:08 dom0 lvm[1319]: WARNING: Scan ignoring device 8:33 with no paths.
Feb 25 00:04:08 dom0 lvm[1319]: WARNING: Scan ignoring device 8:49 with no paths.
Feb 25 00:04:08 dom0 audit[1319]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=1319 comm="lvm" exe="/usr/sbin/lvm" sig=11 res=1
Feb 25 00:04:08 dom0 kernel: lvm[1319]: segfault at 801 ip 0000764be7365dde sp 00007ffebb3bacb8 error 4 in libc-2.31.so[764be7227000+150000]
Feb 25 00:04:08 dom0 kernel: Code: fd d7 c9 0f bc d1 c5 fe 7f 27 c5 fe 7f 6f 20 c5 fe 7f 77 40 c5 fe 7f 7f 60 49 83 c0 1f 49 29 d0 48 8d 7c 17 61 e9 c2 04 00 00 <c5> fe 6f 1e c5 fe 6f 56 20 c5 fd 74 cb c5 fd d7 d1 49 83 f8 21>
Feb 25 00:04:08 dom0 kernel: audit: type=1701 audit(1645743848.432:102): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=1319 comm="lvm" exe="/usr/sbin/lvm" sig=11 res=1
...
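To compare the two segfaults, the instruction pointer can be turned into an offset inside libc by subtracting the mapping base given in square brackets. A minimal Python sketch, using only the two kernel lines quoted above; the regular expression and helper name are assumptions made for illustration, not part of any tool:

```python
import re

# Matches the kernel's "segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]" line.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>\d+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def libc_offset(line):
    """Return (library name, offset of the faulting instruction within its mapping)."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a segfault line")
    offset = int(match.group("ip"), 16) - int(match.group("base"), 16)
    return match.group("lib"), offset


FIRST_BOOT = ("lvm[1326]: segfault at 801 ip 0000777003fcfdde sp 00007ffd4db1c028 "
              "error 4 in libc-2.31.so[777003e91000+150000]")
SECOND_BOOT = ("lvm[1319]: segfault at 801 ip 0000764be7365dde sp 00007ffebb3bacb8 "
               "error 4 in libc-2.31.so[764be7227000+150000]")

for line in (FIRST_BOOT, SECOND_BOOT):
    lib, offset = libc_offset(line)
    print(lib, hex(offset))   # prints "libc-2.31.so 0x13edde" for both lines
```

Both boots give the same offset (0x13edde) and the same faulting address (0x801), which suggests the crash is reproducible at the same libc instruction rather than random; the load address differs only because of address-space randomization.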

And the boot process ended here:
Feb 25 00:04:09 dom0 systemd[1]: Finished udev Wait for Complete Device Initialization.
Feb 25 00:04:09 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 25 00:04:09 dom0 kernel: audit: type=1130 audit(1645743849.265:103): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=>
Feb 25 00:04:09 dom0 systemd[1]: Starting Activation of DM RAID sets...
Feb 25 00:04:09 dom0 systemd[1]: dmraid-activation.service: Succeeded.
Feb 25 00:04:09 dom0 systemd[1]: Finished Activation of DM RAID sets.
Feb 25 00:04:09 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 25 00:04:09 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb 25 00:04:09 dom0 kernel: audit: type=1130 audit(1645743849.964:104): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=su>
Feb 25 00:04:09 dom0 kernel: audit: type=1131 audit(1645743849.964:105): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=su>

The other thing I noticed was that my NIC wasn't activated during installation; is that a problem?
From ethtool:
driver: r8169
version:
firmware-version: rtl8168g-2_0.0.1 02/06/13

In openSUSE Leap 15.3 it works:
Feb 25 01:00:03 i7.site kernel: r8169 0000:03:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
Feb 25 01:00:03 i7.site kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Regards,
Ulrich

Issue filed as QubesOS/qubes-issues #7335 on GitHub: First boot after installing never finishes ("A start job is running for Monitoring of LVM2 mirrors, ...")