Grub2-mkconfig makes dom0 fail to boot

I tried to change my Xen and dom0 kernel arguments by editing /etc/default/grub as root in dom0 and then running grub2-mkconfig -o /boot/grub2/grub.cfg, also as root. Unfortunately, my system is now unbootable, even after putting all the command-line arguments back to how they were!

I have been unable to get a dracut shell to diagnose the problem, because the boot just gets stuck at “A start job is running for Qubes Dom0 startup setup (7h 7min 57s / no limit)”.

I suspect that grub2-mkconfig is, by default, creating broken initial ramdisks which don’t have the right modules included to mount the root filesystem.

You can boot from the Qubes installation ISO in recovery mode, mount your system and chroot into it, restore /etc/default/grub to what it was before, and then run grub2-mkconfig -o /boot/grub2/grub.cfg from the chroot.
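
A rough sketch of the mount-and-chroot steps from the recovery shell, in case it helps. The device paths, the mapper name luks-root, and the qubes_dom0/root volume name are assumptions based on a default install, not details from this thread, so check yours with lsblk first. With DRY_RUN=1 (the default here) the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: device paths and LVM names below are assumptions.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = LUKS partition (e.g. /dev/sda2), $2 = /boot partition (e.g. /dev/sda1)
enter_dom0_chroot() {
    luks_dev="$1"
    boot_dev="$2"
    run cryptsetup luksOpen "$luks_dev" luks-root || return 1  # unlock the encrypted disk
    run vgchange -ay || return 1                               # activate the LVM volumes
    run mount /dev/qubes_dom0/root /mnt || return 1            # assumed default root volume
    run mount "$boot_dev" /mnt/boot || return 1                # grub.cfg and initramfs live here
    for d in dev proc sys; do
        run mount --bind "/$d" "/mnt/$d" || return 1           # the chroot needs these
    done
    run chroot /mnt
}
```

Something like enter_dom0_chroot /dev/sda2 /dev/sda1 prints the sequence; set DRY_RUN=0 once the devices look right, then edit /etc/default/grub and run grub2-mkconfig inside the chroot.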

The following command changed the initrd:

/bin/kernel-install -v add 5.16.13-2.fc32.qubes.x86_64 /boot/vmlinuz-5.16.13-2.fc32.qubes.x86_64

and now I have a boot process which fails in a simpler way - it simply prints

device-mapper: table: 253:0: crypt: unknown target type
device-mapper: ioctl: error adding target to table
[FAILED] Failed to start Cryptography Setup for luks-blah-blah

and then after a delay, eventually drops me to a Dracut Emergency Shell, which was what I wanted.

As I said above, removing the new Xen and kernel arguments does not fix the problem.

It looks like the dm-crypt kernel module is missing from the initramfs. You can try regenerating the initramfs with dracut from the chroot (don’t forget to mount the /boot partition first):
dracut --force --kver your-kernel-version
(--force is needed so that dracut overwrites the existing initramfs image.) You can also extract the files from the initramfs and check whether the dm-crypt kernel module is present.
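
One way to do that check without unpacking anything is to list the image with lsinitrd (shipped with dracut) and look for the module file. A minimal sketch; the function name is made up for illustration, and the image path has to be adjusted to your kernel version:

```shell
# Succeeds if the initramfs listing read from stdin contains the named
# kernel module, compressed or not (e.g. dm-crypt.ko or dm-crypt.ko.xz).
initrd_has_module() {
    grep -Eq "/$1\.ko(\.[a-z0-9]+)?\$"
}

# Usage from the chroot (replace the image path with your own):
#   lsinitrd /boot/initramfs-your-kernel-version.img | initrd_has_module dm-crypt \
#       && echo "dm-crypt present" || echo "dm-crypt missing"
```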

I finally figured it out. I had accidentally replaced the /sbin symlink in dom0 with a real directory by running tar xpvf on a .tar file I had created, which contained some software I had compiled for dom0 (unfortunately, tar does not make it obvious that it has overwritten the symlink). This is not the first time I have broken a system by running tar at /, either! The “command not found” errors I had seen should have been a clue (although I still get one harmless “command not found” error after fixing this).
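
For anyone else unpacking archives at /: GNU tar has a --keep-directory-symlink option that stops it from replacing an existing symlink to a directory (such as /sbin) with a real directory from the archive. A small sketch, assuming GNU tar; the wrapper name is hypothetical:

```shell
# Extract archive $1 into directory $2 without replacing existing
# directory symlinks such as /sbin -> usr/sbin (GNU tar only).
safe_extract() {
    tar --keep-directory-symlink -xpf "$1" -C "$2"
}

# Usage (as root, archive name is a placeholder):
#   safe_extract mysoftware.tar /
```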

The fix was to run:

cd /
mv sbin/* usr/sbin
rmdir sbin
ln -s usr/sbin sbin

and then run the kernel-install command in my previous post again.
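
A quick sanity check for after a repair like this, sketched on the assumption that dom0 uses Fedora's merged /usr layout, where /bin, /sbin, /lib and /lib64 are all symlinks into /usr. The function name is made up:

```shell
# Report whether the top-level directories that should be symlinks into
# /usr still are. $1 is the root to inspect (defaults to /).
# Returns non-zero if any of them exists but is not a symlink.
check_merged_usr() {
    root="${1:-/}"
    rc=0
    for d in bin sbin lib lib64; do
        path="${root%/}/$d"
        if [ -L "$path" ]; then
            echo "ok: $path -> $(readlink "$path")"
        elif [ -e "$path" ]; then
            echo "BROKEN: $path is not a symlink"
            rc=1
        fi
    done
    return $rc
}
```

Running check_merged_usr with no arguments inspects the live system; from a recovery chroot setup you can point it at the mounted root instead, e.g. check_merged_usr /mnt.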