Qubes no longer boots after modifying xen.cfg

Last year I tried upgrading the dom0 kernel from 4.* to 5.*, and Qubes became unstable after that (it would hang randomly after a few minutes). I tried BIOS tweaks and such back then, but to no avail, so I reverted to the older kernel and all was well.

Yesterday I ran the auto updater tool and forgot to uncheck dom0, so I accidentally updated the dom0 kernel, and the system was again unstable after rebooting.

So I wanted to revert to the old dom0 kernel for stability. Unfortunately, my first action was to edit the xen.cfg file and change [default] to the older 4.* kernel listed in that file.

The GUI hung before I was able to actually reinstall that older kernel in the system, and upon hard reset, Qubes booted into a red-text emergency shell. Hoping for a miracle, I rebooted again, and this time not even the emergency shell came up and the EFI boot no longer found an OS.


What I’ve tried so far:
I used the 4.0.3 USB install disk to boot via UEFI and then mounted the EFI partition. xen.cfg was empty, so I manually entered the parameters shown in the Qubes documentation's UEFI Troubleshooting example and set the kernel values to the newly auto-installed version.

That didn’t change anything (still no boot device found), so I backed up xen.efi under a different name and copied the latest available xen-* in the EFI/qubes directory to xen.efi, but the OS still did not boot.

Next I tried the efibootmgr trick as listed on the UEFI Troubleshooting page, changing the drive from “/dev/sda -p 1” to “/dev/nvme0n1 -p 1”. The new entry shows up in efibootmgr -v as active, but the BIOS does not show it as one of the available EFI boot options. I changed the boot order in efibootmgr, putting the new one first, but the BIOS still doesn’t see it.

I tried listing the drive as “/dev/nvme0n1p1” using efibootmgr without the “-p 1” and it created a boot entry that did show up in the BIOS, but it was incorrect and said MBR with a bunch of zeros instead of GPT with a nice string of numbers like the previous working xen efi boot entry.
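
For reference, efibootmgr takes the whole disk with -d and the partition number with -p, so a partition device such as /dev/nvme0n1p1 has to be split into the two. The sketch below only prints a command for review and does not run efibootmgr. The function name efibootmgr_cmd is made up for illustration, and the command template is the one recalled from the UEFI Troubleshooting page, so it should be checked against that page before use.

```shell
#!/bin/sh
# Print (do not run) an efibootmgr command for a given EFI partition device.
# efibootmgr_cmd is a hypothetical helper name; the command template is an
# assumption based on the Qubes UEFI Troubleshooting page.
efibootmgr_cmd() (
    part=$1
    num=${part##*[!0-9]}             # trailing digits = partition number
    if [ -z "$num" ]; then
        echo "no partition number in $part" >&2
        exit 1
    fi
    disk=${part%"$num"}              # strip the partition number
    case $disk in
        *[0-9]p) disk=${disk%p} ;;   # /dev/nvme0n1p -> /dev/nvme0n1
    esac
    printf 'efibootmgr -v -c -u -L Qubes -l /EFI/qubes/xen.efi -d %s -p %s "placeholder /mapbs /noexitboot"\n' \
        "$disk" "$num"
)

efibootmgr_cmd /dev/nvme0n1p1
```

For /dev/nvme0n1p1 this prints a command containing "-d /dev/nvme0n1 -p 1", and for /dev/sda1 it would contain "-d /dev/sda -p 1". Whether the firmware then lists the entry is a separate question that this sketch does not address.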

Then I tried going into my Dell’s BIOS to use its boot manager to create an EFI boot entry by pointing it directly to /EFI/qubes/xen.efi. An entry is created, but it does not find a boot device either (and it appears different from the entries created by efibootmgr).

Any help here would be greatly appreciated.

Is there some location that might contain a backup of my previous xen.cfg which I edited from within the OS to cause this mess? Or should the generic values from EFI Troubleshooting page simply work after substituting the example with my currently installed dom0 kernel? (I noticed another person’s xen.cfg has some LUKS info/numbers specific to their system)

I saw that another potential solution on the UEFI Troubleshooting page is to create an /EFI/BOOT/ folder with the xen.efi and xen.cfg renamed to BOOTX64.efi and BOOTX64.cfg. Would this help when my system was already booting previously with /EFI/qubes/xen.efi anyway?

Thanks in advance!

The saner option is to boot into rescue mode, back up your qubes, and reinstall, followed by a restore. There might be another way, but don’t break your head over it. Sorry if this is not the answer you were looking for.

I haven’t encountered or solved this problem myself, but have you tried to decrypt the drive while mounted onto a working OS and editing the xen.cfg from there, using a ‘clean’ version (probably available on Github) as reference? There should be tools out there allowing you to mount and decrypt LUKS partitions.

If my memory serves me right, isn’t the path /boot/efi/EFI/qubes/xen.efi on R4.0.4? (Doesn’t apply to R4.1)

 


Not technically-trained; consume with salt.

Thanks for responding. I’m not sure what you mean by rescue mode. Do you mean “anaconda --rescue” from the R4.0.3 install media? I haven’t gotten it to work yet. Does the shell include a backup feature? That would be really helpful, as I could simply install a fresh 4.1 on a new SSD and restore all of my backed-up qubes from 4.0.3.

Hi fiftyfourthparallel,

Appreciate your response. I haven’t tried to decrypt the main Qubes partition yet, as I am saving that for a last resort. But if we’re talking about the xen.cfg file, isn’t it stored in the unencrypted EFI partition, anyway? That must have been the one I edited while in the OS…unless it resides in two places at once.

It’s possible that when I tried unsuccessfully to boot and got the emergency shell, the xen.cfg was wiped, because it was empty when I mounted the EFI partition to start figuring things out.

I have three partitions on the SSD:
nvme0n1p1 500MB unencrypted EFI partition
nvme0n1p2 1GB unencrypted “boot” partition (I think)
nvme0n1p3 LUKS encrypted qubes OS

I think partition 2 usually gets mounted as “boot” and then partition 1 gets mounted within that as “efi,” which is why you usually see /boot/efi/EFI/qubes.

I’m mounting manually instead of using the aforementioned “anaconda --rescue”, which does not seem to work for me (it says no Linux partitions found). So I simply did a “mount /dev/nvme0n1p1 /mnt”, which gave me the EFI/qubes directory (which is also how the Dell BIOS lists it). I don’t think I need anything from partition 2 at this point, but I must admit I am a bit out of my depth here. I believe that’s the “boot” partition, and it has a bunch of “config-kernel versions” files, plus for the latest xen it has a ‘xen-version.config’ file, too. So far I’ve only played around with the 500MB EFI partition, not the 1GB boot partition. I will try mounting both to create the “/boot/efi/EFI/qubes/xen.efi” path you mentioned and rerun efibootmgr -c to see if it creates an entry that magically works :)

When I bought a laptop with an Intel 10th Gen CPU and installed R4.0.3, there were multiple issues with the default (starter) dom0 kernel–the touchpad wasn’t responsive and the display was basically a slideshow. I learned to fix it by reading the docs, which gave me the idea to install the latest dom0 kernel and adjust kernel options relevant to iGPUs.

The options were stored in /boot/efi/EFI/qubes/xen.efi of dom0’s filesystem, and it is also where you go to change which kernel version is in use. This is almost definitely in your LUKS partition, which is why I recommended you mount your LUKS partition, decrypt it, and edit xen.efi there.

This is my recommendation because you said it was a dom0 kernel update that directly caused the problem–so the solution is to switch the dom0 kernel back to what it was, and that is something I have experience doing.

However, the rest of what you did is beyond me–it is possible that all the fiddling with the EFI partition, boot procedures, and xen.cfg might have made things worse, and following what I wrote might still leave you stuck. I certainly think it’s possible, since those steps don’t seem to be relevant to switching the default dom0 kernel.

Since you might be mounting and decrypting the LUKS partition, you might as well consider backing up your data (if you haven’t already) before proceeding further.


Came back and realized I mistyped xen.cfg as xen.efi in the previous message. My apologies for any confusion.

Thanks for all of your input and support here, it is very helpful. I knew you meant xen.cfg but appreciate the correction.

I’ve ordered a USB nvme enclosure and a second ssd to make a dd clone of the drive that I can swap back in if necessary.

So if the real /boot/efi/EFI/qubes/xen.cfg is still there on the LUKS partition, then my fix might be as simple as changing that one line of text in xen.cfg to match the last installed dom0 kernel. It will be unstable again, but as you say, there are many things to try once I get there.

Update:
I successfully cloned the disk to an external drive and was able to decrypt the LUKS partition using cryptsetup luksOpen -v /dev/sdb qubes_dom0

The fact that my R4.0.3 installation USB rescue says “no linux partitions found” after password entry was scary, but it looks like everything is still there in /dev/mapper/ after decrypting. Maybe it is just a bug in the R4.0.3 version of the rescue program.

Update 2: It boots!!
It turns out my original idea was correct. I did not even need to mount the LUKS-encrypted partition to fix it. Everything I needed was in my 500MB unencrypted EFI partition on the drive, including the last-used initramfs img, vmlinuz, and xen.efi. I had written the xen.cfg after viewing other people’s examples online, substituting my own kernel version(s) and adding a rd.luks.uuid=xxxxxxx entry to it.

I took a video of the bootup and saw what I did wrong: the xen loader did not find my initramfs img because I mistyped a letter. That must have been my problem all along during my many attempts at getting it to boot.
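
A typo like this can be caught before rebooting by checking that every file named on a kernel= or ramdisk= line actually exists next to xen.cfg. The sketch below is one way to do that. The function name check_xen_cfg is made up for illustration. It assumes the kernel and initramfs sit in the same directory as xen.cfg (as they did in the EFI partition described above) and that the file name is the first word after the = sign.

```shell
#!/bin/sh
# Report whether each file named on a kernel= or ramdisk= line of a xen.cfg
# exists in the same directory as the xen.cfg itself.
# check_xen_cfg is a hypothetical helper name, not a Qubes tool.
check_xen_cfg() (
    cfg=$1
    dir=$(dirname "$cfg")
    status=0
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            kernel=*|ramdisk=*) ;;   # only these lines name files
            *) continue ;;
        esac
        value=${line#*=}             # everything after the first '='
        file=${value%% *}            # first word = file name
        if [ -f "$dir/$file" ]; then
            echo "ok      $file"
        else
            echo "MISSING $file"
            status=1
        fi
    done < "$cfg"
    exit "$status"                   # non-zero if any file was missing
)

# Example, with the EFI partition mounted on /mnt as earlier in the thread:
# check_xen_cfg /mnt/EFI/qubes/xen.cfg
```

A MISSING line for the ramdisk= entry would have pointed straight at the misspelled initramfs name.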

So far, the system has been stable even on this newer kernel. Maybe one of those tweaks in the xen.cfg helped.


Glad it’s fixed. I didn’t expect human error, and since you made a reference to your GUI it meant that boot worked and the update didn’t corrupt the entire boot procedure, so I expected my solution to help.

From this thread I learned that you can also make changes to xen.cfg via the EFI partition. Does it automatically get copied into dom0’s xen.cfg? Can you check for me?

Yeah, my original dom0 kernel update completed and rebooted successfully. There was no corruption; it was just very unstable. Instead of trying all of the recommended tricks to stabilize it, I decided to revert to the older kernel but forgot to actually revert it (I changed the xen.cfg to match the older kernel but didn’t change the system kernel).

That was the first thing I checked, because I was curious too. After booting fully into the Qubes GUI, I opened a dom0 terminal, ran sudo nano /boot/efi/EFI/qubes/xen.cfg, and saw my handwritten entries there, so it is the same file. I think that’s why there is the extra capital-letter “EFI” folder after “/boot/efi”: /boot/efi is simply the mount point of the unencrypted EFI partition.


I haven’t had a single hangup yet and changed nothing but my manually typed xen.cfg, so at this point I am sure that the extra things in xen.cfg completely stabilized my system on the newer kernel.

I don’t have a snapshot of what my xen.cfg was before, so I don’t know exactly which option did the trick, but:
On my “options” line I added smt=off ucode=scan
And at the end of the kernel line, I have i915.preliminary_hw_support=1 rhgb quiet i915.enable_psr=0
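
To make the placement concrete, here is a rough sketch that prints a xen.cfg with those additions in the positions described. It is not the poster's actual file. The function name print_xen_cfg and the KVER, XEN_OPTS and ROOT_ARGS placeholders are made up, and the [global]/default= layout and the vmlinuz-/initramfs- file naming are assumptions based on the usual Qubes examples. Only the smt=off ucode=scan and i915/rhgb/quiet additions, plus the rd.luks.uuid=xxxxxxx entry, come from this thread.

```shell
#!/bin/sh
# Print a xen.cfg skeleton showing where the extra options from this thread go.
# print_xen_cfg is a hypothetical helper; all arguments are placeholders that
# must be replaced with the values for the real system.
print_xen_cfg() {
    version=$1       # dom0 kernel version string used in the file names
    xen_opts=$2      # whatever Xen options were already on the options= line
    kernel_args=$3   # existing kernel arguments (root device, rd.luks.uuid, ...)
    cat <<EOF
[global]
default=$version

[$version]
options=$xen_opts smt=off ucode=scan
kernel=vmlinuz-$version $kernel_args i915.preliminary_hw_support=1 rhgb quiet i915.enable_psr=0
ramdisk=initramfs-$version.img
EOF
}

print_xen_cfg "KVER" "XEN_OPTS" "ROOT_ARGS rd.luks.uuid=xxxxxxx"
```

The Xen options (smt=off ucode=scan) go on the options= line, while the i915 settings are kernel arguments and go on the kernel= line after the kernel file name.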
