Ext4-fs error after Qubes 4.1.1 upgrade (fresh installation over 4.1.0) in multiboot

dracoren · August 6, 2022, 5:28pm

My setup:
Laptop: Acer Nitro 5, AMD Ryzen 9 5900HX
Two NVMe SSDs:
a. 1 TB, dual boot: Windows + Ubuntu
b. 256 GB, Qubes OS (version below)

This configuration worked with Qubes 4.1.0, but I had a problem updating the kernel from 5.10 to 5.16, so neither Wi-Fi nor audio/Bluetooth worked.

Today I upgraded to Qubes 4.1.1 on the 256 GB NVMe SSD. The first boot immediately after installation works normally, and I even restored some of the custom VMs I had backed up from 4.1.0. However, the first NVMe disappears completely from the USB devices list during this boot. Also, after restarting, the Windows and Ubuntu boot entries disappear from the BIOS menu and only the Qubes OS entry is shown. Booting that single entry, Qubes OS works fine.

At this stage, if the power cable is disconnected and the laptop is rebooted, Windows starts up and checks the disk for errors. Skipping the disk check and rebooting brings back all the previous entries in the BIOS boot menu (Windows, Ubuntu, Qubes OS), but booting into Qubes OS this time results in an EXT4-fs error, as in the images shared below. I have tried fsck, e2fsck, and even GParted's error check on the partition, but none of them reports errors.

I have installed twice, and the same thing happened both times.

I think this could be related to the new feature in the 4.1.1 release notes: "UEFI boot now loads GRUB, which in turn loads Xen, making the boot path similar to legacy boot and allowing the user to modify boot parameters or choose an alternate boot menu entry".

Can anyone please help?

I also tried the solution from the Ask Ubuntu question “ext4-fs-error-after-ubuntu-17-04-upgrade” (it addresses a different but similar error)
by entering the following at the Qubes boot menu's GRUB command line (option C):
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=VALUE"
where I set VALUE to 0, 5500, and 200 in turn. This changed the details in the EXT4-fs error messages.
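GRUB_CMDLINE_LINUX_DEFAULT=... is normally a setting in /etc/default/grub that grub2-mkconfig reads, so typing it at the GRUB prompt may not change the boot entry at all. A persistent version of the same experiment could look like the sketch below. It assumes dom0's /etc/default/grub carries the dom0 kernel parameters on a GRUB_CMDLINE_LINUX="..." line, as on Fedora-based systems; the function name set_nvme_latency is hypothetical, and it edits whatever file you pass, so try it on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: set nvme_core.default_ps_max_latency_us=VALUE on the
# GRUB_CMDLINE_LINUX line of a grub defaults file (e.g. a copy of
# /etc/default/grub). Assumes the line looks like GRUB_CMDLINE_LINUX="...".
set_nvme_latency() {
    file=$1
    value=$2
    key='nvme_core\.default_ps_max_latency_us'   # escaped for grep/sed regexes

    case $value in
        ''|*[!0-9]*) echo "value must be a non-negative integer" >&2; return 1 ;;
    esac

    if grep -q "^GRUB_CMDLINE_LINUX=.*${key}=" "$file"; then
        # Parameter already present: replace only its numeric value.
        sed -i "s/^\(GRUB_CMDLINE_LINUX=.*${key}=\)[0-9]*/\1${value}/" "$file"
    else
        # Parameter absent: append it just before the closing double quote.
        # GRUB_CMDLINE_LINUX_DEFAULT is not matched because "=" must follow LINUX.
        sed -i "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 nvme_core.default_ps_max_latency_us=${value}\"/" "$file"
    fi
}
```

After editing the real file, grub.cfg has to be regenerated (on a 4.1.1 UEFI install, for example, `sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg`), and after a reboot the value actually in effect can be read from /sys/module/nvme_core/parameters/default_ps_max_latency_us.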

I tried Rescue Qubes OS from the installer USB, but I am unfamiliar with it and don't want to ruin this install again in case the solution is simple.
My guess is that some NVMe-related setting needs to be updated in GRUB.

Thanks for all the help.

HPOA909 · August 6, 2022, 10:58pm

Did you preserve the EXT4 partition when wiping the disk during the fresh installation?

dracoren · August 7, 2022, 4:33am

Thanks.
If you mean the option during installation, I reclaimed all space by deleting everything on the disk.
Uploading an image to show the configuration for anyone else looking into this problem. (This screenshot was taken after the install, just to verify the options.)

HPOA909 · August 7, 2022, 4:48am

dracoren · August 7, 2022, 6:21pm

I tried a fresh reinstall with the power cable disconnected; the procedure was otherwise unchanged.
After rebooting several times into Qubes and Windows and plugging the power supply back in, the problem simply disappeared. I can now boot into every OS entry without errors.

I wonder whether somebody can figure out and explain what actually happened here.
I will repost if the problem reappears.
Now up for solving fresh problems!!

dracoren · August 9, 2022, 7:21am

Specs: Acer Nitro 5, Ryzen 9 5900HX, 32 GB DDR4 RAM; more than 80 GB free on the drives after a complete Qubes OS 4.1.1 installation and after updating the custom VMs backed up from 4.1.0.
After four installs on three different media (internal NVMe 256 GB, external HDD 240 GB, USB stick 128 GB) while upgrading from 4.1.0 to 4.1.1, I noticed the following issues:

LUKS2 password entry error (no password prompt at all)
EXT4 errors
dracut-initqueue timeout errors: dom0 root/swap missing, LUKS device not found
The problems almost always appeared after running several processes at once in different qubes, especially updates (in my case especially whonix-gw and debian 11). The system freezes, and after an unclean shutdown (such as a forced power-off or logout while processes are running but frozen) and a reboot, the errors above appear.

This is most probably related to LUKS2: each time, the errors and the values displayed on screen change, but the underlying problem is the same, something related to LUKS password entry. When the disk with Qubes 4.1.1 is attached to a Linux distro (Ubuntu, Ubuntu live), it shows only three partitions (EFI, EXT4, and the large encrypted one), yet a set of 22 loop devices, each about 20 KB to 200 MB, get mounted and are visible on the desktop.

Qubes 4.1.0 had problems with Wi-Fi/Bluetooth/audio, and I could not upgrade the kernel from 5.10.x to 5.16.x no matter what the updates showed, but data integrity was never an issue.
For now, I would tread carefully with data on Qubes 4.1.1, and with running multiple processes during updates or stress-testing this OS with many VMs.

The process I followed to get Qubes OS running on external HDDs has been posted separately in my next reply for easy reference.

dracoren · August 9, 2022, 7:39am

INSTALLING QUBES OS ON EXTERNAL HDD (NVME, SSD OR USB STICK) - USB BOOTABLE QUBES OS - ?PORTABLE QUBES OS

Install Qubes on a USB external drive (SSD/NVMe preferred) as usual.
After the installation completes, the firmware does not recognize the USB device on reboot.
In GParted, the installed Qubes OS shows 3 partitions:

1. EFI boot partition, 2. the boot files (vmlinuz, initramfs, etc.), 3. encrypted data partition

(FROM “UEFI troubleshooting” ON QUBES DOCUMENTATION)
“Boot device not recognized after installing
Some firmware will not recognize the default Qubes EFI configuration. As such, it will have to be manually edited to be bootable. This will need to be done after every kernel and Xen update to ensure you use the most recently installed versions.”

Follow these steps to create a bootable Qubes OS entry for the USB drive in the UEFI menu:

1. Boot into a Linux distro and find the device names of your Qubes OS installation:
sudo fdisk -l
(e.g. 1:
/dev/sdb1 for EFI
/dev/sdb2 for boot files on ext4
/dev/sdb3 for the encrypted data partition
e.g. 2:
/dev/nvme1n1p1 for EFI
/dev/nvme1n1p2 for boot files on ext4
/dev/nvme1n1p3 for the encrypted data partition)
2. Mount the EFI partition (use /dev/nvme1n1p1 instead of /dev/sdb1 for an NVMe drive):
sudo mkdir /mnt/TEMP
sudo mount /dev/sdb1 /mnt/TEMP
3. Copy the /boot/efi/EFI/qubes/ directory to /boot/efi/EFI/BOOT/ (the contents of /boot/efi/EFI/BOOT should be identical to /boot/efi/EFI/qubes besides what is described in steps 4 and 5):
sudo cp -r /mnt/TEMP/EFI/qubes/. /mnt/TEMP/EFI/BOOT
4. Rename /boot/efi/EFI/BOOT/xen.cfg to /boot/efi/EFI/BOOT/BOOTX64.cfg:
sudo mv /mnt/TEMP/EFI/BOOT/xen.cfg /mnt/TEMP/EFI/BOOT/BOOTX64.cfg
5. Copy /boot/efi/EFI/qubes/xen-*.efi to /boot/efi/EFI/qubes/xen.efi and /boot/efi/EFI/BOOT/BOOTX64.efi. For example, with Xen 4.8.3 (you may need to confirm file overwrite):
sudo cp /mnt/TEMP/EFI/qubes/xen-4.8.3.efi /mnt/TEMP/EFI/qubes/xen.efi
sudo cp /mnt/TEMP/EFI/qubes/xen-4.8.3.efi /mnt/TEMP/EFI/BOOT/BOOTX64.efi
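The steps above can be sketched as two small shell functions. This is a sketch under assumptions, not the documented procedure: classify_partitions and make_fallback_boot are hypothetical names, partitions are identified by filesystem type (vfat, ext4, crypto_LUKS) rather than by position, and the newest xen-*.efi is picked by version sort instead of typing the version by hand. Because the thread found xen.cfg missing on 4.1.1, the function stops instead of guessing when xen.cfg or a xen-*.efi is absent.

```shell
#!/bin/sh
# Step 1 helper (hypothetical): label partitions by filesystem type.
# Reads "NAME FSTYPE" lines as printed by: lsblk -rno NAME,FSTYPE /dev/sdb
classify_partitions() {
    while read -r name fstype; do
        case "$fstype" in
            vfat)        echo "/dev/$name efi" ;;
            ext4)        echo "/dev/$name boot" ;;
            crypto_LUKS) echo "/dev/$name encrypted" ;;
        esac                      # whole-disk lines have no FSTYPE and are skipped
    done
}

# Steps 3-5 helper (hypothetical): build EFI/BOOT on an already mounted ESP.
# $1 is the ESP mount point, e.g. /mnt/TEMP. Needs root on a real ESP.
make_fallback_boot() {
    esp=$1
    qubes_dir="$esp/EFI/qubes"
    boot_dir="$esp/EFI/BOOT"

    # Highest-versioned xen-*.efi (the pattern does not match xen.efi itself).
    xen_efi=$(ls "$qubes_dir"/xen-*.efi 2>/dev/null | sort -V | tail -n 1)
    if [ -z "$xen_efi" ]; then
        echo "no xen-*.efi in $qubes_dir" >&2
        return 1
    fi
    if [ ! -f "$qubes_dir/xen.cfg" ]; then
        echo "no xen.cfg in $qubes_dir (create it first)" >&2
        return 1
    fi

    mkdir -p "$boot_dir"
    cp -r "$qubes_dir/." "$boot_dir"                  # step 3: mirror qubes -> BOOT
    mv "$boot_dir/xen.cfg" "$boot_dir/BOOTX64.cfg"    # step 4: rename the config
    cp "$xen_efi" "$qubes_dir/xen.efi"                # step 5: generic xen.efi
    cp "$xen_efi" "$boot_dir/BOOTX64.efi"             # step 5: fallback loader
}
```

For example, `lsblk -rno NAME,FSTYPE /dev/sdb | classify_partitions` prints the role of each partition, and, after mounting the EFI partition as in step 2, running `make_fallback_boot /mnt/TEMP` as root performs steps 3 to 5.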

Since xen.cfg was missing at step 4, I looked up its contents online; the same page (UEFI troubleshooting in the online Qubes documentation) provides them.
After looking up the kernel version on the sdb2/nvme1n1p2 partition, I completed xen.cfg with the following entries (replace the version number accordingly):

[global]
default=5.16.90-100.fc32.qubes.x86_64

[5.16.90-100.fc32.qubes.x86_64]
options=loglvl=all dom0_mem=min:1024M dom0_mem=max:4096M
kernel=vmlinuz-5.16.90-100.fc32.qubes.x86_64 root=/dev/mapper/qubes_dom0-root rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap i915.preliminary_hw_support=1 rhgb quiet
ramdisk=initramfs-5.16.90-100.fc32.qubes.x86_64.img

Then I also copied xen.cfg from BOOT to qubes, just in case.
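Looking up the kernel version by hand can also be scripted. The sketch below assumes the ext4 boot partition (sdb2/nvme1n1p2) is mounted somewhere, picks the newest vmlinuz-* by version sort, checks that the matching initramfs-*.img exists, and prints a xen.cfg in the same shape as the one above. make_xen_cfg is a hypothetical name, and the options and kernel lines are copied verbatim from the example above, so adjust them for your system.

```shell
#!/bin/sh
# Hypothetical helper: print a xen.cfg for the newest kernel found in a
# mounted boot partition. Options/kernel lines copied from the example above.
make_xen_cfg() {
    bootdir=$1
    # Newest vmlinuz-<version>: strip the "vmlinuz-" prefix, then version-sort.
    kver=$(ls "$bootdir" | sed -n 's/^vmlinuz-//p' | sort -V | tail -n 1)
    if [ -z "$kver" ]; then
        echo "no vmlinuz-* found in $bootdir" >&2
        return 1
    fi
    if [ ! -f "$bootdir/initramfs-$kver.img" ]; then
        echo "missing initramfs-$kver.img in $bootdir" >&2
        return 1
    fi
    cat <<EOF
[global]
default=$kver

[$kver]
options=loglvl=all dom0_mem=min:1024M dom0_mem=max:4096M
kernel=vmlinuz-$kver root=/dev/mapper/qubes_dom0-root rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap i915.preliminary_hw_support=1 rhgb quiet
ramdisk=initramfs-$kver.img
EOF
}
```

For example, with the boot partition mounted at a hypothetical /mnt/BOOTPART, `make_xen_cfg /mnt/BOOTPART > xen.cfg` writes the file, which can then be copied into EFI/BOOT and EFI/qubes as described.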

Shut down and remove the USB drive before booting. Check the usual boot menu entries in the BIOS; if they are garbled, reboot until they appear. Then connect the USB drive with Qubes OS. The entry will appear as “Linpus Lite” in the boot menu.
For example:

Linpus Lite
Windows Boot Manager
Ubuntu Internal
Qubes OS Internal

dracoren · August 11, 2022, 10:08am

I might have figured out the cause: after resolving the issues below, Qubes OS 4.1.1 is working fine, except for the 5G Wi-Fi, which has stopped working. I am listing the possible reasons in case somebody else gets the same errors.

Timezone mismatch between the operating systems (most likely)
Solution that helped for now:
Set the RTC in the BIOS to local time
Set the timezone to UTC on both Linux and Qubes OS in the multiboot setup using:
sudo timedatectl set-timezone UTC
Check that the time stays correct after rebooting between Ubuntu and the other systems.
(Incidentally, this also fixed the apt update errors I used to get on Ubuntu.)
LUKS2 encryption issue (unlikely on its own; will post if it recurs)
Whonix/Debian updates broke something (unlikely on its own; will post if it recurs)
Combination (most likely): LUKS2 + timezone mismatch + possibly simultaneous updates
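To compare the clock settings of the Linux systems in the multiboot, a small check like the one below may help. It is a sketch that assumes systemd's timedatectl (present in dom0 and on Ubuntu) and reads its LocalRTC and Timezone properties; describe_rtc is a hypothetical helper kept separate so the logic can be checked without systemd running. Windows keeps the RTC in local time by default, while systemd-based Linux reads it as UTC unless LocalRTC is enabled, which is one common source of clock drift between dual-booted systems.

```shell
#!/bin/sh
# Hypothetical helper: turn timedatectl's LocalRTC/Timezone values into a summary.
describe_rtc() {
    local_rtc=$1   # LocalRTC property: "yes" or "no"
    tz=$2          # Timezone property, e.g. "UTC"
    if [ "$local_rtc" = yes ]; then
        echo "RTC read as local time in $tz (same convention as Windows)"
    else
        echo "RTC read as UTC; system timezone is $tz"
    fi
}

# Query systemd only if timedatectl is installed and answers.
if command -v timedatectl >/dev/null 2>&1 &&
   rtc=$(timedatectl show --property=LocalRTC --value 2>/dev/null) &&
   zone=$(timedatectl show --property=Timezone --value 2>/dev/null); then
    describe_rtc "$rtc" "$zone"
fi
```

Running it on each Linux system shows whether they agree; `sudo timedatectl set-local-rtc 1` (or `0`) changes how Linux interprets the RTC, and `sudo timedatectl set-timezone UTC` changes the zone.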

HPOA909 · August 11, 2022, 11:44pm

I suggest shrinking the EXT4 partition to the minimum while still preserving it.

dracoren · August 16, 2022, 2:49pm

One more finding.
In this multiboot setup (NVMe 1 with Windows + Ubuntu, NVMe 2 with Qubes OS 4.1.1), alternately booting into Windows and Ubuntu or Qubes OS keeps messing up the time.
Also, if I boot into Windows/Ubuntu with some external HDDs attached and then reboot into Qubes OS 4.1.1, the LUKS GUI fails to load and Qubes OS gets stuck at a black screen. Detaching all external USB drives/HDDs and starting the system again lets Qubes OS boot as usual.