Hang at end of loading, then reboot, on Framework 16

I was able to install Qubes 4.2.1 without major issues (the trackpad doesn’t work as documented and Ventoy is apparently incompatible with Qubes, but that’s it) but post-install I’ve had a bunch of issues, I think to do with a graphics race condition. They appear to be resolved after the initial setup, but it could be the same root cause so I’m mentioning it.

Currently, I’m hanging at the end of the initial loading screen. I’m able to enter my LUKS password and all that seems to be working, but once the progress bar at the bottom of the screen fills up, everything hangs; after maybe 60 seconds, the laptop just reboots.

I was able to boot Qubes successfully during install and (after a bunch of trouble with black screens and boot loops) for the first-time setup. Then I was able to boot into Qubes successfully exactly once. I added the kernel parameters described in the HCL, and now I’m getting the loop described above, even when I remove the parameters from the grub boot menu.

Removing quiet, adding console=hvc0 earlyprintk=xen for Linux and adding loglvl=all guest_loglvl=all vga=,keep to Xen gives me more logs before the LUKS/loading screen, but after that I end up with the grey loading screen, hang, reboot, no logs. Ctrl+Alt+F1-F7 don’t bring up alternate TTYs. I’ve seen mention of pressing F2 at the LUKS prompt to get a TTY version but that just toggles how the password renders on my machine.

I’ve just tried reinstalling with the “latest kernel” option on my install media (“Install Qubes OS R4.2.1 using kernel-latest (6.7.7-1.qubes.fc37)”). Similar outcome, but instead of the graphical LUKS password prompt it’s a text one.

After trying another reinstall with a fresh download (though the version looked the same) and a different USB stick, it’s back to the same graphical issues as before – i.e. while it’s waiting for the LUKS password to be entered, the screen abruptly goes black. It seems like I can blindly enter my password and continue onto a black screen with a mouse cursor, then the machine reboots.

(Also obligatory first post apology if I’m putting this in the wrong category, missing tags, etc.)

Remove quiet and add plymouth.enable=0 to the kernel command line options in GRUB, then check the messages that you see when it boots and hangs. Are there any errors or failures? What are the last messages?
You can try to boot with qubes.skip_autostart so your qubes won’t start automatically at boot, in case it’s an issue with one of your qubes.
Do you have an external display connected to the laptop?
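
For reference, here is a rough sketch of what that edit does to the Linux line in the GRUB editor. It only manipulates a string, it does not touch GRUB itself. The function name debug_cmdline and the sample command line in the comment are made up for illustration; your real line will have different root= and LUKS options.

```shell
# Sketch: drop the word "quiet" from a kernel command line and append
# extra debugging options. Pure string handling, nothing is written to disk.
debug_cmdline() {
    cmdline=$1
    shift
    out=""
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        [ "$word" = "quiet" ] && continue
        out="${out:+$out }$word"
    done
    # Append whatever options were passed after the command line.
    for extra in "$@"; do
        out="${out:+$out }$extra"
    done
    printf '%s\n' "$out"
}

# Example (invented command line):
#   debug_cmdline 'root=/dev/mapper/qubes_dom0-root ro quiet' \
#       plymouth.enable=0 qubes.skip_autostart
```

In practice you make the same change by hand: press e on the boot entry, edit the line that loads vmlinuz, and boot with Ctrl+X. Changes made that way apply to that one boot only.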

Unfortunately I entirely forgot about this thread after seeing your reply, my bad (I installed a distro I’m more familiar with to get the laptop working quickly, meaning to come back to Qubes in a bit, and… that’s now, I guess, oops.)

From memory at the time:

  • I tried with and without an external display; I didn’t see any extra logs or different behavior.
  • There weren’t any seemingly significant error messages. I saw (and still see) things like e.g. pciback 0000:01:00.0: not ready [...]ms after FLR: waiting (and then : giving up after it waits 60s), but they scroll too fast to read and my phone camera is hilariously bad.

Since then I did a BIOS update, which might confuse the issue. Now when booting without quiet and with plymouth.enable=0 qubes.skip_autostart, it dumps me into the first-time setup screen. I’m currently working back through it to see if it’ll work after the BIOS update; failing that I’m going to reinstall from scratch again and try to retrace my steps with better notetaking.

Alright, I did a fresh reinstall with 4.2.2-rc1 as a hail mary and I’m getting exactly the same behavior.

By default, changing nothing about the boot (for a control):

  • It hangs for ~2 minutes on a black screen before popping up the grey-background Qubes logo
  • After maybe another minute it shows the LUKS decrypt prompt, then almost instantly without me even entering the password the screen goes black.
  • I can then enter the password, which takes me to the black screen+mouse, but then the machine isn’t rebooting, it just sits there, screen black. (Or it hangs for so long I don’t have the patience to wait for it.)

Removing quiet and adding plymouth.enable=0:

  • Those ~2 minutes it spends hanging are revealed to be that pciback line from before (wow this is getting really annoying)
  • I get a whole bunch of logs scrolling by too fast to read (and wow I wish my phone camera wasn’t hilariously awful, maybe I’d be able to take a video and read it back)
  • I end up at a nongraphical LUKS prompt where I can put in my password.
  • Eventually I end up back at the initial setup screen; if I try to go through it I hit an error about the default logical volume already existing.
  • Then I boot into Qubes, and everything… seems to work fine, though I’m hardly testing exhaustively
  • When I shut down it hangs indefinitely on a black screen.

Those changes, then adding qubes.skip_autostart:

  • Same ~2 minute hang
  • Nongraphical LUKS prompt
  • Dropped straight into normal (again seemingly functional) Qubes
  • Same issue on shutting down (I see the usual “the system is shutting down NOW!” console message, but then the laptop itself just… doesn’t… turn off.)

So it sure seems like one of the autostarted qubes is causing problems. How can I dig into which one? Is there a way to bifurcate em, try autostarting only half? Or look at logs from prior boots? (I promise I did look myself, but I’m having a ton of trouble navigating Qubes’ docs.)

It should be an issue with a PCI device attached to a qube, so it’s an issue with either sys-net or sys-usb.
You can try to start them manually after you boot into dom0 with qubes.skip_autostart and see which one will fail to start.
It’s probably an issue with this PCI device, 0000:01:00.0.

You can check which device it is and which qube it is attached to using this command in a dom0 terminal:

qvm-pci
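
The kernel log prints the address as 0000:01:00.0, while lspci usually shows it as 01:00.0 and, as far as I know, qvm-pci lists it as dom0:01_00.0. A small sketch of that conversion is below. The helper names short_bdf and qvm_ident are hypothetical, and the dom0:BB_DD.F form is my assumption about how your qvm-pci output will look, so compare it against the real listing.

```shell
# Sketch: convert a PCI address from the kernel log into the forms used
# by lspci and (assumed) by qvm-pci.

# 0000:01:00.0 -> 01:00.0 (strip a leading 4-hex-digit PCI domain, if any)
short_bdf() {
    printf '%s\n' "$1" | sed 's/^[0-9a-fA-F]\{4\}://'
}

# 0000:01:00.0 -> dom0:01_00.0 (first colon of the short form becomes "_")
qvm_ident() {
    printf 'dom0:%s\n' "$(short_bdf "$1" | sed 's/:/_/')"
}

# Usage in dom0:
#   lspci -s "$(short_bdf 0000:01:00.0)"        # what is the device?
#   qvm-pci | grep "$(qvm_ident 0000:01:00.0)"  # which qube uses it?
```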

Maybe it’s this issue:

Need to see the boot log to check what is causing this issue.
Maybe it’s an issue with RAID. Check the BIOS settings and change storage mode from RAID to AHCI. Check for other RAID configuration there as well.

Despite noticing the pci in the log line, it didn’t even once occur to me to look into which PCI device that is. Moving on–

lspci shows 01:00.0 is my WiFi card (for reference, a MediaTek MT7922, aka AMD RZ616). That’s odd because my WiFi works. And when trying to boot all my qubes manually after logging in, none of them fail to start.

So now I’m suspecting something’s happening out of order: Something in Xen or Qubes is correctly configuring WiFi once it’s running, but that only happens after the pciback and qube autostart bits.

Not a hundred percent sure if that doc is my exact issue, but I’m going to try fiddling with passthrough regardless. (If you’ve got any resources on doing that for pre-boot, I’d appreciate it greatly, I’m not sure where pciback is being triggered)

Oh, and I’ve just realized, now that the machine is successfully booting I can actually pull the logs! Unfortunately even without quiet I’m not seeing anything indicative of an error, just the “not ready after FLR” lines – probably because autostart is off, so nothing is breaking on boot anymore.

And thank you for the very detailed response!

You can get the failed boot log like this if your previous boot failed:

journalctl -b-1 > journal.log

Here -1 is the previous boot, -2 is the one before that, etc.
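
Once you have journal.log, it can be easier to look at just the interesting parts instead of scrolling through the whole file. A minimal sketch, assuming the log was saved as above; the helper names flr_lines and log_tail are made up, and the grep pattern is only a guess at what is relevant here.

```shell
# Sketch: pick the interesting parts out of a saved journal file.

# Print pciback / FLR related lines, prefixed with their line numbers.
flr_lines() {
    grep -n -E 'pciback|FLR' "$1"
}

# Print the last N lines (default 40), i.e. what was logged just before
# the machine hung or rebooted.
log_tail() {
    tail -n "${2:-40}" "$1"
}

# Usage in dom0:
#   journalctl --list-boots          # see which boots are available
#   journalctl -b -1 > journal.log
#   flr_lines journal.log
#   log_tail journal.log 60
```

If the journal gets long, journalctl -b -1 -k limits the output to kernel messages.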

You can also try to disable sys-usb autostart then boot without qubes.skip_autostart.
Then try to enable sys-usb autostart and disable sys-net/sys-firewall/sys-whonix autostart and boot without qubes.skip_autostart.
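
The autostart setting can be toggled from a dom0 terminal with qvm-prefs. Below is a sketch of the two rounds described above. The run wrapper and the DRY_RUN variable are my own additions so you can see the commands before anything is changed; the qube names are the default ones and may not all exist on your install.

```shell
# Sketch: toggle autostart for the two test rounds.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = qube name, $2 = True or False
set_autostart() {
    run qvm-prefs "$1" autostart "$2"
}

# Round 1: only sys-usb is kept from autostarting.
round_one() {
    set_autostart sys-usb False
}

# Round 2: sys-usb autostarts again, the network qubes do not.
round_two() {
    set_autostart sys-usb True
    for qube in sys-net sys-firewall sys-whonix; do
        set_autostart "$qube" False
    done
}

# Usage in dom0:
#   round_one              # prints the command
#   DRY_RUN=0 round_one    # actually applies it
```

Reboot normally (without qubes.skip_autostart) after each round and note whether the boot completes.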

I’m… gradually realizing that Qubes is more like a normal Linux system than I thought. Thanks again.

So:

  • In lspci -vv it looks like the WiFi card is reporting that it supports FLReset+, but then FLR isn’t working.
  • With sys-usb autostart disabled, booting works. With it enabled, I get the reboot loop. (And no pciback wait, weirdly?)
  • I can then start sys-usb… which sends the laptop into a reboot. Looking at the logs for the last boot, there’s an error message at the very end (14.1 KB)
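
For anyone wanting to repeat the FLReset check from the first bullet: lspci -vv prints an FLReset flag in more than one place, and the one under DevCap is the capability the device advertises. A rough sketch that pulls out just that flag is below. The function name flr_cap is made up, and it assumes the usual lspci -vv layout where the DevCap section is followed by a DevCtl section.

```shell
# Sketch: read `lspci -vv` output on stdin and print the FLReset flag
# from the DevCap section only (FLReset+ = advertised, FLReset- = not).
flr_cap() {
    awk '
        /DevCap:/ { in_cap = 1 }
        /DevCtl:/ { in_cap = 0 }
        in_cap && match($0, /FLReset[+-]/) {
            print substr($0, RSTART, RLENGTH)
            exit
        }
    '
}

# Usage in dom0 (run as root so the capability list is readable):
#   sudo lspci -vv -s 01:00.0 | flr_cap
```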

I have no idea how to read that error message, tbh. The only thing that looks meaningful out of the stack trace is screensaver_event_cb (and apparently it’s xss-lock breaking? which… odd.) It’s possible that’s just a coincidence/symptom, but I don’t see any other logs that look like things are erroring. If you’ve got anywhere you think I should look, let me know.

So with sys-net autostart enabled and sys-usb autostart disabled you can boot successfully?
Then try to enable the sys-usb autostart, remove all PCI devices from sys-usb and then try to attach the PCI USB controllers to sys-usb one by one and boot with them to figure out if this issue is caused by some specific USB controller.
Also maybe some internal USB device should be present in dom0 during boot for it to work properly.

Seems to be some issue with your GPU, but I’m not sure if it’s related to sys-usb or not.
Do you have the same error message in the normal boot log?
Maybe it’s an issue with your GPU firmware:

Yep, with sys-net autostarting, things work fine. With sys-usb autostarting, reboot. When I manually start sys-usb, reboot. I’ve just gone through the PCIe devices in sys-usb, and…

I have six PCIe devices, c1:00.{3,4} and c3:00.{3,4,5,6}. They’re all USB controllers; 5 and 6 are also labelled as “Thunderbolt”. After some experimentation, it seems like c3:00.4 is the issue – if it’s attached to sys-usb and I boot the VM, my laptop reboots. If not, sys-usb boots fine and adds devices as expected. If only c3:00.4 is attached, reboot.
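
In case it helps anyone else, the one-controller-at-a-time experiment looks roughly like the sketch below. It only prints the dom0 commands for each device rather than running them, since a bad controller reboots the whole machine mid-loop. The function name isolation_plan is made up, the dom0:BB_DD.F identifier form is an assumption to check against your own qvm-pci listing, and your controllers may need extra attach options (for example no-strict-reset) that the sketch leaves out.

```shell
# Sketch: print the commands for testing each PCI device on its own in a
# qube. Nothing is executed; copy the lines for one device at a time.
isolation_plan() {
    qube=$1
    shift
    for dev in "$@"; do
        # c3:00.4 -> dom0:c3_00.4 (assumed qvm-pci identifier form)
        ident="dom0:$(printf '%s' "$dev" | sed 's/:/_/')"
        printf 'qvm-pci attach --persistent %s %s\n' "$qube" "$ident"
        printf 'qvm-start %s\n' "$qube"
        printf 'qvm-shutdown --wait %s\n' "$qube"
        printf 'qvm-pci detach %s %s\n' "$qube" "$ident"
    done
}

# Usage in dom0, with sys-usb halted and all its PCI devices detached:
#   isolation_plan sys-usb c1:00.3 c1:00.4 c3:00.3 c3:00.4 c3:00.5 c3:00.6
```

Keep sys-usb autostart off (or boot with qubes.skip_autostart) while doing this, because a persistently attached bad controller would otherwise bring the reboot loop back.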

As far as I can tell, c3:00.4 is exactly identical to c3:00.3. So this could be a case of faulty hardware, but for now I can just remove c3:00.4 from sys-usb and keep on with getting Qubes fully functional (notably, fixing poweroffs and the pciback stuff.) At some point I’ll test whether this is the expansion slot itself or the card plugged into it, but for now I’ve got other weekend plans.

Thanks again for your help! I’ll probably be back to it next weekend.

You can boot into dom0 without sys-usb autostart so that your USB controllers are attached to dom0, and check whether there are any USB devices connected to the c3:00.4 USB controller. Maybe some USB device is related to this issue.
You may need to disable USBGuard in dom0 as well to check this USB device. For this you’ll need to remove usbcore.authorized_default=0 from the GRUB kernel command line options during boot.
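
To see which USB bus belongs to which PCI controller, you can resolve the root hub entries in sysfs. This is a minimal sketch, assuming the usual sysfs layout where each usbN root hub sits directly under its controller's PCI address. The function names are made up.

```shell
# Sketch: map USB root hubs to the PCI controller they hang off.

# Given a resolved sysfs path, print the last component that looks like
# a full PCI address (DDDD:BB:DD.F).
pci_of_usb_path() {
    printf '%s\n' "$1" | tr '/' '\n' \
        | grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$' \
        | tail -n 1
}

# Print "usbN <pci address>" for every root hub in the given directory
# (defaults to the live sysfs tree).
list_hub_controllers() {
    for hub in "${1:-/sys/bus/usb/devices}"/usb*; do
        [ -e "$hub" ] || continue
        printf '%s %s\n' "${hub##*/}" "$(pci_of_usb_path "$(readlink -f "$hub")")"
    done
}

# Usage in dom0, with the controllers attached to dom0:
#   list_hub_controllers   # find the usbN line ending in 0000:c3:00.4
#   lsusb -t               # then see what is plugged into that bus
```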