Error installing Qubes: Kernel Panic, not syncing, fatal exception

I am having issues trying to install Qubes 4.2.3 on my ThinkPad Z13 Gen 1 from USB.

First I enabled AMD SVM in the BIOS. Then I updated the BIOS and firmware to the latest versions with fwupd from an existing Linux installation. I downloaded the Qubes 4.2.3 ISO, checked it, wrote it to a USB stick with dd, and booted from the USB stick.
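For anyone repeating the "checked" step, here is a minimal sketch of a SHA-256 comparison in Python. The file name and the digest string in the usage note are placeholders, not real values, and a matching digest only shows the download is intact. The Qubes documentation additionally recommends verifying the PGP signature, which this sketch does not do.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a multi-GiB ISO never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare the computed digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```

Usage would look like `matches_published_digest("qubes.iso", "<digest copied from the download page>")`, where both arguments are hypothetical stand-ins for your own file and the digest published alongside it.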

The boot menu works, but if I choose either “Install Qubes” or even “check this medium”, the screen goes black and after a couple of seconds I get the screen shown in the screenshot below (sorry for the crappy quality). A few seconds later, the laptop reboots.

I tried with two different USB sticks, 64 and 128 GiB, same problem.

Help pls?

Thanks!

Could you try and add:

acpi=off

to the kernel line in grub (the one that has quiet)? – it’s just a guess to rule out some possible causes …

:slight_smile:

It worked, thanks! But the trackpad is not working, I am proceeding with an external mouse. Hopefully it will work after install.

Yes - the acpi=off is a big hammer and it will disable stuff … but it can help as a first step in “could it be related to … ?”.

I’ll suggest searching the forum for other posts/models that mention acpi=off – some of them have “It’s enough to disable XX” or “It works if …”.

:slight_smile:

Run memtest86 with all cores for a while (2+ hours) to see if your hardware is faulty.
Do you have ECC?
With a lot of memory, bit flips are more likely, so you can end up with an error caused by flipped bits.
This is why workstations are better than desktops and laptops for running Qubes.

BTW: Does someone know a laptop with ECC RAM support? Maybe with AMD CPU …

I think there have been a lot of (Lenovo + AMD CPU/GPU + Qubes OS) threads on the forum that ended in “failure to boot/install”.

A lot of times, acpi=off has resulted in “System can boot/install - but built-in touchpad/keyboard/… doesn’t work” - so I would be surprised if it was a hardware error … but it’s always a good idea to rule it out.

:slight_smile:

I had some bad RAM with single errors that were corrected by ECC, and I got warnings about this every few hours. So I tracked down the DIMMs that caused it using memtest86, which loads all cores and the RAM at maximum speed.
Do not underestimate bad hardware.

Wait, so you mean that if I install Qubes with acpi=off then even after install I won’t have hardware working? Just to understand what I’m getting myself into :slightly_smiling_face:

I will try to run it for 2 hours as you said, but just to understand: what happens if I find faulty bits? Do I have to call in support and replace the whole motherboard, or can I safely disable the faulty blocks?

The installer will copy the settings to the config it writes - but it can be removed manually later. :slight_smile:

The right thing would be to search the forum for topics that mention acpi=off and see how people have narrowed it down to a single driver or BIOS option.
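As a rough illustration of the manual removal, here is a small Python sketch that drops one parameter from a kernel command line string. It assumes the installer wrote the parameter into a `GRUB_CMDLINE_LINUX`-style line in /etc/default/grub, which is worth confirming on your own system. The example value is hypothetical, and the GRUB configuration would still need to be regenerated after editing the file.

```python
def remove_kernel_param(cmdline, param):
    """Return `cmdline` without any token equal to `param`; other tokens keep their order."""
    return " ".join(tok for tok in cmdline.split() if tok != param)


# Hypothetical command line value, for illustration only.
installed = "rhgb quiet acpi=off"
print(remove_kernel_param(installed, "acpi=off"))  # prints "rhgb quiet"
```

Matching whole tokens rather than substrings means a different parameter that merely starts with the same text is left alone.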

I’m looking at my own ( Lenovo + AMD CPU/GPU + Qubes OS ) issue again … without much luck … :-/

:slight_smile:

I searched the forum but could only find issues, not a lot of solutions, unfortunately. Is this because of the AMD support? That’s really a pity. I think I will have to give up on this machine; I cannot afford to end up with a laptop where the touchpad/keyboard and maybe wifi don’t work.

I’m not sure why the combination of Qubes OS + Lenovo laptop + AMD CPU/GPU causes that many problems. :-/

I suspect it’s something with Xen - I tried to install Xen while my T14 had Xubuntu installed … and as I recall, that also failed to start … :-/

Adding to my issues, the WLAN card in my machine should be a “Qualcomm Wi-Fi 6E NFA725A 2x2 AX” (based on the Lenovo information) and it looks to have problems as well: Installation of Qubes on ThinkPad T16 - #44 by apparatus :-/

:slight_smile:

Update: booting from the latest kernel with module_blacklist=ucsi_acpi got me into a working installer (trackpad + keyboard). Before proceeding, I want to do some research on how partitioning works in Qubes. Usually I do:

  • unencrypted /boot/efi
  • unencrypted /boot
  • luks-encrypted /
  • luks-encrypted /home (which is automatically decrypted by a keyfile stored in /root)

But I’m not sure this will work for Qubes. I see that LVM is used by default, but I don’t know what “thin provisioning” is.

Update here

You can remove the memory modules.
First, move the module from one slot to another and see whether the error changes address. If it does, the module is defective. If the error stays at the same address, the mainboard is defective (e.g. an address driver or a data bus bit). You may also have bad or dirty connectors: use a fluoride-based electronics cleaner (the sort that is worst for the environment works best) and inspect the contacts with your phone’s macro lens for traces of force and misuse.

This change-and-see method is the way of the Siemens technician who does not have a clue but swaps components to see whether the problem travels with the swapped component. If the error travels, the traveling component is pronounced guilty :slight_smile:
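The swap rule can be written down as a tiny decision function. This is only a sketch of the rule of thumb described here, not a diagnostic tool. The failing addresses are hypothetical values you would read off the memory tester, and the "inconclusive" branch is my own addition for runs where no error shows up.

```python
def swap_test_verdict(addr_before_swap, addr_after_swap):
    """Apply the swap rule: address moved -> module, address unchanged -> mainboard."""
    if addr_before_swap is None or addr_after_swap is None:
        # Assumption: if one run shows no error, the test says nothing yet.
        return "inconclusive: no error reproduced in one of the runs"
    if addr_before_swap != addr_after_swap:
        # The error travelled with the module to its new slot.
        return "module suspected"
    # The error stayed put although the module moved.
    return "mainboard suspected"


# Hypothetical failing addresses before and after moving the module.
print(swap_test_verdict(0x3F2A1000, 0x7F2A1000))  # prints "module suspected"
```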

If you have replaceable components, go for a healthy RAM module and look for proper cooling. Some workstations and servers have RAM fans. Check that these are installed properly.
Then there are the other parameters:

  • bad clock speed and clock jitter: disable “spread spectrum” in the BIOS (it is only there to pass EMC tests with tricks)
  • setup and hold times: use slower timings if they can be changed in the BIOS
  • voltage, e.g. bad caps: you can find the ripple on the positive supply lines with an oscilloscope and replace the capacitors if the ripple is too high (you need to be a pro for that), or raise the core voltage and memory voltage via the BIOS settings by about 2% to 5% and see if the error is gone

If it is gone and the timing was not aggressive, you can be quite sure you have bad capacitors at Vcore and Vram. Measure with an oscilloscope, or keep the setting and live with a little more power draw.
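To make the 2% to 5% figure concrete, here is a short Python sketch of the arithmetic. The 1.20 V nominal value is a made-up example, not a recommendation for any particular board or memory type.

```python
def raised_voltage_range(nominal_volts, low_pct=2.0, high_pct=5.0):
    """Return the (low, high) voltages for a raise of low_pct to high_pct percent."""
    return (
        nominal_volts * (1 + low_pct / 100),
        nominal_volts * (1 + high_pct / 100),
    )


# Hypothetical nominal memory voltage of 1.20 V.
low, high = raised_voltage_range(1.20)
print(f"{low:.3f} V to {high:.3f} V")  # prints "1.224 V to 1.260 V"
```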