Alright, thought I'd give this an update.
My goal is to get this working as a lab machine. I like the idea of a desktop that can easily and securely spin up machines as needed. Suspend still isn't working as of now. I've spent around 6-8 hours reading and debugging so far (I'm a beginner). If you're a beginner in the same situation who doesn't want to waste time debugging but wants a system that works out of the box for the intended purpose, I'd suggest walking away and spending your time looking for some other solution. Either that, or opt for hardware that has a proven track record of working reliably.
Here is what I did so far:
1) Get the system to a [Minimal Configuration] state
Why: Reduce the search scope. Eliminate as many potential sources of the suspend/resume problem as possible. The fewer potential sources you have, the quicker you'll solve it.
In my case that meant physically removing the graphics card and unplugging a screen as the first step.
HP Victus 15L
- 64 GB RAM (Crucial 64 GB, DDR4-3200 (PC4-25600), 288-pin DIMM)
- AMD RYZEN 7 5600G
- AMD Radeon RX 6600XT (removed)
- External screen 1
- External screen 2 (unplugged)
- Reno2 mainboard
- 2 TB WD Black SN770 (M.2 - it's fast)
- Any USB device that's not needed (unplugged)
After that I reinstalled Qubes from scratch (4.1.2, kernel 5.15.94-1.qubes.fc32.x86_64) to get a clean start. Once it booted, I opened only a single bash window, to keep the number of running processes to a minimum as well, before sending the system to suspend through the GUI menu.
Result: Suspend/resume problem persists even in Minimal physical Configuration state. Keyboard LEDs are not reacting after resume, mouse seems dead too. Fans are spinning though.
Learnings: I don’t need to debug anything related to the RX 6600XT driver for now and can put that aside.
1.1) Verify if you really are in a [Minimal Configuration] state
Why: Unplugging hardware alone won't get you to a minimal configuration. There is software to take care of too.
- go with minimal config, turn off drivers like USB, AGP you don’t really need
- turn off APIC and preempt
- use ext2. At least it has working fsck. [If something seems to go wrong, force fsck when you have a chance]
- turn off modules
- use vga text console, shut down X. [If you really want X, you might want to try vesafb later]
- try running as few processes as possible, preferably go to single user mode.
- due to video issues, swsusp should be easier to get working than S3. Try that first.
When you make it work, try to find out what exactly was it that broke suspend, and preferably fix that.
Source/Credits: Pavel Machek pavel@ucw.cz / https://www.kernel.org/doc/Documentation/power/tricks.txt
To cross items 2 and 5 (APIC and the VGA text console) off the list above, and to put the system into S3 deep sleep on suspend, here is what I did:
sudo nano /etc/default/grub
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-fc438d0c-2755-4faf-8a4c-dd249cbe7e3c rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles rd.driver.pre=btrfs rhgb quiet noapic acpi=off mem_sleep_default=deep vga=0"
Note: Apparently there are better ways than vga=0 to do this → GRUB/Tips and tricks - ArchWiki (Sidenote: Outdated documentation seems to be a problem when troubleshooting. Make sure you find recent documentation.)
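Once the edited line is in place and the system has been rebooted, it's worth confirming that the options actually made it onto the running kernel's command line (/proc/cmdline), and it helps to be able to drop a single option again, as with noapic below. A minimal POSIX sh sketch; has_param and remove_param are hypothetical helper names, not part of Qubes or GRUB:

```sh
#!/bin/sh
# Hypothetical helpers for working with a kernel command line string.
set -f  # disable globbing so an unquoted $1 is only split on whitespace

# has_param "<cmdline>" <param>: succeeds if <param> appears as a whole word
has_param() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# remove_param "<cmdline>" <param>: prints the command line without <param>
remove_param() {
  out=""
  for word in $1; do
    [ "$word" = "$2" ] || out="${out:+$out }$word"
  done
  printf '%s\n' "$out"
}
```

Usage sketch: `has_param "$(cat /proc/cmdline)" mem_sleep_default=deep && echo "deep sleep requested"`. The `$(...)` matters because it strips the trailing newline from /proc/cmdline. `remove_param "quiet noapic acpi=off" noapic` prints `quiet acpi=off`.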
- Apply the config. You need to specify the output path of grub.cfg with -o; if you don't, grub2-mkconfig just prints the new config to the terminal without actually changing anything. Check here for more information.
sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
- Restart the system
Result (in my case): Won't boot. If you're stuck in a GRUB boot cycle, you can edit the boot entry during startup by pressing e in the GRUB menu. After manually deleting noapic from the command line above, the system started up after hitting F10, but the GUI was very slow and laggy (with acpi=off still set). Suspend didn't work; the screen just froze. Note that acpi=off disables ACPI entirely, and S3 is an ACPI sleep state, so suspend-to-RAM can't be expected to work with that option set. Removing acpi=off while leaving noapic in place got me stuck in a failed boot cycle again.
- Check if suspend/resume now works //for me it didn't
- Make 1 modification to the settings at step 1 and try again (Trial & error)
There are countless things you can change. On my machine, which uses AMD graphics, for example:
amdgpu.gpu_recovery=1 //Enable GPU recovery: whenever a GPU timeout is detected, the driver automatically tries to reset the GPU and bring it back up
amdgpu.dpm=0 //Disable Dynamic Power Management
amdgpu.dc=0 //Disable Display Core
...
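Since a misspelled or ignored option is easy to miss, it can help to check after each reboot which values the amdgpu driver actually picked up. A sketch assuming the driver exposes these module parameters read-only under /sys/module/amdgpu/parameters/; show_amdgpu_params is a hypothetical name, and the directory argument exists only so the function can be tested against a fake directory:

```sh
#!/bin/sh
# Print the values amdgpu is using for the parameters tried above.
# Assumption: parameters are exposed under /sys/module/amdgpu/parameters/.
show_amdgpu_params() {
  dir="${1:-/sys/module/amdgpu/parameters}"
  if [ ! -d "$dir" ]; then
    echo "amdgpu not loaded ($dir missing)"
    return 1
  fi
  for p in gpu_recovery dpm dc; do
    if [ -r "$dir/$p" ]; then
      printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
    else
      printf '%s=unavailable\n' "$p"
    fi
  done
}
```

Run `show_amdgpu_params` without arguments after a reboot. For all three parameters a value of -1 is the driver's automatic default, so seeing -1 means the option from the command line did not take effect.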
To do:
I haven’t tried all of the above yet. Still need to get a grasp of some things.
- Learn how to turn off drivers that are not needed (Estimate to learn it: 1-2h)
- Learn how to turn off APIC and preempt (Estimate to learn it: 1-2h)
- Learn how to turn off modules (and which ones) (Estimate to learn it: 1h)
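For the "turn off modules" item, a possible first step is listing loaded modules that nothing else currently holds (use count 0, the third column of /proc/modules), since those are the only ones that can be unloaded directly, for example with `sudo modprobe -r <name>`. This is only a sketch: a use count of 0 does not mean the hardware is unused (a network driver often shows 0), so unload one module at a time and retest. unused_modules is a hypothetical name:

```sh
#!/bin/sh
# Print loaded modules whose use count is 0 (candidates for unloading).
# Reads /proc/modules by default; a file path can be passed for testing.
unused_modules() {
  # /proc/modules columns: name size use_count dependents state address
  awk '$3 == 0 { print $1 }' "${1:-/proc/modules}"
}
```

To keep a module from loading at boot, the usual mechanism is a `blacklist <name>` line in a file under /etc/modprobe.d/; if the module is loaded from the initramfs, the initramfs may need to be regenerated as well.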
2) Learn more about how suspend works
Why: To identify what I need to look at and what exactly my system does when I send it to suspend. Also to get the vocabulary needed for Google searches.
Learnings:
- Suspend-to-RAM (s2ram) has been a very common problem on Linux for a long time. Even Linus wrote about it in his tutorial "How to get s2ram working" back in 2006. Graphics hardware in particular seems to be the biggest source of problems.
- The Linux kernel supports several sleep states (see the diagram below). It helps to know which one you want to achieve, so you can check your settings and narrow down the scope of your troubleshooting. I want S3 (suspend-to-RAM), not hibernation (which would require additional configuration, and I'm simply too lazy to do that).
Result: [Diagram: Kernel sleep states]
Note: The diagram above can be edited. Download the attachment below, rename it from .log to .drawio, and import it into draw.io.
Kernel_sleep_states.drawio.log (6.1 KB)
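The sleep states can also be read straight from the running kernel: /sys/power/state lists the ones it supports. A sketch that translates those tokens using the naming from the kernel's power-management documentation; describe_sleep_states is a hypothetical name:

```sh
#!/bin/sh
# Describe each sleep state token listed in /sys/power/state (or a test file).
describe_sleep_states() {
  for s in $(cat "${1:-/sys/power/state}"); do
    case "$s" in
      freeze)  echo "freeze: suspend-to-idle (s2idle)" ;;
      standby) echo "standby: power-on suspend (shallow, ACPI S1)" ;;
      mem)     echo "mem: suspend-to-RAM, variant chosen by /sys/power/mem_sleep" ;;
      disk)    echo "disk: hibernation (suspend-to-disk, ACPI S4)" ;;
      *)       echo "$s: unknown" ;;
    esac
  done
}
```

For S3 to be an option at all, "mem" has to appear here and /sys/power/mem_sleep has to offer "deep".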
Based on what I learned, I verified that /sys/power/mem_sleep is set to [deep], which tells the kernel to use suspend-to-RAM (S3) when the system suspends.
cat /sys/power/mem_sleep
s2idle [deep] //brackets indicate which option is active
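The same check can be scripted, for example to abort a test run early when deep isn't active. A sketch that prints the bracketed (active) entry; active_mem_sleep is a hypothetical name:

```sh
#!/bin/sh
# Print the active mode from mem_sleep-style text, e.g. "s2idle [deep]" -> "deep".
active_mem_sleep() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "${1:-/sys/power/mem_sleep}"
}
```

If it isn't deep, the mode can also be switched at runtime without a reboot: `echo deep | sudo tee /sys/power/mem_sleep`. The mem_sleep_default=deep kernel parameter makes deep the default at boot.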
I guess suspending itself isn't so much the issue (the system goes to sleep as expected); resuming is.
3) Next steps
- Do some research and see if I can find similar systems that have a desktop, could be used as a home lab and support suspend out of the box (path of least pain atm)
- Listen to inputs from the community
- Make it work on the current hardware platform: Learn how to set debug levels, gather logs, and view/interpret them (Goal: Narrow down the search scope and figure out precisely where things go wrong. I estimate I need 2h or so to read up on, understand and apply that)
- Make it work on the current hardware platform: Brute-force my way through some kernels if I can't figure it out myself (trial & error). Suspend-to-RAM issues and graphics card bugs seem to be very common. Why waste time chasing a bug if somebody else might have already fixed it in a different kernel version?
- Make it work on a supported platform: Once I have enough $$$ to spare on some new lab hardware I might do that
- Make it work on the current hardware platform: Get up to speed with ACPI, DSDT, decompiling and patching things myself (Path of max. pain, I’d expect many many hours spent on that before I get somewhere. Not even sure if I could realistically achieve that, it’s not really my cup of tea)
- Something entirely else
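For the logging item, one low-effort starting point, assuming the kernel was built with CONFIG_PM_SLEEP_DEBUG and the journal is persistent: before suspending, run `echo 1 | sudo tee /sys/power/pm_print_times /sys/power/pm_debug_messages` so the kernel logs more detail about each suspend/resume step, and after the forced reboot look at the end of the previous boot's kernel log with `journalctl -k -b -1`. The helper below (pm_log_tail, a hypothetical name) just narrows that log down to lines mentioning PM, ACPI or amdgpu:

```sh
#!/bin/sh
# Print the last N (default 20) log lines from stdin that mention
# power management ("PM: "), ACPI or amdgpu.
pm_log_tail() {
  n="${1:-20}"
  grep -E 'PM: |ACPI|amdgpu' | tail -n "$n"
}
```

Usage sketch: `journalctl -k -b -1 --no-pager | pm_log_tail 40`. If the output stops right after the suspend entry line, nothing from the resume path made it to disk.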
Might update in case I made progress.
If this thread doesn’t get an update it means that I have spent more than 16h trying to get suspend to work on this particular system but didn’t succeed and moved on. I’m currently at around 8h in.
Cheers.
===========================================================
Some 2h later… (at around 10h trying to get this to work)
Decided to give step 4 above a go → brute-forcing my way through some kernels. Here is how I went about it:
- Created a list of kernel versions that appeared to be good candidates for solving the suspend/resume problem. After some research I came up with this:
5.15.94-1 //(Current) Doesn't work
5.16.3 //works perfectly fine for me
6.0.7 //@BenT suggested: Install this kernel in Dom0 (works in both 4.1 and 4.2)
6.0.8 //works properly (tomz17)
6.0.12 //Downgrading to this version fixed the issue (Ideapad 3 15ALC6 - CPU: AMD Ryzen 5 5500U, with integrated graphics)
6.1.1 //Breaks suspend on AMD laptop (Ideapad 3 15ALC6 - CPU: AMD Ryzen 5 5500U, with integrated graphics)
6.1.19-1-lts //This (suspend problem) occurs on my pc recent weeks. have to reboot and all my tasks are being terminated.
6.1.22-1-lts //Breaks hibernation on AMD Ryzen 4700U CPUs
6.2.2-arch2-1 //Breaks suspend
I ended up with a list of four kernels that seemed like good candidates. The idea was to save time by focusing on kernels with a higher chance of solving the problem, and not to waste time on kernels known for having a lot of suspend/resume problems with my hardware.
5.16.3 //works perfectly fine for me
6.0.7 //@BenT suggested: Install this kernel in Dom0 (works in both 4.1 and 4.2)
6.0.8 //works properly (tomz17)
6.0.12 //Downgrading to this version fixed the issue (Ideapad 3 15ALC6 - CPU: AMD Ryzen 5 5500U, with integrated graphics)
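The install order matters later on (Qubes boots the newest installed kernel by default), and plain `sort` gets that order wrong because it compares text, so 6.0.12 would land before 6.0.7. A sketch using GNU sort's version ordering:

```sh
#!/bin/sh
# Candidate kernel versions from the list above, in no particular order
candidates="6.0.12 5.16.3 6.0.8 6.0.7"

# sort -V (GNU coreutils) compares the numeric parts of version strings
ordered=$(printf '%s\n' $candidates | sort -V)
printf '%s\n' "$ordered"   # 5.16.3, 6.0.7, 6.0.8, 6.0.12 (one per line)
```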
- Created a folder for each kernel version I was going to download
- Downloaded the following .rpm files for each kernel on the list above into the corresponding folder:
kernel-latest-*.qubes.x86_64.rpm
kernel-latest-devel-*.qubes.x86_64.rpm
kernel-latest-qubes-vm-*.qubes.x86_64.rpm
- Now that I had four folders with kernel files I packed them up into a good_kernels.tar archive
- Followed this guide to get the good_kernels.tar to dom0
- Unpacked the good_kernels.tar → now I had a folder structure with the good kernels
- Installed the lowest version (5.16.3), which was slightly higher than the current kernel:
//cd'ed into the good_kernels/kernel_5.16.3 folder, then
sudo dnf install -y kernel-latest-*.rpm
- Rebooted and, once up, checked the kernel version with
uname -r
and, once verified, suspended the system
- Resumed after 5 seconds or so and checked whether the screen wakes up or the keyboard LEDs react → In case of failure, install the next higher kernel version (here 6.0.7), test again, repeat //Qubes automatically boots the latest kernel version, so this approach only works if you go from lower to higher kernel versions
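The suspend/wait/check loop above can be partly automated. Instead of watching the screen, the sketch below records the boot ID before suspending and compares it afterwards: the same boot ID means the system resumed, a different one means it had to be rebooted. Assumptions: rtcwake from util-linux is available in dom0, `rtcwake -m no` only arms the RTC wake-up alarm so the actual suspend still takes the normal systemd path, and STATE_FILE, BOOT_ID_FILE and DRY_RUN are hypothetical names:

```sh
#!/bin/sh
# Hypothetical helpers for one suspend/resume test per installed kernel.
# Assumes rtcwake (util-linux) and systemd; DRY_RUN=1 only prints commands.
STATE_FILE="${STATE_FILE:-${HOME:-.}/.suspend-test-state}"
BOOT_ID_FILE="${BOOT_ID_FILE:-/proc/sys/kernel/random/boot_id}"

# Run a command, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Save the current boot ID and kernel, arm the RTC alarm, then suspend
start_cycle() {
  secs="${1:-60}"
  printf '%s %s\n' "$(cat "$BOOT_ID_FILE")" "$(uname -r)" > "$STATE_FILE"
  # -m no: only set the wake-up time; the suspend itself goes through systemd
  run sudo rtcwake -m no -s "$secs"
  run systemctl suspend
}

# Run once the machine is usable again (after resume or after a forced reboot)
check_cycle() {
  read -r saved_id kernel < "$STATE_FILE"
  if [ "$saved_id" = "$(cat "$BOOT_ID_FILE")" ]; then
    echo "resume OK on $kernel"
  else
    echo "resume FAILED on $kernel (system was rebooted)"
  fi
}
```

Usage sketch: `start_cycle 60`, then after the machine is back (or after a forced reboot) run `check_cycle`. If the alarm fires before the system has actually entered sleep, it won't wake on its own, so the delay should be generous.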
Unfortunately no luck with the “good kernels” I picked.
Didn’t notice any different behavior in any of the kernels.
I'll try to update my BIOS to the latest version and see if anything changes. It's super restricted; I can't do anything in it. No idea how to get into the "Advanced" section either, it's hidden or completely disabled. Shame on HP.
I’ll leave that for some other day though. Need to get Windows on that machine first to update the BIOS.
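Before and after the BIOS update it's handy to note the exact firmware version, so any change in suspend behavior can be tied to it. Linux exposes the DMI fields under /sys/class/dmi/id/, and the ones used here are normally readable without root. A sketch, assuming those files are visible from dom0; bios_info is a hypothetical name:

```sh
#!/bin/sh
# Print vendor, model and BIOS details from the DMI sysfs directory
# (default /sys/class/dmi/id; another directory can be passed for testing).
bios_info() {
  dir="${1:-/sys/class/dmi/id}"
  for f in sys_vendor product_name bios_vendor bios_version bios_date; do
    printf '%s: %s\n' "$f" "$(cat "$dir/$f" 2>/dev/null || echo unknown)"
  done
}
```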