DISCLAIMER: Yes, I know PSP might not really be something you want running, but at the moment, it’s breaking my machine…
I have tried to suspend and resume my GPD Win Max.
xscreensaver shows up and I can move the cursor for about 6-8 seconds; then it either freezes or the screen goes black, and I am forced to hard-reset the machine.
From the logs I can see you’re using a RENOIR APU (so am I, and I can confirm suspend/resume does not work on Qubes, while it does work with a plain Linux kernel), but I can’t see the exact model (mine is a 4800H).
In your logs I see a suspend being triggered, with a big warning in switch_mm_irqs_off(). Then, on resume, the PSP does not work. I’d suggest looking into the first symptom first. I will check my own logs when I get some time.
As for the PSP, it is required, e.g., to verify the authenticity of your GPU firmware; you can’t get amdgpu to initialize on recent hardware if it’s not running.
I checked my logs, and we sure look to be in the same boat:
Dec 23 00:34:30 dom0 kernel: Disabling non-boot CPUs ...
Dec 23 00:34:30 dom0 kernel: ------------[ cut here ]------------
Dec 23 00:34:30 dom0 kernel: WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:522 switch_mm_irqs_off+0x3c5/0x400
Dec 23 00:34:30 dom0 kernel: Modules linked in: loop snd_seq_dummy snd_hrtimer ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat intel_rapl_msr msi_wmi sparse_keyma>
Dec 23 00:34:30 dom0 kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.5-1.fc32.qubes.x86_64 #1
Dec 23 00:34:30 dom0 kernel: Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDK/MS-17FK, BIOS E17FKAMS.117 10/29/2020
Dec 23 00:34:30 dom0 kernel: RIP: e030:switch_mm_irqs_off+0x3c5/0x400
Dec 23 00:34:30 dom0 kernel: Code: f0 41 80 65 01 fb ba 01 00 00 00 49 8d b5 60 23 00 00 4c 89 ef 49 c7 85 68 23 00 00 00 1e 08 81 e8 80 f5 08 00 e9 15 fd ff ff <0f> 0b e8 34 fa ff ff e9 ad>
Dec 23 00:34:30 dom0 kernel: RSP: e02b:ffffc900400d7eb0 EFLAGS: 00010006
Dec 23 00:34:30 dom0 kernel: RAX: 000000001441e000 RBX: ffff888155670000 RCX: 0000000000000040
Dec 23 00:34:30 dom0 kernel: RDX: ffff8881003027c0 RSI: 0000000000000000 RDI: ffff88809441e000
Dec 23 00:34:30 dom0 kernel: RBP: ffffffff829d9240 R08: 0000000000000000 R09: 0000000000000000
Dec 23 00:34:30 dom0 kernel: R10: 0000000000000008 R11: 0000000000000000 R12: ffff8881050a9dc0
Dec 23 00:34:30 dom0 kernel: R13: ffff8881003027c0 R14: 0000000000000000 R15: 0000000000000001
Dec 23 00:34:30 dom0 kernel: FS: 0000000000000000(0000) GS:ffff888155640000(0000) knlGS:0000000000000000
Dec 23 00:34:30 dom0 kernel: CS: 10000e030 DS: 002b ES: 002b CR0: 0000000080050033
Dec 23 00:34:30 dom0 kernel: CR2: 0000593b416684e8 CR3: 0000000002810000 CR4: 0000000000050660
Dec 23 00:34:30 dom0 kernel: Call Trace:
Dec 23 00:34:30 dom0 kernel: <TASK>
Dec 23 00:34:30 dom0 kernel: switch_mm+0x1c/0x30
Dec 23 00:34:30 dom0 kernel: idle_task_exit+0x55/0x60
Dec 23 00:34:30 dom0 kernel: play_dead_common+0xa/0x20
Dec 23 00:34:30 dom0 kernel: xen_pv_play_dead+0xa/0x60
Dec 23 00:34:30 dom0 kernel: do_idle+0xd1/0xe0
Dec 23 00:34:30 dom0 kernel: cpu_startup_entry+0x19/0x20
Dec 23 00:34:30 dom0 kernel: asm_cpu_bringup_and_idle+0x5/0x1000
Dec 23 00:34:30 dom0 kernel: </TASK>
Dec 23 00:34:30 dom0 kernel: ---[ end trace 81338147c2a10edc ]---
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 1 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 2 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 3 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 4 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 5 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 6 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 7 is now offline
Up to there, it seems to be the suspend (I’m not that familiar with suspend logs, but “is now offline” definitely hints that the resume has not started yet, as does play_dead). And then the resume:
Dec 23 00:34:30 dom0 kernel: ACPI: PM: Low-level resume complete
Dec 23 00:34:30 dom0 kernel: ACPI: EC: EC started
Dec 23 00:34:30 dom0 kernel: ACPI: PM: Restoring platform NVS memory
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: Uploading Xen processor PM info
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU2
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU4
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU6
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU8
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU10
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU12
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU14
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU16
Dec 23 00:34:30 dom0 kernel: Enabling non-boot CPUs ...
Dec 23 00:34:30 dom0 kernel: installing Xen timer for CPU 1
Dec 23 00:34:30 dom0 kernel: cpu 1 spinlock event irq 67
Dec 23 00:34:30 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Dec 21 17:09:46 dom0 kernel: ACPI: \_SB_.PLTF.P001: Found 3 idle states
Dec 21 17:09:46 dom0 kernel: ACPI: FW issue: working around C-state latencies out of order
Dec 21 17:09:46 dom0 kernel: CPU1 is up
...
The Hypervisor error and [Firmware Bug] lines also hint that some things are not quite right.
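To re-check this reading of the log against your own journal, here is a small Python sketch (my own throwaway helper, not part of Qubes or any kernel tooling). It assumes the first “ACPI: PM: Low-level resume complete” line is where the resume starts, lists the CPUs taken offline before that point, and decodes the “Hypervisor error (-19)” lines on the assumption that the value is a negated Linux/Xen errno (19 is ENODEV, “No such device”).

```python
import errno
import os
import re

# Assumption: the first line containing this text marks the start of the resume,
# as guessed from the excerpt above.
RESUME_MARKER = "ACPI: PM: Low-level resume complete"
OFFLINE = re.compile(r"smpboot: CPU (\d+) is now offline")
HYPERVISOR_ERROR = re.compile(r"Hypervisor error \((-?\d+)\) for ACPI CPU(\d+)")


def split_suspend_resume(lines):
    """Return (suspend_lines, resume_lines); resume_lines is empty without a marker."""
    lines = list(lines)
    for index, line in enumerate(lines):
        if RESUME_MARKER in line:
            return lines[:index], lines[index:]
    return lines, []


def offlined_cpus(lines):
    """CPU numbers from 'smpboot: CPU N is now offline' lines, in log order."""
    return [int(m.group(1)) for m in map(OFFLINE.search, lines) if m]


def hypervisor_errors(lines):
    """Map ACPI CPU number -> (errno name, message) for each 'Hypervisor error' line.

    Assumption: the negative code uses the usual Linux/Xen errno numbering,
    so -19 decodes to ENODEV.
    """
    errors = {}
    for line in lines:
        match = HYPERVISOR_ERROR.search(line)
        if match:
            code = -int(match.group(1))
            name = errno.errorcode.get(code, "UNKNOWN")
            errors[int(match.group(2))] = (name, os.strerror(code))
    return errors
```

After a hard reset, the previous boot’s kernel log (for example from `journalctl -k -b -1`) could be read into a list of lines and passed to `split_suspend_resume`, then each half inspected with the other two helpers.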
There is no amdgpu on this boot to show a PSP issue, though. And I have never been able to move the mouse, even on earlier attempts: I always get just a black screen on resume.
I have attached a journal log from the same machine running Ubuntu 21.10 with kernel 5.13.0-19-generic, on which waking from suspend seems to work perfectly.
Apologies if it’s a little long. Also, for anyone curious after seeing the logs, my IPv6 LAN subnet starts with 1337:beef:7ac0 (1337 BEEF TACO), and there are others on other subnets, because hey, why not!
So, it is possible to make our AMD laptops wake from suspend properly.
Hi, I’m on an R5 5600U, and I am running into the same suspend/resume issue as you.
When I use kernel 5.16.15-1 or 5.10.106-1 in dom0, with the kernel parameters “iommu=soft” and “mem_sleep_default=deep” (or something like that, I can’t recall it precisely), suspend seems to work fine, but resume never brings the system back to normal.
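Since the exact parameters are easy to forget, one way to see what the running dom0 kernel actually booted with is to read /proc/cmdline. A minimal Python sketch, assuming the plain space-separated key=value form (quoted values containing spaces are deliberately not handled):

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into {key: value}; bare flags map to None.

    Simplification: values containing quoted spaces are not handled.
    """
    params = {}
    for token in text.split():
        # Only the first "=" separates key from value, so "root=UUID=..." works.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def booted_params(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    return parse_cmdline(Path(path).read_text())
```

For example, `booted_params().get("mem_sleep_default")` would return the value actually in effect, or None if the parameter was not passed.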
If I don’t log in on tty1 (lightdm) but instead switch to tty2 and echo mem into /sys/power/state to trigger suspend, I can come back to tty2 after resume. However, the system doesn’t work perfectly: if I try to switch back to tty1, the screen turns black, and when I type into tty2, some letters occasionally do not display correctly and show up as "_" instead, possibly because of a framebuffer issue.
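Writing mem to /sys/power/state enters whichever mode is currently selected in /sys/power/mem_sleep, which is what mem_sleep_default= sets at boot. That file lists the supported modes and puts the selected one in brackets (e.g. `s2idle [deep]`). A minimal Python sketch for checking the selection before suspending; whether a given mode actually works under Xen dom0 is a separate question this does not answer:

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Return (available_modes, selected_mode) from /sys/power/mem_sleep contents.

    The kernel brackets the mode that "mem" will use, e.g. "s2idle [deep]".
    selected_mode is None if no token is bracketed.
    """
    modes, selected = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        modes.append(token)
    return modes, selected


def selected_mem_sleep(path="/sys/power/mem_sleep"):
    """Read the live file (Linux only; it may be absent on some setups)."""
    return parse_mem_sleep(Path(path).read_text())[1]
```

If this reports s2idle rather than deep, then echoing mem is not doing a deep (S3) suspend, which is worth knowing when comparing results between kernels.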
Still, in tty2 the system is basically “usable” (not for daily graphical use, of course). Commands such as qvm-start and xentop are fully responsive.
dmesg shows that, although my CPUs report “firmware bug” and “mwait c-state 0x0 isn’t supported on this HW”, they are brought back up by the system. The PSP and amdgpu are working as well.
I haven’t been back to the passthrough front yet (I spent so much time on it over the last months that I thought the software side of things needed some love), so incidentally I have not run into the GPU reset problems yet.
From the little I’ve followed in the meantime, I have high hopes that some recent work on amdgpu/DC will finally unblock us. I “just” need to take the time to dig into this again.