AMD PSP not restarting correctly

DISCLAIMER: Yes, I know PSP might not really be something you want running, but at the moment, it’s breaking my machine…

I have tried to suspend and resume my GPD Win Max.

xscreensaver shows up and I can move the cursor for about 6-8 seconds, then it either freezes or the screen goes black, and I am forced to hard reset the machine.

journalctl
journal.log (301.9 KB)

Any help would be greatly appreciated.

From the logs we see you’re using a RENOIR APU (so do I, and I can confirm suspend/resume is not working on Qubes, while it does work with a plain Linux kernel), but I can’t see the exact model (4800H here).

In your logs I see a suspend being triggered, with a big warning in switch_mm_irqs_off(). Then on resume the PSP does not work. I’d suggest looking into the first symptom first. Will check my own logs when I get some time :slight_smile:

As for the PSP, it is required e.g. to verify the authenticity of your GPU firmware; you can’t get amdgpu to initialize on recent hardware if it’s not running.

Correct.

It’s advertised as a 4800U.

It appears to sleep correctly (as in, the fans turn off, and the LED pulses slowly, indicating it’s in sleep mode).

I can move the cursor for about 6-8 seconds, before the screen either goes blank, or the cursor freezes (but the fans stay on).

This is my first AMD machine. I’ll do my best, but any help would be massively appreciated :slight_smile:

I had a feeling it was something like that…

Checked my logs, and we sure look to be in the same boat:

Dec 23 00:34:30 dom0 kernel: Disabling non-boot CPUs ...
Dec 23 00:34:30 dom0 kernel: ------------[ cut here ]------------
Dec 23 00:34:30 dom0 kernel: WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:522 switch_mm_irqs_off+0x3c5/0x400
Dec 23 00:34:30 dom0 kernel: Modules linked in: loop snd_seq_dummy snd_hrtimer ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vfat fat intel_rapl_msr msi_wmi sparse_keyma>
Dec 23 00:34:30 dom0 kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.5-1.fc32.qubes.x86_64 #1
Dec 23 00:34:30 dom0 kernel: Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDK/MS-17FK, BIOS E17FKAMS.117 10/29/2020
Dec 23 00:34:30 dom0 kernel: RIP: e030:switch_mm_irqs_off+0x3c5/0x400
Dec 23 00:34:30 dom0 kernel: Code: f0 41 80 65 01 fb ba 01 00 00 00 49 8d b5 60 23 00 00 4c 89 ef 49 c7 85 68 23 00 00 00 1e 08 81 e8 80 f5 08 00 e9 15 fd ff ff <0f> 0b e8 34 fa ff ff e9 ad>
Dec 23 00:34:30 dom0 kernel: RSP: e02b:ffffc900400d7eb0 EFLAGS: 00010006
Dec 23 00:34:30 dom0 kernel: RAX: 000000001441e000 RBX: ffff888155670000 RCX: 0000000000000040
Dec 23 00:34:30 dom0 kernel: RDX: ffff8881003027c0 RSI: 0000000000000000 RDI: ffff88809441e000
Dec 23 00:34:30 dom0 kernel: RBP: ffffffff829d9240 R08: 0000000000000000 R09: 0000000000000000
Dec 23 00:34:30 dom0 kernel: R10: 0000000000000008 R11: 0000000000000000 R12: ffff8881050a9dc0
Dec 23 00:34:30 dom0 kernel: R13: ffff8881003027c0 R14: 0000000000000000 R15: 0000000000000001
Dec 23 00:34:30 dom0 kernel: FS:  0000000000000000(0000) GS:ffff888155640000(0000) knlGS:0000000000000000
Dec 23 00:34:30 dom0 kernel: CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
Dec 23 00:34:30 dom0 kernel: CR2: 0000593b416684e8 CR3: 0000000002810000 CR4: 0000000000050660
Dec 23 00:34:30 dom0 kernel: Call Trace:
Dec 23 00:34:30 dom0 kernel:  <TASK>
Dec 23 00:34:30 dom0 kernel:  switch_mm+0x1c/0x30
Dec 23 00:34:30 dom0 kernel:  idle_task_exit+0x55/0x60
Dec 23 00:34:30 dom0 kernel:  play_dead_common+0xa/0x20
Dec 23 00:34:30 dom0 kernel:  xen_pv_play_dead+0xa/0x60
Dec 23 00:34:30 dom0 kernel:  do_idle+0xd1/0xe0
Dec 23 00:34:30 dom0 kernel:  cpu_startup_entry+0x19/0x20
Dec 23 00:34:30 dom0 kernel:  asm_cpu_bringup_and_idle+0x5/0x1000
Dec 23 00:34:30 dom0 kernel:  </TASK>
Dec 23 00:34:30 dom0 kernel: ---[ end trace 81338147c2a10edc ]---
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 1 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 2 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 3 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 4 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 5 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 6 is now offline
Dec 23 00:34:30 dom0 kernel: smpboot: CPU 7 is now offline

Up to there it seems to be the suspend (I’m not that familiar with suspend logs, but “is now offline” definitely hints at the resume not having started yet, as does play_dead). And then the resume:

Dec 23 00:34:30 dom0 kernel: ACPI: PM: Low-level resume complete
Dec 23 00:34:30 dom0 kernel: ACPI: EC: EC started
Dec 23 00:34:30 dom0 kernel: ACPI: PM: Restoring platform NVS memory
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: Uploading Xen processor PM info
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU2
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU4
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU6
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU8
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU10
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU12
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU14
Dec 23 00:34:30 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU16
Dec 23 00:34:30 dom0 kernel: Enabling non-boot CPUs ...
Dec 23 00:34:30 dom0 kernel: installing Xen timer for CPU 1
Dec 23 00:34:30 dom0 kernel: cpu 1 spinlock event irq 67
Dec 23 00:34:30 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Dec 21 17:09:46 dom0 kernel: ACPI: \_SB_.PLTF.P001: Found 3 idle states
Dec 21 17:09:46 dom0 kernel: ACPI: FW issue: working around C-state latencies out of order
Dec 21 17:09:46 dom0 kernel: CPU1 is up
...

Hypervisor error and [Firmware Bug] also hint that some things are not straight.
No amdgpu on this boot to see a PSP issue, though. And I never got as far as being able to move the mouse, even on prior attempts: I always only get a black screen on resume.
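
For anyone who wants to compare their own journal against the excerpts above, here is a minimal Python sketch. It only counts the markers visible in the logs quoted in this thread; the pattern labels and the function name are made up for illustration, and it makes no attempt to diagnose anything.

```python
import re
from collections import Counter

# Patterns are copied from the log excerpts quoted in this thread.
# The labels on the left are hypothetical names chosen for this sketch.
MARKERS = {
    "tlb_warning": re.compile(r"WARNING: .*switch_mm_irqs_off"),
    "cpu_offline": re.compile(r"smpboot: CPU \d+ is now offline"),
    "hypervisor_error": re.compile(r"xen_acpi_processor: .*Hypervisor error"),
    "firmware_bug": re.compile(r"\[Firmware Bug\]"),
    "resume_complete": re.compile(r"ACPI: PM: Low-level resume complete"),
}


def count_markers(lines):
    """Count how many lines match each suspend/resume marker."""
    counts = Counter()
    for line in lines:
        for label, pattern in MARKERS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

To use it on a saved journal, something like `count_markers(open("journal.log", errors="replace"))` should do; comparing the counts between a Qubes boot and a plain Linux boot at least shows which of these messages are specific to running under Xen.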

There must be a way to flush this on wake, surely….

I’ve been looking, and haven’t been able to get it done.

@yann, I have done some research, and came across this:

It might be useful. I will give it a try and report back my findings :slight_smile:

There have been a number of news items about power management fixes in recent weeks, the latest being AMD Prepares Linux Fix For Some Laptops Not Resuming From s2idle Suspend - Phoronix. It may be that upcoming kernels will help.

Digging through Phoronix I stumbled on this article, which, though old, may be interesting to us (not read through yet, though): Awkward Linux Power Management With Xen - Phoronix

I have attached here a journal log of the same machine running Ubuntu 21.10 with kernel 5.13.0-19-generic, in which wake from suspend seems to work perfectly.

ubuntu-21.10-gpd-win-max-sleep-wake.log (575.8 KB)

Apologies if it’s a little long. Also, for anyone curious after seeing the logs, my IPv6 LAN subnet starts with 1337:beef:7ac0 (1337 BEEF TACO), and there are others on other subnets, because hey, why not!

So, it is possible to make our AMD laptops wake from suspend properly :slight_smile:

Next step, getting Qubes OS to do it!

@yann, would this be of any use?

It works very well with ProxMox, and with a bit of tweaking (which I’m still in the middle of), could likely be ported to Xen/Qubes…

Hi, I’m on an R5 5600U, and I encounter the same suspend/resume issue as you.

When I use kernel 5.16.15-1 or 5.10.106-1 in dom0 with the kernel parameters “iommu=soft” and “mem_sleep_default=deep” (or something like that, I can’t recall precisely), suspend seems to work fine, but resume never brings the machine back to normal.

If I don’t log in on tty1 (lightdm) and instead switch to tty2, then echo mem into /sys/power/state to trigger suspend, I can come back to tty2. However, the system doesn’t work perfectly. If I try to switch back to tty1, the screen turns black. If I type something into tty2, occasionally some letters don’t show correctly and appear as "_" instead, possibly because of some framebuffer issue.

However, in tty2 the system is basically “usable” (of course not in the form of daily, graphical use). Commands such as qvm-start or xentop are fully responsive.

Dmesg shows that, although my CPUs report “firmware bug” and “mwait c-state 0x0 isn’t supported on this HW”, they are brought up by the system, and PSP along with amdgpu is working.
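
Since the exact sleep parameter above was quoted from memory, it may be worth checking which suspend variant the kernel actually selected. A minimal Python sketch, assuming dom0 exposes /sys/power/mem_sleep the way a plain Linux kernel does (the active mode is shown in square brackets, e.g. `s2idle [deep]`); the function names are made up for illustration:

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Return (active_mode, all_modes) from /sys/power/mem_sleep contents.

    The kernel marks the currently selected mode with square brackets.
    """
    active = None
    modes = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return active, modes


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read and parse the sysfs file; the default path is an assumption for dom0."""
    return parse_mem_sleep(Path(path).read_text())
```

If `read_mem_sleep()` reports `s2idle` as active even though `mem_sleep_default=deep` was passed, the parameter probably did not take effect, or the firmware does not offer S3 at all.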

That sounds identical to what I’m experiencing.

Filed a bug report:

@alzer89 that sounds neat, good catch!

I’ve not been back to the passthrough front yet (spent so much time on it those last months that I thought the software side of things needed some love), so incidentally I have not run into the GPU reset problems yet.

From the little I’ve been following in the meantime, I have great hopes that some of the work amdgpu/dc has had recently will finally unlock us. “Just” need to take the time to dig into this anew.

So far it’s been narrowed down to:

  • an issue with the way Xen interacts with the dom0 kernel (most likely)
  • an issue with the amdgpu kernel module
  • the GPU wanting to “cryptographically” verify itself before powering back up
  • an issue with Xorg being run on top of Xen

Resume from sleep works perfectly fine with anything not running on top of Xen, so my guess is it’s a Xen thing…

The kernel module ccp doesn’t seem to initialise on dom0 boot.
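
A quick way to see whether ccp is at least loaded is to look at /proc/modules. The Python sketch below is only an illustration with made-up function names; note that a loaded module can still fail to initialise the device, and a driver built into the kernel would not be listed there at all, so a "missing" result needs cross-checking against dmesg.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each line starts with the module name, followed by size, refcount, etc.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_modules(proc_modules_text, wanted=("ccp", "amdgpu")):
    """List the wanted modules that do not appear as loaded."""
    present = loaded_modules(proc_modules_text)
    return [name for name in wanted if name not in present]
```

For example, `missing_modules(open("/proc/modules").read())` returns an empty list when both ccp and amdgpu show up as loaded.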

Xorg also coredumps because of “ring gfx timeout”.

We’ll get it solved eventually :slight_smile: