Diagnosing stalled black screen at resume

Hi,

My new laptop, a Pavilion 15-ecxx, works fine with Qubes OS 4.1. It runs in UEFI mode only (no legacy mode available) and has an NVIDIA GTX 1650 GPU, but it freezes on resume after suspend.

Suspend seems to trigger correctly (power LED blinking, NIC LEDs and SSD activity LED off, screen off), but resume fails erratically: sometimes the Xfce lock screen appears but there is no way to interact with it; most of the time the screen lights up but remains black. The SSD LED usually starts blinking again, as does one of the NIC LEDs, but the machine does not come back up on the network.

Following "Suspend/resume troubleshooting | Qubes OS", I tested adding mem_sleep_default=deep in grub.cfg, but it did not change the result. I’ve also checked the Xfce power manager tool and set all parameters to ‘suspend’ for the moment.
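To confirm that the mem_sleep_default=deep parameter actually took effect, the kernel exposes the available suspend modes in /sys/power/mem_sleep and marks the active one with square brackets. Here is a minimal sketch of a check; the function name active_mem_sleep is only an illustration, and I am assuming the file is exposed in dom0 the same way as on a regular Linux system.

```shell
# Print the active suspend mode from a mem_sleep-style file.
# The kernel marks the active mode with square brackets, e.g. "s2idle [deep]".
active_mem_sleep() {
    # $1: file to read; defaults to the real sysfs file.
    file="${1:-/sys/power/mem_sleep}"
    # Keep only the text between the brackets.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# Usage in a dom0 terminal:
#   active_mem_sleep      # "deep" means S3 is the default, "s2idle" means S0
#   cat /proc/cmdline     # shows whether the parameter reached the kernel at all
```

If this prints s2idle even though the parameter was added, the edit to grub.cfg probably did not reach the kernel command line.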

Following "How to troubleshoot wake from suspend", I looked into journalctl but found nothing relevant. In dmesg (dom0), however, I noticed this error message: snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible). It may be related to suspend, but I haven’t managed to investigate it successfully so far :worried:.

I did not find any hint in "Suspend issues on ThinkPad E495" nor in "Suspend resume issues (gen1 thinkpad x1 yoga)". Maybe I missed some solution in the forum?

Anyhow, as there’s no /var/log/syslog in Qubes OS, I’d appreciate any help: where is the best place to investigate or try options? The candidates I can think of:

  • dmesg
  • journalctl
  • UEFI
  • /var/log/lightdm/lightdm.log

Suspend problems are a long-standing issue in Qubes OS. Older ThinkPad laptops usually handle suspend better.

I am also having a hard time trying to troubleshoot the resuming problems on my machine. The following may be helpful.

  1. Your CPU model is important for troubleshooting.
  2. Check whether your laptop supports S3 sleep or S0 sleep, and decide which type of sleep you want.
  3. If S0, it does not seem that Qubes OS will support it any time soon.
  4. If S3 (the standard suspend), there are several places to look for logs:
  • In dom0, journalctl -r shows the dom0 journal, newest entries first, including the kernel log.
  • In dom0, the Qube Manager log shows the hypervisor log. Both the dom0 kernel log and the hypervisor log are important; comparing them may prove very useful.
  • If you resumed successfully but some of your VMs malfunction, you can see the VM logs in Qube Manager as well.
  • I am not sure which additional logs are helpful. I don’t think every laptop has a BIOS log; the ThinkPad BIOS, for example, does not seem to keep any log files.
  5. When reading the logs, pay attention to the timestamps. Work out which lines come before suspend and which come after resume.
  6. There are many ways of interacting with the machine after resume. Ctrl-Alt-F2, for example, can sidestep some problems if the fault is in Xorg. You can try to find out whether the whole computer is down or only the screen has a problem.
    How to troubleshoot wake from suspend - #10 by alzer89
    That post mentions one way to distinguish whether only the screen or the whole system is down. There are other methods, for example: playing music when suspending; running a heavy computation (to keep the CPU busy) when suspending and watching the fan to see whether the CPU is idle or busy; or, if you cannot even play music, running while true;do sudo sh -c "echo -e '\a' >> /dev/console"; sleep 1;done in dom0.
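A silent variant of the beep loop above, in case the speaker is no help: write a timestamp to disk every second, so that after a hard power-off the last timestamp in the file shows roughly how long dom0 stayed alive after resume. This is only a sketch; the function name heartbeat, the default of 600 beats, and the log path in the usage line are illustrative choices, not anything Qubes provides.

```shell
# Append one timestamp per interval to a file, flushing to disk each time.
# $1: log file, $2: number of beats (default 600), $3: seconds between beats (default 1)
heartbeat() {
    log="$1"
    beats="${2:-600}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$beats" ]; do
        date '+%Y-%m-%d %H:%M:%S' >> "$log"
        sync    # push the line to disk so it survives a hard power-off
        sleep "$interval"
        i=$((i + 1))
    done
}

# Usage in dom0: start it, suspend, resume, then inspect the file after reboot.
#   heartbeat "$HOME/heartbeat.log" &
```

A gap in the timestamps marks the time spent suspended; timestamps that continue after the gap mean dom0 itself resumed and only the display (or a VM) is stuck.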

The current suspend/resume troubleshooting documentation does not seem to be up to date.

Talking of snd_hda_intel, is your audio working?

Hi,

Thanks a lot for these explanations. I’m still trying to find answers.

  1. The CPU is a Ryzen 5 4600H (it should be great once I’m able to use it).
  2. The machine was tested with Ubuntu before, where suspend worked normally. I can see S3 support in the dom0 logs (below), and the option is explicit in the BIOS.
  3. So, N/A.
  • In journalctl, I can find the ACPI tables, and I can see some ACPI errors at boot:
    May 05 16:38:43 dom0 kernel: ACPI BIOS Error (bug): Failure creating named object [\SMIB], AE_ALREADY_EXISTS (20200925/dsfield-637)
    May 05 16:38:43 dom0 kernel: ACPI Warning: NsLookup: Type mismatch on SMIB (Integer), searching for (RegionField) (20200925/nsaccess-696)
    May 05 16:38:43 dom0 kernel: ACPI BIOS Error (bug): Could not resolve symbol [_SB.PCI0.GPP1.WLAN], AE_NOT_FOUND (20200925/dswload2-162)
    May 05 16:38:43 dom0 kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20200925/psobject-220)
    May 05 16:38:43 dom0 kernel: ACPI: Skipping parse of AML opcode: OpcodeName unavailable (0x0010)

but also a promising line:
May 05 16:38:43 dom0 kernel: ACPI: (supports S0 S3 S5)

The day before, when I again tried to capture logs while suspending, I got:
May 04 01:31:49 dom0 kernel: ACPI: Preparing to enter system sleep state S3
then 6 lines like this one (one per CPU):
May 04 01:31:49 dom0 kernel: xen_acpi_processor: (PXX): Hypervisor error (-19) for ACPI CPU2
then an error repeated 6 times (the same is present at boot ) :
May 04 01:31:49 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
May 04 01:31:49 dom0 kernel: ACPI: _SB_.PLTF.P001: Found 3 idle states
May 04 01:31:49 dom0 kernel: ACPI: FW issue: working around C-state latencies out of order
and finally :
May 04 01:31:49 dom0 kernel: ACPI: Waking up from system sleep state S3
May 04 01:31:49 dom0 kernel: ACPI: EC: interrupt unblocked
May 04 01:31:49 dom0 kernel: ACPI: EC: event unblocked
(no more ACPI messages after that, and I did a hard shutdown of the non-responding PC)
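To separate what happened before suspend from what happened after resume, the "Waking up from system sleep state S3" line shown above can serve as a marker. Below is a rough sketch that keeps only the lines following the last such marker. The function name after_resume is made up for illustration, and the usage line assumes the dom0 journal is persistent so that the previous boot is still available after a hard shutdown.

```shell
# Print only the lines that follow the last S3 wake-up marker read from stdin.
after_resume() {
    awk '
        # Each new marker resets the buffer, so only the last resume is kept.
        /Waking up from system sleep state S3/ { n = 0; seen = 1; next }
        seen { buf[++n] = $0 }
        END { for (i = 1; i <= n; i++) print buf[i] }
    '
}

# Usage in dom0 after the forced reboot (-k: kernel messages, -b -1: previous boot):
#   sudo journalctl -k -b -1 --no-pager | after_resume
```

Whatever this prints is what the kernel managed to log between waking up and the hard shutdown, which is the window where the freeze happens.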

  • I found no errors in the Qube Manager log, just a line with “(XEN) parameter “no-real-mode” unknown!” at each boot.

  • I don’t see a malfunctioning VM either.

  • I don’t have BIOS logs on that laptop (its BIOS is awful, the weakest I’ve ever seen; perhaps because it was sold as a “FreeDOS only” laptop? Never again).

  5. Yes, sure, I’m trying to identify patterns so that I can more quickly spot the suspend time and the activity before and after it.

  • Yes, I tested Ctrl+Alt+F2 first, but it does not give me a virtual console (or maybe it does, but as the screen remains stalled, I can’t see any change). In a working session (i.e. after boot and login), virtual consoles work fine.
    And yes, great, thank you, I completely lacked common sense here (I had only thought of pinging the machine over the network): playing music (from a stream) is definitely a help, as the music comes back after suspend … for a few seconds! It looks like the browser resumes playing from its cache but cannot load the following packets, so the VM must be frozen. I’m looking at these logs (and so yes, by the way, audio is working fine).

  • Well, I’ve just tried another suspend, with odd new behavior: after resume the live stream played normally (for more than a minute, so I don’t think it was only cache), and the mouse cursor was present, and movable! The screen remained black, though, and the cursor was confined to a pane at the center of the screen. I assumed it was the unlock dialog of the locked screen, so I entered the password, but it did nothing. I also can’t understand why this VM has nothing in its log about the suspend! It’s really weird. By the way, switching virtual console worked once: it cut the music but showed no VC (the screen remained stalled), and I could go back to the default VC (or rather, I wasn’t able to confirm that I came back to it).

So maybe I should look at the X configuration?

I had also forgotten to check dmesg: it seems to point to a known nouveau bug:

dom0 kernel: nouveau 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
dom0 kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
dom0 kernel: nouveau 0000:01:00.0: fb: Falcon mem scrubbing timeout
dom0 kernel: nouveau 0000:01:00.0: fb: VPR still locked after scrub!
dom0 kernel: nouveau 0000:01:00.0: fb: init failed, -5
dom0 kernel: nouveau 0000:01:00.0: init failed with -5
dom0 kernel: nouveau: systemd-udevd[399]:00000000:00000080: init failed with -5
dom0 kernel: nouveau: DRM-master:00000000:00000000: init failed with -5
dom0 kernel: nouveau 0000:01:00.0: DRM: Client resume failed with error: -5
dom0 kernel: nouveau 0000:01:00.0: DRM: resume failed with: -5

So maybe it is rather a problem with the nouveau driver?
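The snd_hda_intel error from my first post was on 0000:01:00.1 and the nouveau one is on 0000:01:00.0, i.e. two functions of the same PCI device, so both may well be the same failure of the GPU to come back from D3cold. A small sketch to list every driver/address pair hit by that message; the function name d3cold_failures is my own, and the pattern simply matches the message text quoted above.

```shell
# List unique "driver pci-address" pairs that failed to return from D3cold.
# Reads kernel log lines on stdin (journalctl or dmesg output).
d3cold_failures() {
    sed -n "s/^.* \([^ ]*\) \([0-9a-f:.]*\): can't change power state from D3cold to D0.*/\1 \2/p" |
        sort -u
}

# Usage in dom0:
#   sudo journalctl -k -b -1 --no-pager | d3cold_failures
#   sudo dmesg | d3cold_failures
```

If only the 01:00.x functions show up, that would point at the NVIDIA card rather than at Xfce or the X configuration.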