Dom0 gui gets stuck with second monitor connected

st3f10 · January 1, 2024, 5:39pm

Hello all,
I’m experiencing an issue with my Lenovo notebook that only occurs when a second monitor is connected through the HDMI port.
At random moments, the dom0 GUI freezes, rendering the machine unusable.
During these freezes, I cannot select any items on the desktop, and after a few seconds the mouse pointer freezes too.
At that point, the only solution is a hard reset.

I haven’t found any log that explicitly reports this issue. The only related information is that journalctl logs the following lines when the problem occurs:
dic 31 17:56:41 dom0 kernel: i915 0000:00:02.0: [drm] ERROR [CRTC:131:pipe B] flip_done timed out
dic 31 17:56:51 dom0 kernel: i915 0000:00:02.0: [drm] ERROR flip_done timed out
dic 31 17:56:51 dom0 kernel: i915 0000:00:02.0: [drm] ERROR [CRTC:131:pipe B] commit wait timed out
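Because these messages belong to the boot that ended in a forced reset, one way to find them again is to filter the previous boot's kernel log. This is only a minimal sketch: the function name i915_timeouts is hypothetical (not part of Qubes or systemd), and it assumes the journal is persistent so that journalctl -k -b -1 (kernel messages from the previous boot) still contains them.

```shell
#!/bin/sh
# Hypothetical helper: keep only the i915 "flip_done timed out" and
# "commit wait timed out" errors from log text read on stdin.
i915_timeouts() {
  grep -E 'i915 .*\[drm\] ERROR .*(flip_done|commit wait) timed out'
}

# Example, run in dom0 after rebooting from a freeze:
#   journalctl -k -b -1 | i915_timeouts
```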

I have attached the full journalctl output for reference:
journalctl.log (376.5 KB)
Here is the HCL report:
hcl.log (765 Bytes)

As a newcomer to Qubes, I’d appreciate help understanding the issue and guidance through the troubleshooting process.

Any help would be greatly appreciated.

apparatus · January 1, 2024, 7:21pm

As a test, try adding the pci=nomsi kernel option in dom0.
Possibly related issues:

github.com/QubesOS/qubes-issues

GUI randomly freezes

opened 09:27AM - 17 Dec 18 UTC

closed 06:10AM - 20 Apr 23 UTC

GammaSQ

T: bug C: other P: default affects-4.1

Qubes OS version: 4.0
Affected component(s): GUI

Steps to reproduce the behavior:
Random. Happened three times by now and every time within ~30 min after boot. Couldn't find any other correlation, apart from the VMs started at that moment, but they are what I always start after bootup.

Actual behavior:
At some point, the GUI freezes but the cursor can be moved around. Nothing reacts, not windows nor xfce-taskbar buttons. Switching to tty2 (Alt+Ctrl+F2) takes very long. When switching back (Alt+Ctrl+F1), again taking very long, it appears the windows actually reacted to what I did before (they are brought to foreground / moved around), but the cursor doesn't move anymore and any keyboard input is ignored (i.e. I can't switch back to tty2).

General notes:
I thought this was a performance issue, so I checked xentop and top in tty2, but there was nothing out of the ordinary. RAM usage was far below maximum every time, CPU usage was in the single-% range. Anything else I could check when it happens again?

github.com/QubesOS/qubes-issues

Screen does not wake up after resume (AMD Ryzen 7 Pro 4750U)

opened 08:36PM - 30 Sep 21 UTC

closed 07:42AM - 16 Apr 23 UTC

isodude

T: bug C: Xen P: major hardware support diagnosed C: power management affects-4.1

Solved as of linux-firmware-20230123-135.fc32.noarch xen-4.14.5-20.fc32.x86_64 kernel-latest-6.2.10-1.qubes.fc32.x86_64

Qubes OS release R4.1, kernel 5.14.7-1 (fedora 5.14) (same behavior in lower kernels). XEN 4.14.3 (build from @marmarek branch)

Brief summary:
The laptop does not resume after the third sleep/resume cycle. The problem seems to be with
```
[drm] psp command (0x7) failed and response status is (0xFFFF0007)
[drm:psp_hw_start [amdgpu]] *ERROR* PSP load tmp failed!
```
It feels like there's a hung process in the amdgpu drivers for some reason. Not sure how to debug this properly; XEN is not giving me much info at all. The problem is obviously visible with X started as well, but I try to make the bug surface smaller.

Steps to reproduce:
Boot the laptop with X disabled and no VMs started. Run systemctl suspend three times (and resume). Run reboot to restore the system.

Expected behavior:
Possible to suspend without limit.

Actual behavior:
The screen does not wake up on the third resume. It's possible to type `reboot` and restart.

Notes:
Works well with the kernel booted without XEN. [crash.filtered.log](https://github.com/QubesOS/qubes-issues/files/7262316/crash.filtered.log) [crash.filtered.xen.log](https://github.com/QubesOS/qubes-issues/files/7262318/crash.filtered.xen.log)

Workarounds:
A bit more testing is needed, but I do have sort of stable suspend/resume now. It even survives when everything goes south. There's a bit of tearing, but I'd rather have suspend than tearing.
```
cat << EOF > /etc/X11/xorg.conf.d/50-video.conf
Section "Device"
  Identifier "card0"
  Driver "amdgpu"
  Option "AccelMethod" "none"
EndSection
EOF
```
Compile `xorg-x11-drv-amdgpu` from https://github.com/freedesktop/xorg-xf86-video-amdgpu
Run `make install` and install `amdgpu_drv.so` in `/usr/lib64/xorg/modules/drivers` on dom0.
For more stability, run with the kernel cmdline `preempt=none`.
Do note that e.g. a 4k external screen will be royally sluggish.
Sometimes the screen turns up black; type in the password anyhow and switch to tty2 and back again (or suspend/resume again), and it will most likely come to life again. Suspending/resuming too fast could lead to an instant reboot.

github.com/QubesOS/qubes-issues

MSI problems with `amdgpu` driver on recent kernels

opened 05:57PM - 04 Jan 23 UTC

closed 09:13PM - 20 Nov 23 UTC

neowutran

T: bug C: kernel R: cannot reproduce P: default hardware support C: power management affects-4.1

Qubes OS release 4.1

Brief summary:
On recent Linux kernels (5.6+), the "amdgpu" driver started using MSI for runtime power management. In later kernel versions (5.10+), MSI also seems to be used for other things.
Related link: https://gitlab.freedesktop.org/drm/amd/-/issues/2327
Could this pull request be related to this issue? https://github.com/QubesOS/qubes-vmm-xen/pull/143

Steps to reproduce:
- Do a GPU passthrough to a VM (AMD GPU)
- With kernel <5.6 it will work (if the GPU is supported by this kernel version)
- With kernel >5.6 it won't work

st3f10 · January 2, 2024, 3:21pm

Hello,
first of all, many thanks for your quick reply.
I have added the pci=nomsi kernel option to my GRUB configuration following this procedure:

Added pci=nomsi to the file /etc/default/grub:
cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=false
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet pci=nomsi"
GRUB_DISABLE_RECOVERY="true"
GRUB_THEME="/boot/grub2/themes/qubes/theme.txt"
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
GRUB_DISABLE_OS_PROBER="true"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX usbcore.authorized_default=0"
Regenerated the GRUB configuration:
sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
Generated the initramfs image:
sudo dracut -f
Rebooted the system.

After the reboot, I verified that the kernel parameter is correctly added:
cat /proc/cmdline
placeholder root=/dev/mapper/qubes_dom0-root ro rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet pci=nomsi usbcore.authorized_default=0
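The same check can be scripted so that it only matches pci=nomsi as a whole parameter, rather than as a substring of some longer parameter. A minimal sketch; has_kernel_param is a hypothetical helper name, and the optional second argument exists only so it can be tried on a copy of the command line.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if PARAM is a whole, space-separated token
# on a kernel command line (read from /proc/cmdline unless FILE is given).
has_kernel_param() {
  param=$1
  file=${2:-/proc/cmdline}
  # One token per line, then require an exact whole-line match.
  tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Example:
#   has_kernel_param pci=nomsi && echo "pci=nomsi is active"
```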

Is this the right procedure?

I apologize for asking, but I’m new to Qubes, and I want to make sure I have followed the correct steps.
If what I have done is correct, I will let you know whether the pci=nomsi option helps.

Many thanks again for your help.

apparatus · January 2, 2024, 4:00pm

It’s better to add a separate GRUB_CMDLINE_LINUX line at the end of the /etc/default/grub file, like this, instead of modifying the default one:

GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX pci=nomsi"

Regenerating the initramfs is unnecessary; you only need to generate the new GRUB config.
Otherwise, your procedure is correct.
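As a sketch of that advice, the helper below appends such a line only if an identical one is not already there, so running it twice does no harm. The function name is hypothetical; GRUB itself only needs the single line shown above, and the setting takes effect after regenerating the GRUB config and rebooting.

```shell
#!/bin/sh
# Hypothetical helper: append GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX PARAM" to a
# GRUB defaults file, leaving the distribution's own GRUB_CMDLINE_LINUX untouched.
append_grub_linux_param() {
  file=$1
  param=$2
  # \$ keeps the variable literal; it is expanded when grub2-mkconfig sources the file.
  line="GRUB_CMDLINE_LINUX=\"\$GRUB_CMDLINE_LINUX $param\""
  # Only append if an identical line is not already present.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (as root in dom0), then reboot:
#   append_grub_linux_param /etc/default/grub pci=nomsi
#   grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
```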

st3f10 · January 2, 2024, 4:40pm

Here is my current GRUB configuration (/etc/default/grub):
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=false
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet"
GRUB_DISABLE_RECOVERY=true
GRUB_THEME="/boot/grub2/themes/qubes/theme.txt"
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
GRUB_DISABLE_OS_PROBER=true
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX usbcore.authorized_default=0"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX pci=nomsi"

Output of /proc/cmdline:
placeholder root=/dev/mapper/qubes_dom0-root ro rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet usbcore.authorized_default=0 pci=nomsi

However, with the pci=nomsi option enabled, the journal continuously reports the following error:
Jan 02 17:15:42 dom0 kernel: ata1: illegal qc_active transition (00000000->ffffffff)
Removing the pci=nomsi option and regenerating the GRUB config makes this error disappear.
The full journal is attached:
journalctlgen02_1.log (1.2 MB)
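To take the test option out again, a sketch that deletes only the separately appended line, assuming it was added exactly as GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX pci=nomsi". The helper name is hypothetical, and the GRUB config still has to be regenerated afterwards.

```shell
#!/bin/sh
# Hypothetical helper: remove the line GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX PARAM"
# from a GRUB defaults file; every other line is kept unchanged.
remove_grub_linux_param() {
  file=$1
  param=$2
  line="GRUB_CMDLINE_LINUX=\"\$GRUB_CMDLINE_LINUX $param\""
  tmpf=$(mktemp) || return 1
  # -v keeps non-matching lines; -x -F compares whole lines literally.
  grep -vxF -- "$line" "$file" > "$tmpf"
  # grep exits with 2 on errors (e.g. unreadable file): don't overwrite then.
  if [ $? -gt 1 ]; then rm -f "$tmpf"; return 1; fi
  # Write back in place so the original file's owner and mode are preserved.
  cat "$tmpf" > "$file"
  rm -f "$tmpf"
}

# Example (as root in dom0), then regenerate the GRUB config and reboot:
#   remove_grub_linux_param /etc/default/grub pci=nomsi
```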

I suspect that disabling Message Signaled Interrupts (MSI) for PCI devices may impact the interrupt handling mechanism and lead to issues with the ATA controller.
What are your thoughts?

Despite this error, Qubes OS seems to be working fine.

Many thanks.

apparatus · January 2, 2024, 5:14pm

It could cause some issues, but I don’t have enough knowledge to comment on this.
I suggested this option only as a test, because a Qubes OS developer suggested trying it in one of the linked GitHub issues. If the problem goes away with this option, I suggest reporting it in a GitHub issue so it can be traced further.

You can also try disabling the IOMMU for your GPU:

st3f10 · January 2, 2024, 5:26pm

Hi apparatus,
I’ll dig deeper into this issue and collect more information.
In the meantime, thank you for your valuable assistance; I’ll keep you updated on any developments.
By the way, I have another question: when you suggest reporting this issue on GitHub, are you referring to the QubesOS/qubes-issues tracker?

Thanks a lot.

apparatus · January 2, 2024, 5:35pm

Yes, you can open a new issue there, or it might be better to report your problem in this existing issue:

github.com/QubesOS/qubes-issues

GUI randomly freezes

Then you could have this issue reopened, since your problem seems to be similar.

st3f10 · January 3, 2024, 1:24pm

Hi apparatus,
I’ve commented on the “GUI randomly freezes” issue as you suggested.
Many thanks again for your valuable help.

st3f10 · January 4, 2024, 10:10am

Unfortunately, it’s not possible to reopen that issue, and I cannot use the Qubes issue tracker to ask for support.

I have a small update: if I unplug the second monitor’s HDMI cable, the dom0 GUI recovers on the primary monitor.
After the GUI recovers, I can reconnect the second monitor and keep working until the next freeze.
In the journal log, I can see the following messages:
gen 03 18:18:40 dom0 kernel: i915 0000:00:02.0: [drm] ERROR [CRTC:131:pipe B] flip_done timed out
gen 03 18:18:55 dom0 kernel: i915 0000:00:02.0: [drm] ERROR flip_done timed out
gen 03 18:18:55 dom0 kernel: i915 0000:00:02.0: [drm] ERROR [CRTC:131:pipe B] commit wait timed out
gen 03 18:19:05 dom0 kernel: i915 0000:00:02.0: [drm] ERROR flip_done timed out
gen 03 18:19:05 dom0 kernel: i915 0000:00:02.0: [drm] ERROR [PLANE:82:plane 1B] commit wait timed out
gen 03 18:19:07 dom0 systemd[1]: Started getty@tty5.service - Getty on tty5.

The log contains more error messages, but in short, they don’t seem helpful.
The complete journal log is attached:
journalctl_03_1.log (682.7 KB)
While reading the “GUI randomly freezes” issue, I found this comment:
Looks like a GPU driver problem, may be related to IOMMU. Try adding iommu=no-igfx to the hypervisor command line (options= in /boot/efi/EFI/qubes/xen.cfg).
Moreover, I have found this case in the forum:

So I have now added the iommu option to /etc/default/grub:
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT iommu=no-igfx"
Then I regenerated the GRUB config and rebooted…
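Note that iommu=no-igfx is a Xen option, so after rebooting it will not appear in /proc/cmdline, which only holds the dom0 kernel's parameters. One way to confirm that Xen received it is to look at the xen_commandline field printed by xl info in dom0. The helper below is a hypothetical sketch that reads that output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read `xl info` output on stdin and succeed only if PARAM
# is a whole token of the xen_commandline field.
xen_cmdline_has() {
  param=$1
  # Keep just the value after "xen_commandline ... :", then one token per line.
  sed -n 's/^xen_commandline *: *//p' | tr ' ' '\n' | grep -qxF -- "$param"
}

# Example (dom0, as root):
#   xl info | xen_cmdline_has iommu=no-igfx && echo "Xen booted with iommu=no-igfx"
```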

apparatus · January 4, 2024, 11:37am

You can also try the intremap=off option as a test:

st3f10 · January 11, 2024, 9:15pm

Hi apparatus,
I have an update:
While searching for a solution, I came across this post:
https://gitlab.freedesktop.org/drm/intel/-/issues/8685.
After reading the comments, I noticed that they were experiencing the same problem I’m facing in Qubes 4.2.
Therefore, I tried booting Qubes with the previous kernel, which loads i915 DMC firmware version 2.16:

uname -r
6.1.62-1.qubes.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov 14 06:16:38 GMT 2023 x86_64 x86_64 x86_64 GNU/Linux
cat /sys/kernel/debug/dri/0/i915_dmc_info
fw loaded: yes
path: i915/adlp_dmc_ver2_16.bin
Pipe A fw support: yes
Pipe A fw loaded: yes
Pipe B fw support: yes
Pipe B fw loaded: yes
version: 2.16
DC3CO count: 0
DC3 -> DC5 count: 5
DC5 -> DC6 count: 0
program base: 0x0c0a4040
ssp base: 0x00086fc0
htp: 0x01240108
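To compare kernels quickly, the firmware file and version can be pulled out of that debugfs file. Note that its version line is the version of the DMC (display microcontroller) firmware blob that i915 loaded (here i915/adlp_dmc_ver2_16.bin), not a version of the i915 driver itself. A minimal sketch; the function name is hypothetical, and reading debugfs requires root.

```shell
#!/bin/sh
# Hypothetical helper: print "<firmware path> <DMC version>" from i915_dmc_info
# text on stdin, e.g. "i915/adlp_dmc_ver2_16.bin 2.16".
dmc_summary() {
  awk -F': ' '
    $1 == "path"    { path = $2 }
    $1 == "version" { ver = $2 }
    END             { print path, ver }
  '
}

# Example (as root in dom0; debugfs must be mounted):
#   dmc_summary < /sys/kernel/debug/dri/0/i915_dmc_info
```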

Now that I’m using kernel 6.1.62-1.qubes.fc37.x86_64 with i915 DMC firmware version 2.16, the issue no longer occurs.
However, I don’t have enough knowledge to troubleshoot this further;
it’s not clear to me whether this problem is a Qubes bug or a problem in the i915 driver.
What are your thoughts?

Many thanks.

apparatus · January 11, 2024, 11:50pm

You can try kernel 6.6.9; maybe the fix is in there.
Install the kernel-latest package in dom0 from the current-testing repository:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing kernel-latest

st3f10 · January 13, 2024, 11:04am

Hi apparatus,
I’m testing Qubes 4.2 with kernel 6.6.9 and i915 DMC firmware version 2.20:
[root@dom0 st3f10]# uname -mrs
Linux 6.6.9-1.qubes.fc37.x86_64 x86_64
[root@dom0 st3f10]# cat /proc/cmdline
placeholder root=/dev/mapper/qubes_dom0-root ro rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet usbcore.authorized_default=0
[root@dom0 st3f10]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_SAVEDEFAULT=true
GRUB_DISABLE_SUBMENU=false
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_THEME="/boot/grub2/themes/qubes/theme.txt"
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
GRUB_DISABLE_OS_PROBER="true"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX usbcore.authorized_default=0"
[root@dom0 st3f10]# cat /sys/kernel/debug/dri/0/i915_dmc_info
DMC initialized: yes
fw loaded: yes
path: i915/adlp_dmc.bin
Pipe A fw needed: yes
Pipe A fw loaded: yes
Pipe B fw needed: yes
Pipe B fw loaded: yes
version: 2.20
DC3CO count: 0
DC3 -> DC5 count: 0
DC5 -> DC6 count: 0
program base: 0x0c0a4040
ssp base: 0x00086fc0
htp: 0x01240108

I have noticed another issue that could be related to DMC firmware version 2.20:
suspend doesn’t work, and the journalctl log reports this error:
gen 13 11:50:23 dom0 kernel: intel_pmc_core INT33A1:00: PM: dpm_run_callback(): acpi_subsys_suspend_late+0x0/0x50 returns -5
gen 13 11:50:23 dom0 kernel: intel_pmc_core INT33A1:00: PM: failed to suspend late: error -5
gen 13 11:50:23 dom0 kernel: PM: late suspend of devices failed
The journalctl log is attached:
lastboot-journallog.log (342.4 KB)
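When comparing kernels, it can help to extract only the power-management failures from the log. A rough sketch: the pattern is an assumption based solely on the three lines quoted above, and pm_failures is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: keep power-management failure lines (e.g. "failed to
# suspend late: error -5", "... returns -5") from kernel log text on stdin.
pm_failures() {
  grep -E 'PM: .*(fail|returns -[0-9]+)'
}

# Example, right after a failed suspend attempt in the current boot:
#   journalctl -k -b | pm_failures
```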

To summarize:
dom0 kernel 6.1.62-1 with i915 DMC firmware 2.16: no issue, suspend works
dom0 kernel 6.6.2-1 with i915 DMC firmware 2.20: flip_done timeout issue and suspend doesn’t work
dom0 kernel 6.6.9-1 with i915 DMC firmware 2.20: under test, suspend doesn’t work

Many thanks

apparatus · January 13, 2024, 11:19am

Looks like S3 suspend is not supported?

st3f10 · January 13, 2024, 11:37am

Yes, maybe… but in that case, I don’t understand why suspend works with kernel 6.1.62-1 and DMC firmware 2.16. Am I missing something?

apparatus · January 13, 2024, 11:58am

There seems to be an issue in kernel versions after 6.2:

st3f10 · January 20, 2024, 10:34am

After a week of testing, here is an update:

dom0 kernel 6.1.62-1 with i915 DMC firmware 2.16:

flip_done timeout issue: NOT PRESENT
suspend WORKS

dom0 kernel 6.6.2-1 with i915 DMC firmware 2.20:

flip_done timeout issue: PRESENT
suspend DOESN’T WORK

dom0 kernel 6.6.9-1 with i915 DMC firmware 2.20:

flip_done timeout issue: NOT PRESENT
suspend DOESN’T WORK

At this stage, these seem to be two different issues:

the flip_done issue is not present with kernel versions 6.1.62-1 and 6.6.9-1, so it appears to be specific to kernel 6.6.2-1;
the suspend issue seems to be related to the DMC firmware version, since it is not present with DMC firmware 2.16.
Do you agree?

Another thing I would like to test is kernel 6.6.9-1 with DMC firmware 2.16, but I don’t know how to make kernel 6.6.9-1 load the older firmware. I’m quite sure I would need to recompile the kernel, but I don’t have enough knowledge to do this.

apparatus · January 21, 2024, 6:44am

Yes, there seems to be a separate issue with suspend.
Did you check that S3 is enabled in dom0?

cat /sys/power/mem_sleep
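That file lists the suspend variants the kernel offers, with the currently selected one in brackets; per the kernel's power-management documentation, deep corresponds to suspend-to-RAM (S3) and s2idle to suspend-to-idle. A minimal sketch for reading it; both helper names are hypothetical, and the optional file argument is only there for testing on a copy.

```shell
#!/bin/sh
# Hypothetical helpers for /sys/power/mem_sleep.

# Print the currently selected variant, i.e. the entry shown in [brackets].
current_mem_sleep() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/mem_sleep}"
}

# Succeed if MODE (e.g. deep, i.e. S3) is offered at all, selected or not.
offers_mem_sleep() {
  # Strip the brackets, then put one variant per line for an exact match.
  sed 's/[][]//g' "${2:-/sys/power/mem_sleep}" | tr ' ' '\n' | grep -qxF -- "$1"
}

# Example:
#   current_mem_sleep
#   offers_mem_sleep deep || echo "S3 (deep) is not offered by this kernel"
```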

You can also try to update your BIOS.

I didn’t find any related issue on GitHub, so you can open a new one about this S3 suspend regression in newer kernel versions.

st3f10 · January 21, 2024, 11:08am

Current settings and mem_sleep state:
[st3f10@dom0 ~]$ sudo dmesg | grep ACPI | grep supports
[1.893305] ACPI: PM: (supports S0 S5)
[st3f10@dom0 ~]$ cat /sys/power/mem_sleep
[s2idle]

This appears to indicate that my Lenovo doesn’t support the Suspend-to-RAM (S3) state, correct?
I have checked the BIOS, and it is updated to the latest version.
Additionally, I cannot find any BIOS option to enable the Suspend-to-RAM (S3) state.
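That conclusion can also be checked from the boot log: the "ACPI: PM: (supports ...)" line appears to list the sleep states the firmware advertises to the kernel, and S3 is absent from it here. A hypothetical sketch that tests for a given state in dmesg output read on stdin:

```shell
#!/bin/sh
# Hypothetical helper: read dmesg text on stdin and succeed only if the
# "ACPI: PM: (supports ...)" line lists STATE (e.g. S3).
acpi_supports() {
  state=$1
  # Extract the text between "(supports " and ")", then one state per line.
  sed -n 's/.*ACPI: PM: (supports \(.*\))$/\1/p' | tr ' ' '\n' | grep -qxF -- "$state"
}

# Example (dom0):
#   sudo dmesg | acpi_supports S3 || echo "S3 is not advertised"
```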

As you can see, in both cases the system log reports:
PM: suspend entry (s2idle)
This indicates that it is using S0 (suspend-to-idle), not S3.
According to kernel docs, s2idle maps to ACPI state S0:

"State: Suspend-To-Idle
ACPI state: S0
Label: “s2idle” (“freeze”)

This state is a generic, pure software, lightweight, system sleep state.
It allows more energy to be saved relative to runtime idle by freezing user space and putting
all I/O devices into low-power states (possibly lower-power than available at runtime), allowing
processors to spend more time in their idle states."

So in this case, I could open an issue to report that Suspend-to-idle is not working with kernel 6.6.9,
but I cannot ask for S3 because it is not supported by my Lenovo.
Have I understood this correctly, or am I missing something?

For reference, I have attached the suspend logs from kernels 6.1.62 and 6.6.9.
suspend_output_6.1.62.log (3.9 KB)
suspend_output_6.6.9.log (5.5 KB)
With kernel 6.6.9, the log reports this error:
Jan 21 10:04:42 dom0 kernel: intel_pmc_core INT33A1:00: PM: dpm_run_callback(): acpi_subsys_suspend_late+0x0/0x50 returns -5
Jan 21 10:04:42 dom0 kernel: intel_pmc_core INT33A1:00: PM: failed to suspend late: error -5