Dom0 gui gets stuck with second monitor connected

Try adding the pci=nomsi kernel option in dom0 as a test.
Maybe related issues:


Hello,
First of all, many thanks for your quick reply.
I have added the pci=nomsi kernel option to my GRUB configuration following this procedure:

  1. Added pci=nomsi to the file /etc/default/grub:
    cat /etc/default/grub
    GRUB_TIMEOUT=5
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,g' /etc/system-release)"
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=false
    GRUB_TERMINAL_OUTPUT="console"
    GRUB_CMDLINE_LINUX="rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet pci=nomsi"
    GRUB_DISABLE_RECOVERY="true"
    GRUB_THEME="/boot/grub2/themes/qubes/theme.txt"
    GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
    GRUB_DISABLE_OS_PROBER="true"
    GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX usbcore.authorized_default=0"

  2. Regenerated the GRUB configuration:
    sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg

  3. Generated the initramfs image:
    sudo dracut -f

  4. Rebooted the system.

After the reboot, I verified that the kernel parameter is correctly added:
cat /proc/cmdline
placeholder root=/dev/mapper/qubes_dom0-root ro rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet pci=nomsi usbcore.authorized_default=0
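
A quick way to make this check less error-prone is to test for the parameter as a whole word instead of reading the line by eye. This is only a minimal sketch; `has_kernel_param` is a hypothetical helper name, not a Qubes tool.

```shell
# has_kernel_param PARAM CMDLINE
# Returns 0 if PARAM appears as a complete, space-separated word in CMDLINE.
# Hypothetical helper for illustration only.
has_kernel_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a running dom0 this could be used as:
#   has_kernel_param pci=nomsi "$(cat /proc/cmdline)" && echo "pci=nomsi is active"
```

Matching on whole words avoids a false positive when one option is a substring of another.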

Is this the right procedure?

I apologize for asking, but I’m new to Qubes, and I want to make sure I have followed the correct steps.
If what I have done is correct, I will let you know if the pci=nomsi option works.

Many thanks again for your help.

It’s better to add a separate GRUB_CMDLINE_LINUX line at the end of the /etc/default/grub file instead of modifying the default one, like this:

GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX pci=nomsi"

There’s no need to regenerate the initramfs; you only need to generate the new GRUB config.
But overall it’s right.
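
The advice above can be sketched as a small shell helper that appends the extra line only if it is not already there, so running it twice does not duplicate the option. `append_cmdline_opt` is a hypothetical name, and the grub.cfg path in the comments is simply the one used earlier in this thread; it may differ on other installations.

```shell
# append_cmdline_opt FILE OPTION
# Appends: GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX OPTION"
# to FILE unless that exact line already exists. Hypothetical helper.
append_cmdline_opt() {
    _line="GRUB_CMDLINE_LINUX=\"\$GRUB_CMDLINE_LINUX $2\""
    grep -qxF -- "$_line" "$1" || printf '%s\n' "$_line" >> "$1"
}

# Possible use in dom0 (as root), keeping a backup first:
#   cp /etc/default/grub /etc/default/grub.bak
#   append_cmdline_opt /etc/default/grub pci=nomsi
#   grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
```

To undo the test, delete the appended line and regenerate the GRUB config again.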

Here is my current GRUB configuration:
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=false
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet"
GRUB_DISABLE_RECOVERY=true
GRUB_THEME="/boot/grub2/themes/qubes/theme.txt"
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
GRUB_DISABLE_OS_PROBER=true
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX usbcore.authorized_default=0"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX pci=nomsi"

Output of /proc/cmdline:
placeholder root=/dev/mapper/qubes_dom0-root ro rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet usbcore.authorized_default=0 pci=nomsi

But now, with the pci=nomsi option enabled, the journal continuously reports the following error:
Jan 02 17:15:42 dom0 kernel: ata1: illegal qc_active transition (00000000->ffffffff)
Removing the pci=nomsi option and regenerating the GRUB configuration makes the error disappear.
The journalctl output is attached.
journalctlgen02_1.log (1.2 MB)

I suspect that disabling Message Signaled Interrupts (MSI) for PCI devices may impact the interrupt handling mechanism and lead to issues with the ATA controller.
What are your thoughts?

Despite this error, Qubes OS seems to be working fine.

Many thanks.

It could cause some issues but I don’t have enough knowledge to comment on this.
I suggested this option only as a test, because a Qubes OS developer suggested trying it in one of the linked GitHub issues. If it works with this option, I suggest reporting that in the GitHub issue so the problem can be traced further.

You can also try to disable IOMMU for your GPU:

Hi apparatus,
I’ll try to go deeper into this issue, searching and collecting more information.
In the meantime, I appreciate your valuable assistance, and I’ll keep you updated if there are any developments.
By the way, I have another question: when you suggest reporting this issue on GitHub, are you referring to this link: Issues · QubesOS/qubes-issues · GitHub?

Thanks a lot.

Yes, you can open a new issue there or maybe it’d be better to report your problem in this existing issue:

And ask to have that issue reopened, because your problem seems to be similar.


Hi apparatus,
I’ve commented on the “GUI randomly freezes” issue as you suggested.
Many thanks again for your precious help.

Unfortunately, it’s not possible to reopen that case; I cannot use the Qubes issue tracker to ask for support.

I have a small update: if I suddenly disconnect the HDMI port of the second monitor, the dom0 GUI is recovered on the primary monitor.
After the GUI recovery, I can reconnect the second monitor, and I can work while waiting for the next issue :)
In the journal log, I can see the following messages:
gen 03 18:18:40 dom0 kernel: i915 0000:00:02.0: [drm] ERROR [CRTC:131:pipe B] flip_done timed out
gen 03 18:18:55 dom0 kernel: i915 0000:00:02.0: [drm] ERROR flip_done timed out
gen 03 18:18:55 dom0 kernel: i915 0000:00:02.0: [drm] ERROR [CRTC:131:pipe B] commit wait timed out
gen 03 18:19:05 dom0 kernel: i915 0000:00:02.0: [drm] ERROR flip_done timed out
gen 03 18:19:05 dom0 kernel: i915 0000:00:02.0: [drm] ERROR [PLANE:82:plane 1B] commit wait timed out
gen 03 18:19:07 dom0 systemd[1]: Started getty@tty5.service - Getty on tty5.
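
To see how often this happens during one boot, the kernel messages can be counted instead of read one by one. The following is a minimal sketch; `count_flip_done_timeouts` is a hypothetical helper name, and the pattern is based only on the i915 messages quoted above.

```shell
# Counts i915 "flip_done timed out" kernel messages read from stdin.
# Hypothetical helper; prints 0 when there are none.
count_flip_done_timeouts() {
    grep -c 'i915 .*flip_done timed out' || true
}

# Possible use in dom0, for the current boot:
#   journalctl -k -b | count_flip_done_timeouts
```

Comparing the count between boots with different kernels or options gives a rough signal of whether a change helped.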

The log contains more error messages, but in short, they do not seem to help…
Attached, you can find the complete journal log.
journalctl_03_1.log (682.7 KB)
Reading the “GUI randomly freezes” issue, I have seen:
Looks like a GPU driver problem, may be related to IOMMU. Try adding iommu=no-igfx to the hypervisor command line (options= in /boot/efi/EFI/qubes/xen.cfg).
Moreover, I have found this case in the forum:

So now, I have added the iommu option to /etc/default/grub:
GRUB_CMDLINE_XEN_DEFAULT="$GRUB_CMDLINE_XEN_DEFAULT iommu=no-igfx"
Next: regenerate the GRUB configuration and reboot…
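
Hypervisor options do not show up in /proc/cmdline, so a Xen option has to be checked elsewhere after the reboot. The sketch below assumes that `xl info` in dom0 prints a `xen_commandline` field, and `xen_has_option` is a hypothetical helper name.

```shell
# xen_has_option OPTION
# Reads `xl info` style output on stdin and returns 0 if OPTION is a whole
# word on the xen_commandline line. Hypothetical helper; assumes the
# "xen_commandline : ..." field layout.
xen_has_option() {
    _cmdline=$(sed -n 's/^xen_commandline[[:space:]]*:[[:space:]]*//p')
    case " $_cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use in dom0:
#   sudo xl info | xen_has_option iommu=no-igfx && echo "option is active"
```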

You can try the intremap=off option as a test:

Hi apparatus,
I have an update:
While searching for a solution, I came across this post:
https://gitlab.freedesktop.org/drm/intel/-/issues/8685.
After reading the comments, I noticed that they were experiencing the same problem I’m facing in Qubes 4.2.
Therefore, I attempted to start Qubes with the previous kernel, which loads i915 DMC firmware version 2.16:

uname -r
6.1.62-1.qubes.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Nov 14 06:16:38 GMT 2023 x86_64 x86_64 x86_64 GNU/Linux
cat /sys/kernel/debug/dri/0/i915_dmc_info
fw loaded: yes
path: i915/adlp_dmc_ver2_16.bin
Pipe A fw support: yes
Pipe A fw loaded: yes
Pipe B fw support: yes
Pipe B fw loaded: yes
version: 2.16
DC3CO count: 0
DC3 -> DC5 count: 5
DC5 -> DC6 count: 0
program base: 0x0c0a4040
ssp base: 0x00086fc0
htp: 0x01240108
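
Note that the version shown by i915_dmc_info is that of the DMC (display microcontroller) firmware blob loaded by the i915 driver, as the `path:` line indicates. When comparing kernels, the two relevant fields can be pulled out with a small filter; this is a sketch only, and `dmc_summary` is a hypothetical helper name.

```shell
# Prints "<firmware path> <version>" from i915_dmc_info style text on stdin.
# Hypothetical helper; relies only on the "path:" and "version:" lines
# shown in the output above.
dmc_summary() {
    awk -F': *' '
        $1 == "path"    { path = $2 }
        $1 == "version" { version = $2 }
        END             { print path, version }
    '
}

# Possible use in dom0 (debugfs normally needs root):
#   sudo cat /sys/kernel/debug/dri/0/i915_dmc_info | dmc_summary
```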

With kernel 6.1.62-1.qubes.fc37.x86_64 and i915 DMC firmware version 2.16, the issue no longer occurs.
Now, I don’t have enough knowledge to troubleshoot this issue;
it’s not clear to me if this problem can be seen as a Qubes bug or a problem related to the i915 drivers.
What are your thoughts?

Many thanks.

You can try kernel 6.6.9; maybe the fix is in there.
Install the kernel-latest package in dom0 from the current-testing repository:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing kernel-latest

Hi apparatus,
I’m testing Qubes 4.2 with kernel 6.6.9 and i915 DMC firmware version 2.20:
[root@dom0 st3f10]# uname -mrs
Linux 6.6.9-1.qubes.fc37.x86_64 x86_64
[root@dom0 st3f10]# cat /proc/cmdline
placeholder root=/dev/mapper/qubes_dom0-root ro rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet usbcore.authorized_default=0
[root@dom0 st3f10]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_SAVEDEFAULT=true
GRUB_DISABLE_SUBMENU=false
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap plymouth.ignore-serial-consoles 6.6.2-1.qubes.fc37.x86_64 x86_64 rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_THEME="/boot/grub2/themes/qubes/theme.txt"
GRUB_CMDLINE_XEN_DEFAULT="console=none dom0_mem=min:1024M dom0_mem=max:4096M ucode=scan smt=off gnttab_max_frames=2048 gnttab_max_maptrack_frames=4096"
GRUB_DISABLE_OS_PROBER="true"
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX usbcore.authorized_default=0"
[root@dom0 st3f10]# cat /sys/kernel/debug/dri/0/i915_dmc_info
DMC initialized: yes
fw loaded: yes
path: i915/adlp_dmc.bin
Pipe A fw needed: yes
Pipe A fw loaded: yes
Pipe B fw needed: yes
Pipe B fw loaded: yes
version: 2.20
DC3CO count: 0
DC3 -> DC5 count: 0
DC5 -> DC6 count: 0
program base: 0x0c0a4040
ssp base: 0x00086fc0
htp: 0x01240108

I have noticed another issue that could be related to i915 DMC firmware version 2.20:
suspend doesn’t work, and the journal reports this error:
gen 13 11:50:23 dom0 kernel: intel_pmc_core INT33A1:00: PM: dpm_run_callback(): acpi_subsys_suspend_late+0x0/0x50 returns -5
gen 13 11:50:23 dom0 kernel: intel_pmc_core INT33A1:00: PM: failed to suspend late: error -5
gen 13 11:50:23 dom0 kernel: PM: late suspend of devices failed
The journalctl log is attached:
lastboot-journallog.log (342.4 KB)

To summarize:
dom0 kernel 6.1.62-1 with i915 DMC firmware 2.16: no issue, suspend works
dom0 kernel 6.6.2-1 with i915 DMC firmware 2.20: flip_done timeout issue and suspend doesn’t work
dom0 kernel 6.6.9-1 with i915 DMC firmware 2.20: under test, suspend doesn’t work

Many thanks

Looks like S3 suspend is not supported?

Yes, maybe… but in this case, I cannot understand why the suspend function works with kernel 6.1.62-1 and i915 driver 2.16. Am I missing something?

Seems to be some issue in later kernel versions after 6.2:

After one week, I can provide an update:

dom0 kernel 6.1.62-1 with i915 DMC firmware 2.16:

  • flip_done timeout issue: NOT PRESENT
  • suspend WORKS

dom0 kernel 6.6.2-1 with i915 DMC firmware 2.20:

  • flip_done timeout issue: PRESENT
  • suspend DOESN’T WORK

dom0 kernel 6.6.9-1 with i915 DMC firmware 2.20:

  • flip_done timeout issue: NOT PRESENT
  • suspend DOESN’T WORK

At this stage, it seems that these are two different issues. What I mean is that:

  • the ‘flip_done’ issue is not present with kernel versions 6.1.62-1 and 6.6.9-1, so it appears to be a problem associated with version 6.6.2-1.
  • the suspend issue seems to be related to the i915 DMC firmware version; it is not present with version 2.16.
    Do you agree?

Another thing I would like to test is kernel version 6.6.9-1 with i915 2.16, but I don’t know how to downgrade the i915 drivers in kernel 6.6.9-1. I’m quite sure I need to recompile the kernel, but I do not have enough knowledge to do this.

Yes, there seems to be a separate issue with suspend.
Did you check that S3 is enabled in dom0?

cat /sys/power/mem_sleep

You can also try to update your BIOS.

I didn’t find any related issue on GitHub, so you can open a new one about this suspend regression in newer kernel versions.

Current settings and mem_sleep state:
[st3f10@dom0 ~]$ sudo dmesg | grep ACPI | grep supports
[1.893305] ACPI: PM: (supports S0 S5)
[st3f10@dom0 ~]$ cat /sys/power/mem_sleep
[s2idle]

This appears to indicate that my Lenovo doesn’t support the Suspend-to-RAM (S3) state, correct?
I have checked the BIOS, and it is updated to the latest version.
Additionally, I cannot find any BIOS option to enable the Suspend-to-RAM (S3) state.
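
For reference, /sys/power/mem_sleep lists every suspend mode the kernel can use and puts the currently selected one in square brackets; `deep` is the entry that corresponds to Suspend-to-RAM (S3). A minimal sketch for reading it, with hypothetical helper names:

```shell
# Prints the currently selected mode (the bracketed entry) from
# mem_sleep style text on stdin. Hypothetical helper.
active_sleep_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Returns 0 if "deep" (Suspend-to-RAM, S3) is listed at all, selected or not.
# Hypothetical helper.
supports_deep() {
    case " $(tr -d '[]') " in
        *" deep "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use in dom0:
#   active_sleep_mode < /sys/power/mem_sleep
#   supports_deep < /sys/power/mem_sleep || echo "no S3 (deep) available"
```

With the output shown above, only `s2idle` is listed, so `deep` is not available to select on this machine.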

As you can see in both cases, the system log reports:
PM: suspend entry (s2idle)
This should indicate that it is using state S0 and not S3.
According to kernel docs, s2idle maps to ACPI state S0:

"State: Suspend-To-Idle
ACPI state: S0
Label: "s2idle" ("freeze")

This state is a generic, pure software, lightweight, system sleep state.
It allows more energy to be saved relative to runtime idle by freezing user space and putting
all I/O devices into low-power states (possibly lower-power than available at runtime), allowing
processors to spend more time in their idle states."

So in this case, I could open an issue to report that Suspend-to-idle is not working with kernel 6.6.9,
but I cannot ask for S3 because it is not supported by my Lenovo.
Have I understood this correctly, or am I missing something?

Just for information,
attached you can find the logs reported with kernel 6.1.62 and 6.6.9.
suspend_output_6.1.62.log (3.9 KB)
suspend_output_6.6.9.log (5.5 KB)
With kernel 6.6.9, the log reports this error:
Jan 21 10:04:42 dom0 kernel: intel_pmc_core INT33A1:00: PM: dpm_run_callback(): acpi_subsys_suspend_late+0x0/0x50 returns -5
Jan 21 10:04:42 dom0 kernel: intel_pmc_core INT33A1:00: PM: failed to suspend late: error -5
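
When attaching logs to a report, it can help to name the device that blocks suspend explicitly. The sketch below extracts the driver, device and error code from lines of the form quoted above; `failed_suspend_devices` is a hypothetical helper name, and the pattern is derived only from these two log lines.

```shell
# Prints "<driver> <device> <error>" for each "PM: failed to suspend ..."
# kernel message read from stdin. Hypothetical helper.
failed_suspend_devices() {
    sed -n 's/.*kernel: \([^ ]*\) \([^ ]*\): PM: failed to suspend[^:]*: error \(-*[0-9][0-9]*\).*/\1 \2 \3/p'
}

# Possible use in dom0, for the current boot:
#   journalctl -k -b | failed_suspend_devices
```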

I thought you had S3, since some modern Lenovo laptops still seem to support it.
Since S0 suspend worked for you on older kernels, you can report your hardware and the last working kernel, with logs, here:

Maybe it’ll be of help.