AMD iGPU issue after recent update

After a seemingly random amount of time, my display cuts out and typically does not recover. The cutout doesn’t seem to correlate with any particular activity or lack of activity. Audio is unaffected, which is essentially the only way I can tell this apart from a full system crash.

I recently updated dom0 and switched to Fedora 42 templates. I assume the template switch is unrelated and that the dom0 update is the cause.

I’m using an AMD processor with the iGPU being used for output. Kernel version 6.12.59-1.

I’ve tried 2 older kernel versions available in boot options, and I tried setting amdgpu.dpm=0, but none of that fixed the issue.

Here’s what appears at the start of the issue in journalctl. This is followed by things like the display manager crashing.

Dec 19 21:12:45 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:12:45 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to disable gfxoff!
Dec 19 21:12:49 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:12:49 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to disable gfxoff!
Dec 19 21:12:50 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Dumping IP State
Dec 19 21:12:55 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:12:55 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to disable gfxoff!
Dec 19 21:13:00 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:13:00 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to disable gfxoff!
Dec 19 21:13:04 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:13:04 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to disable gfxoff!
Dec 19 21:13:09 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:13:09 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to disable gfxoff!
Dec 19 21:13:09 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Dumping IP State Completed
Dec 19 21:13:09 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=236853, emitted seq=236855
Dec 19 21:13:09 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Process information: process Xorg pid 11397 thread X:cs0 pid 11404
Dec 19 21:13:09 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: GPU reset begin!
Dec 19 21:13:13 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:13:13 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to disable gfxoff!
Dec 19 21:13:18 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:13:18 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to disable smu features.
Dec 19 21:13:18 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: MODE2 reset
Dec 19 21:13:23 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000017 SMN_C2PMSG_82:0x00000000
Dec 19 21:13:23 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Failed to mode reset!
Dec 19 21:13:23 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Mode2 reset failed!
Dec 19 21:13:23 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: GPU mode2 reset failed
Dec 19 21:13:23 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:1b:00.0
Dec 19 21:13:23 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: GPU reset(1) failed
Dec 19 21:13:23 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: GPU reset end with ret = -62
Dec 19 21:13:23 dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: GPU Recovery Failed: -62
Dec 19 21:13:23 dom0 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
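
For anyone comparing their own logs against the ones above, here is a minimal sketch that counts ring timeouts and reset outcomes in kernel log text. The function name summarise_amdgpu is made up for illustration, and the patterns are based only on the message wording quoted in this thread, so other kernel versions may phrase things differently.

```shell
# Hypothetical helper: summarise amdgpu hang events from kernel log text on stdin.
# The patterns follow the messages quoted in this thread.
summarise_amdgpu() {
  awk '
    /ring [a-z0-9_.]+ timeout/       { timeouts++ }   # a GPU ring stopped making progress
    /GPU reset\([0-9]+\) succeeded/  { recovered++ }  # the driver brought the GPU back
    /GPU reset\([0-9]+\) failed/     { failed++ }     # the reset attempt did not recover the GPU
    END { printf "timeouts=%d recovered=%d failed=%d\n", timeouts, recovered, failed }
  '
}
```

In dom0 this could be fed from the current boot's kernel log with something like: sudo journalctl -k -b | summarise_amdgpu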

I was having similar issues with my amdgpu, upgrading to latest kernel (6.17.9-1) solved it for me.

sudo qubes-dom0-update kernel-latest kernel-latest-qubes-vm

I had a somewhat similar issue. After updating dom0, the screen would remain blank after rebooting, but choosing an older kernel from the GRUB menu fixed it. So, the opposite of your situation, @renehoj.

Unfortunately this didn’t fix the issue.

This time it did recover, which has happened a few times in the past, and I got some different logs. I’m not sure if that’s related to the new kernel version. (I have since had another one that did not recover, with the same logs as the original.)

dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Dumping IP State
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Dumping IP State Completed
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=98310, emitted seq=98312
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu:  Process xfwm4 pid 7484 thread xfwm4:cs0 pid 7525
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Starting gfx_0.0.0 ring reset
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: Ring gfx_0.0.0 reset failed
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: GPU reset begin!
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: MODE2 reset
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: GPU reset succeeded, trying to resume
dom0 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: PSP is resuming...
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: RAS: optional ras ta ucode is not available
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: RAP: optional rap ta ucode is not available
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU is resuming...
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: SMU is resumed successfully!
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: kiq ring mec 2 pipe 1 q 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x05002C00
dom0 kernel: amdgpu 0000:1b:00.0: [drm] *ERROR* LTTPR count is nonzero but invalid lane count reported. Assuming no LTTPR present.
dom0 kernel: amdgpu 0000:1b:00.0: [drm] *ERROR* LTTPR count is nonzero but invalid lane count reported. Assuming no LTTPR present.
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
dom0 kernel: amdgpu 0000:1b:00.0: amdgpu: GPU reset(1) succeeded!
dom0 kernel: amdgpu 0000:1b:00.0: [drm] device wedged, but recovered through reset

I’m using a 9950X CPU.

This seems to be the issue I was having, and it seems to be related to the microcode.
https://gitlab.freedesktop.org/drm/amd/-/issues/4737
https://bbs.archlinux.org/viewtopic.php?pid=2275227

On my system I noticed two issues: Xorg would crash, and the mouse cursor speed was slightly slower than normal. Since I updated the kernel I’ve not had any crashes, and the cursor speed is back to normal.

I’m using the 2025-10-20 microcode from the amd ucode package, which might be why it’s working.

If you are using a Zen 5 AMD CPU, you need to change ucode=scan to ucode=scan,digest-check=off in the GRUB configuration to be able to use the ucode update.
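
A minimal sketch of that change, assuming the Xen options live in the GRUB_CMDLINE_XEN_DEFAULT line of /etc/default/grub as on a default Qubes install. The function name is made up, and it only prints the modified file so the result can be reviewed before anything is overwritten.

```shell
# Hypothetical helper: print a GRUB defaults file with "ucode=scan" replaced by
# "ucode=scan,digest-check=off". An option that already carries a suffix is left alone.
enable_ucode_without_digest_check() {
  sed -E 's/(^|[" ])ucode=scan([" ]|$)/\1ucode=scan,digest-check=off\2/' "$1"
}
```

Something like "enable_ucode_without_digest_check /etc/default/grub > /tmp/grub.new" would produce a candidate file. After reviewing it and copying it back with sudo, the config still has to be regenerated, for example with "sudo grub2-mkconfig -o /boot/grub2/grub.cfg". That output path is an assumption and may differ on older EFI installs.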

I am using a Zen 4 CPU. How do you downgrade the amd-ucode-firmware to 2025-10-20 in dom0?

I don’t think Zen 4 and Zen 5 use the same version numbers and release dates. You can use xl dmesg to see if Xen is loading microcode from amd-ucode-firmware.

If the microcode gets updated, there will be a line similar to this:
(XEN) microcode: CPU0 updated from revision 0xb404032 to 0xb404035, date = 2025-10-20

The amd-ucode-firmware is only used if it’s newer than the UEFI/BIOS firmware microcode.
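
A small sketch for checking this, with a made-up function name. It filters Xen log text on stdin for lines shaped like the example above. The pattern is taken from that single example, so it is an assumption that other Xen versions word the message the same way.

```shell
# Hypothetical helper: keep only the Xen log lines that report an applied microcode update.
# Prints a short notice instead when no such line is found.
microcode_updates() {
  grep -E 'microcode: CPU[0-9]+ updated from revision' || echo "no microcode update reported"
}
```

In dom0 it could be used as: sudo xl dmesg | microcode_updates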

I’m not seeing anything similar to this so it doesn’t appear to be.

What exact hardware did you try this on? One of my “testing machines” is an 8945HS with a 780M GPU. I could install things just fine, but it wouldn’t start up after the installation finished: it crashes right after the LUKS password screen during init. I guess it’s not just Xorg, since it’s impossible to get to a tty and the machine won’t reply to pings. I tried different shared RAM settings and different kernel parameters under 4.2.4 and 4.3-rc[3|4], to no avail so far.

@OvalZero I had the exact same problem. Fortunately I also have a discrete card, so I disabled “Hybrid graphic” in the BIOS so that graphics were handled exclusively by the Nvidia discrete card.

A couple of days after the installation, I decided to give “Hybrid graphic” another try and it worked. It did not reboot after the LUKS password like it did during the installation.

Whether that is because the crash/reboot was specifically caused by something going on right after the installation, or because dom0 has since been updated, I can’t tell.

Hope you have a discrete card as well to try this workaround.

I just had Xorg crash, so updating the kernel and/or microcode didn’t fix the problem on my system.

AMD Ryzen 9 7900X using integrated graphics.

I have a dedicated GPU connected, but it’s for passthrough not for the main/dom0 display. When a qube is using the dedicated GPU for display output, that qube remains unaffected. I haven’t tried using the dedicated GPU for dom0’s display.

This sounds like a different issue to what I’m experiencing. I can reliably boot to desktop and continue for sometimes hours without an issue. Then, seemingly randomly, the display except the cursor will freeze, shortly followed by the cursor, shortly followed by a black screen which occasionally recovers. When there is a recovery, often running qubes will not show their windows and the entire qube has to be restarted. Logs indicate that this all starts because of some issue with the iGPU, and other parts of the system appear to be unaffected. I haven’t tested pinging the system from an external host, but as I’ve said if I have a qube using the dedicated GPU when this happens, that qube remains entirely unaffected.

That’s unfortunate. If it is the issue you linked earlier, then we may be stuck until Qubes releases the new version of linux-firmware.

This particular issue was discussed on the Qubes Matrix chat.

The boot-time black screen on kernel-latest is the result of AMD’s XDNA driver misbehaving. You can fix it by adding the kernel boot option module_blacklist=amdxdna.
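
One way to add that option, sketched under the assumption that dom0 kernel options are taken from GRUB_CMDLINE_LINUX in /etc/default/grub. The function name is made up. It prints the file with one extra line appended instead of editing in place, so the result can be checked first.

```shell
# Hypothetical helper: print a GRUB defaults file with module_blacklist=amdxdna
# appended to the dom0 kernel command line, unless the option is already present.
blacklist_amdxdna() {
  cat "$1"
  grep -q 'module_blacklist=amdxdna' "$1" ||
    printf '%s\n' 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX module_blacklist=amdxdna"'
}
```

The appended line relies on the defaults file being read as shell, so it extends whatever GRUB_CMDLINE_LINUX already holds. The GRUB config then has to be regenerated with grub2-mkconfig and the machine rebooted before the option takes effect.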

I’m also affected by the issue as described by the original poster, with the same symptoms: a random Xorg crash, then back to the login screen with an amdgpu ring timeout error. Some qubes need to be restarted because they are unresponsive after logging back in.

I have an AMD 7800X3D CPU with integrated graphics.

I’m on the 6.12 kernel and will try to solve the issue using the suggestions here if possible (I’m not sure whether a definite workaround has been posted yet).

One more report of the same problem (AMD 7900X with iGPU). With kernel 6.17 the crashes happen very quickly; with 6.15 I can get something done before dom0 crashes (sometimes it recovers, sometimes it does not).

Looking at one of the links shared above, the culprit seems to be an updated amdgpu firmware rather than the cpu microcode.

Does anyone know how to downgrade the gpu firmware? Dnf downgrade tells me the package is already the lowest version.

By the way, I upgraded to 4.3.0 and the problem is still there.

I found out how to downgrade the GPU firmware, and a quick test with version 20251011-1.fc41 seems to be working. Newer versions lead to GPU lockups very quickly, as soon as a couple of AppVM windows are open.
Downgrading the firmware turned out to be simple. Run the following command in dom0:

sudo qubes-dom0-update --action=downgrade amd-gpu-firmware-1:20251011-1.fc41

I spoke too soon, the GPU crashed again. I downgraded further to 20250917, and after 37 minutes there have been no crashes yet. I’ll keep working and report back on what happens.

[Edited to add the results (as somehow I cannot post more than 3 replies in a row)]

After testing several firmware and kernel combinations, I found the following:

  • With kernels >= 6.15 (the ones in kernel-latest) and all the firmware versions I tested, the GPU crashes quickly and often (sometimes without recovering), making the system unusable.
  • With 6.12 kernels in the regular kernel package, crashes are much less frequent (maybe after a couple of hours) with firmware versions newer than amd-gpu-firmware-1:20250917-1.fc41

The 6.12 kernel and amd-gpu-firmware-1:20250917-1.fc41 seem to work without crashes (or maybe I just got lucky…), so I will probably stay with that combination for now.
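
To check where an installed firmware package sits relative to that version, a rough sketch follows. Both function names are made up, and 20250917 is simply the date stamp reported as stable above on one machine, not a confirmed cutoff.

```shell
# Hypothetical helpers. 20250917 is the firmware date one poster reported as stable.
fw_date() {
  # Pull the first 8-digit date stamp out of a package name or version string.
  printf '%s\n' "$1" | grep -oE '[0-9]{8}' | head -n 1
}

newer_than_reported_stable() {
  # Succeeds only when the date stamp is later than 20250917.
  [ "$(fw_date "$1")" -gt 20250917 ]
}
```

In dom0 this could be combined with rpm, for example: newer_than_reported_stable "$(rpm -q amd-gpu-firmware)" && echo "firmware is newer than 20250917"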

I have an AMD 7900X that has a Raphael (rev c2) iGPU.

So that’s it for now, please post if you find something different.

I haven’t noticed any difference in stability between different kernel versions, but I haven’t downgraded amd-gpu-firmware. Sometimes things run fine for many hours, sometimes I experience issues minutes into the boot.

linux-firmware 20260110-1 seems to be available on qubes-dom0-current-testing