I need a hand. I’ve got GPU passthrough working in a different qube based on the fedora-42 template, verified by running ollama and radeontop, but for some reason my Bazzite standalone qube refuses to use the GPU despite detecting it. I’m not running the two qubes at the same time. I’ve tried rebooting and then opening Bazzite in case the llm qube didn’t release it properly. lspci detects the card, but no software will actually start utilizing it. The qube is HVM with UEFI. Any help would be appreciated.
Hi,
I believe this is because fedora/bazzite dropped x11 (you need this).
I tried bazzite as well as a bunch of others, my gaming qube is cachyOS with x11, it works great.
wayland support for qubes is a work in progress
Also to get it to use the gpu install the ‘prime-run’ package, and then run your program, eg ‘prime-run steam’
I forgot to mention it’s an AMD GPU. That changes things slightly since, if I remember right, prime-run is for NVIDIA.
If Bazzite is only using llvmpipe even though it sees your GPU, the emulated virtual adapter is probably enumerated as /dev/dri/card0 and your passthrough GPU is /dev/dri/card1.
I was able to get Bazzite to use my passthrough GPU by flipping the order in the KWIN_DRM_DEVICES variable.
create /etc/profile.d/passthrough-gpu.sh
export KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0
then make it executable with
sudo chmod +x /etc/profile.d/passthrough-gpu.sh
Log out of your KDE session and back in again. I’m not sure what the equivalent would be for Gnome.
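To check which card node is the passthrough GPU before setting KWIN_DRM_DEVICES, something like the following may help. It is only a sketch: the function names list_drm_cards and amd_first_order are made up for illustration, and it assumes the passthrough card is the only device with the AMD PCI vendor ID 0x1002.

```shell
#!/bin/sh
# List each DRM card node with the PCI vendor ID of the device behind it.
# The optional first argument is an illustrative override for the sysfs
# directory; by default the real /sys/class/drm is read.
list_drm_cards() {
    root="${1:-/sys/class/drm}"
    for dir in "$root"/card[0-9]*; do
        [ -e "$dir" ] || continue
        name=$(basename "$dir")
        # Skip connector entries such as card0-DP-1.
        case "$name" in *-*) continue ;; esac
        # Some emulated adapters may expose no vendor file; print "unknown".
        if [ -r "$dir/device/vendor" ]; then
            vendor=$(cat "$dir/device/vendor")
        else
            vendor=unknown
        fi
        echo "$name $vendor"
    done
}

# Build a KWIN_DRM_DEVICES value that lists AMD (0x1002) cards first.
amd_first_order() {
    cards=$(list_drm_cards "$1")
    amd=$(echo "$cards" | awk '$2 == "0x1002" {print "/dev/dri/" $1}')
    rest=$(echo "$cards" | awk '$2 != "0x1002" && NF {print "/dev/dri/" $1}')
    printf '%s\n%s\n' "$amd" "$rest" | grep . | paste -sd: -
}

# Example use inside the qube:
#   list_drm_cards
#   export KWIN_DRM_DEVICES="$(amd_first_order)"
```

If the AMD card already shows up as card0, the output keeps it first, so the same line should be safe to use when the enumeration order changes between boots.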
Tried cachyOS and couldn’t get steam to boot at all so I went back to Bazzite because I can get Barony playable even without GPU.
Tried the KWIN_DRM_DEVICES change and it caused glxinfo to show my passthrough GPU as the OpenGL renderer, but the desktop became a slideshow. Ran mangohud Barony (a small Linux-native game) and I got fewer FPS than when it was running on pure CPU, with 0~1% GPU usage shown.
It looks like it pretty consistently enumerates the AMD card as card0.
If I use gamescope --mangoapp -- glxgears I’m getting 10 FPS, but it says it’s selected the AMD card (7900 XTX).
Passing through my card for compute was way easier lol
AMD RX 7900 XTX (Navi 31) GPU passthrough: SMU enforces GFXOFF regardless of driver/hypervisor configuration
Summary
GPU passthrough of an AMD RX 7900 XTX to a Bazzite HVM qube completes successfully at the driver and firmware level — amdgpu loads, ATOM BIOS fetched, SMU reports successful initialization, Vulkan device selection works. However, the GPU’s shader clock domain (sclk) remains parked at 0 MHz regardless of workload or DPM force requests, producing roughly 2 FPS on basic gamescope -- glxgears tests. Memory clock (mclk) ramps normally; only the shader domain is affected.
After exhausting kernel-side, driver-side, and Xen-side mitigations, the evidence converges on the SMU firmware enforcing GFXOFF as mandatory and refusing to honor driver requests to disengage it — likely due to incomplete PSP authentication and restricted extended-configuration-space access in the Xen passthrough environment.
Posting in the hope that someone has cracked this on RDNA3, or to add concrete documentation to the body of community knowledge if not.
Hardware
- CPU: AMD, 16 threads, ~64GB system RAM
- GPU: PowerColor RX 7900 XTX (Navi 31)
- PCI ID: 1002:744C, subsystem 148C:2422
- Dual-BIOS (both BIOSes tested, identical behavior)
- Host BDF: 0000:0e:00.0 (display) and 0000:0e:00.1 (audio)
- Storage: Dual NVMe drives
The GPU sits four hops down from root complex per qvm-pci list:
dom0:00_03.1-00_00.0-00_00.0-00_00.0 Display
dom0:00_03.1-00_00.0-00_00.0-00_00.1 Audio
Three PCIe bridges between root complex and endpoint.
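The bridge count can be read straight off the device ID. A small sketch, assuming each "-"-separated component of the ID is one hop on the path from the root complex and the last component is the endpoint (count_bridges is a name invented here):

```shell
#!/bin/sh
# Count the bridges implied by a Qubes PCI device ID as printed by qvm-pci.
# Assumption: components are separated by "-" and the last one is the endpoint.
count_bridges() {
    devid="$1"
    # Number of components = number of "-" separators + 1.
    hops=$(printf '%s\n' "$devid" | awk -F- '{print NF}')
    echo $((hops - 1))
}

# Example:
#   count_bridges "dom0:00_03.1-00_00.0-00_00.0-00_00.0"
```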
Software
- Host: Qubes OS 4.x with current updates
- Guest qube: gameqube, HVM mode, OVMF/UEFI, stubdom-linux emulator
- 16GB RAM, 6 vCPUs
- Bazzite (rpm-ostree-based Fedora) as guest OS
- GPU driver: in-tree amdgpu
Dom0 configuration
/etc/default/grub:
GRUB_CMDLINE_LINUX="... rd.driver.pre=xen-pciback xen-pciback.hide=(0000:0e:00.0)(0000:0e:00.1) xen-pciback.passthrough=1 xen-pciback.permissive=1"
Global pciback state confirmed:
$ cat /sys/module/xen_pciback/parameters/permissive
Y
$ cat /sys/module/xen_pciback/parameters/passthrough
Y
Per-device attachment with both flags:
$ qvm-pci list gameqube
BACKEND:DEVID DESCRIPTION USED BY
dom0:00_03.1-00_00.0-00_00.0-00_00.0 Navi 31 Display gameqube (attached: permissive=true, no-strict-reset=true)
dom0:00_03.1-00_00.0-00_00.0-00_00.1 Navi 31 HDMI/DP Audio gameqube (attached: permissive=true, no-strict-reset=true)
Libvirt XML confirms writeFiltering='no' and nostrictreset='yes' on both <hostdev> entries.
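For anyone reproducing this setup, the dom0 kernel command line can be sanity-checked with a few lines of shell. This is a sketch only; check_pciback_cmdline is a hypothetical helper, and it just looks for the three xen-pciback options quoted above without validating their values.

```shell
#!/bin/sh
# Report which of the expected xen-pciback options appear on a kernel
# command line. Pass the command line as one string.
check_pciback_cmdline() {
    cmdline="$1"
    for opt in xen-pciback.hide= xen-pciback.passthrough=1 xen-pciback.permissive=1; do
        # Pad with spaces so an option only matches at a word boundary.
        case " $cmdline " in
            *" $opt"*) echo "ok      $opt" ;;
            *)         echo "missing $opt" ;;
        esac
    done
}

# Example use in dom0:
#   check_pciback_cmdline "$(cat /proc/cmdline)"
```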
Symptoms
Inside the qube:
- lspci shows the GPU; amdgpu is the kernel driver in use
- Card binds, firmware loads, displays are detected, Vulkan device selection works
- VRAM allocation static at ~108 MB regardless of workload
- gamescope -- glxgears produces 2-16 FPS with mangohud reporting GPU 0%, CPU 99%
- Same behavior under Wayland (Bazzite KDE) and X11 wrappers
- MESA_VK_DEVICE_SELECT confirms RADV is selected, not lavapipe
Diagnostic output
amdgpu initialization (dmesg, abbreviated)
amdgpu 0000:00:07.0: amdgpu: initializing kernel modesetting
(IP DISCOVERY 0x1002:0x744C 0x148C:0x2422 0xC8)
amdgpu 0000:00:07.0: amdgpu: detected ip blocks: psp_v13_0, smu_v13_0_0,
gfx_v11_0, sdma_v6_0, vcn_v4_0, jpeg_v4_0, mes_v11_0 (and others)
amdgpu 0000:00:07.0: amdgpu: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000
amdgpu 0000:00:07.0: amdgpu: Fetched VBIOS from ROM
amdgpu: ATOM BIOS: 113-EXT80009-100
amdgpu 0000:00:07.0: amdgpu: System can't access extended configuration space, please check!!
amdgpu 0000:00:07.0: amdgpu: VRAM: 24560M ... GART: 512M
amdgpu 0000:00:07.0: amdgpu: smu driver if version = 0x0000003d,
smu fw if version = 0x00000040, smu fw version = 0x004e8300 (78.131.0)
amdgpu 0000:00:07.0: amdgpu: SMU driver if version not matched
amdgpu 0000:00:07.0: amdgpu: SMU is initialized successfully!
amdgpu 0000:00:07.0: amdgpu: [drm] Display Core v3.2.359 initialized on DCN 3.2
amdgpu 0000:00:07.0: amdgpu: Runtime PM not available
Three notable messages:
- “System can’t access extended configuration space” — PCIe ECAM access restricted by the bridge chain
- “SMU driver if version not matched” (warning, not fatal — SMU still initializes)
- “Runtime PM not available” — kernel cannot fully manage GPU power state transitions
Clock domain behavior
Forcing power_dpm_force_performance_level=high ramps memory clock correctly but shader clock refuses:
$ echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz
1: 0Mhz * <- active level, stuck
2: 2526Mhz
$ cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 96Mhz
1: 456Mhz
2: 772Mhz
3: 1249Mhz * <- memory clock honored
Manual DPM forcing also refused:
$ echo manual | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
$ echo 2 | sudo tee /sys/class/drm/card0/device/pp_dpm_sclk
$ echo 3 | sudo tee /sys/class/drm/card0/device/pp_dpm_mclk
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz
1: 0Mhz * <- still parked despite explicit level-2 request
2: 2526Mhz
The 0 MHz reading is consistent with GFXOFF holding the shader engines in a power-gated state.
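To make this check repeatable (for example when comparing kernels or BIOS positions), the active level can be pulled out of the pp_dpm_* text with a short filter. A minimal sketch, assuming the active level is the line marked with "*" as in the output above; active_dpm_level and sclk_is_parked are names invented for illustration.

```shell
#!/bin/sh
# Read pp_dpm_* text on stdin and print the active level (the line marked
# with "*") as "<level> <MHz>".
active_dpm_level() {
    awk '/\*/ {
        level = $1; sub(/:$/, "", level)          # "1:"    -> "1"
        mhz = $2;   sub(/[Mm][Hh][Zz]$/, "", mhz) # "0Mhz"  -> "0"
        print level, mhz
    }'
}

# Exit 0 if the active level on stdin reads 0 MHz, non-zero otherwise.
sclk_is_parked() {
    [ "$(active_dpm_level | awk '{print $2}')" = "0" ]
}

# Example use inside the qube:
#   active_dpm_level < /sys/class/drm/card0/device/pp_dpm_sclk
#   sclk_is_parked < /sys/class/drm/card0/device/pp_dpm_sclk && echo "sclk parked"
```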
pp_features mask, before and after kernel parameter
After applying amdgpu.ppfeaturemask=0xfff7ffff (intent: clear GFXOFF bit 19):
$ cat /proc/cmdline | tr ' ' '\n' | grep amdgpu
amdgpu.ppfeaturemask=0xfff7ffff
$ cat /sys/class/drm/card0/device/pp_features
features high: 0x0003ebb8 low: 0x71ffffff
...
17. GFX_ULV (17) : enabled
19. GFXOFF (19) : enabled <- firmware kept GFXOFF set despite kernel request
20. BACO (20) : enabled
The firmware returns 0x71ffffff regardless of the requested mask, indicating bit 19 (GFXOFF) is firmware-locked and cannot be cleared from the driver side.
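The mask arithmetic behind this test can be reproduced in the shell. This is only a sketch; clear_bit and bit_is_set are helper names invented here. One caveat worth checking: amdgpu.ppfeaturemask is documented as following the PP_FEATURE_MASK enum in the kernel's amd_shared.h, where GFXOFF is, as far as I can tell, PP_GFXOFF_MASK = 0x8000 (bit 15), while the index 19 printed by pp_features is the SMU's own feature numbering. If that is right, 0xfff7ffff leaves the driver-side GFXOFF flag set.

```shell
#!/bin/sh
# Clear one bit in a 32-bit mask and print the result as hex.
clear_bit() {
    printf '0x%08x\n' $(( $1 & ~(1 << $2) & 0xffffffff ))
}

# Exit 0 if the given bit is set in the mask, non-zero otherwise.
bit_is_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

# Examples:
#   clear_bit 0xffffffff 19       # the mask used in the test above
#   bit_is_set 0x71ffffff 19      # low word reported by pp_features
#   bit_is_set 0xfff7ffff 15      # assumed PP_GFXOFF_MASK position
```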
GFXOFF debugfs interface
The debugfs entries exist but return errors on read:
$ ls /sys/kernel/debug/dri/0/ | grep gfxoff
amdgpu_gfxoff
amdgpu_gfxoff_count
amdgpu_gfxoff_residency
amdgpu_gfxoff_status
$ sudo cat /sys/kernel/debug/dri/0/amdgpu_gfxoff_status
cat: amdgpu_gfxoff_status: Invalid argument
All gfxoff-related debugfs reads return EINVAL, suggesting the SMU is not responding cleanly to runtime state queries.
Workload test
$ gamescope -- glxgears &
$ watch -n 0.5 cat /sys/class/drm/card0/device/pp_dpm_sclk
# Active level remains "1: 0Mhz *" throughout
# mangohud overlay: GPU 0%, CPU 99%, ~2 FPS
# VRAM allocation static at 108MB
What I tried, in order
- Hidden GPU in dom0 GRUB via xen-pciback.hide
- Confirmed pciback as kernel driver in use in dom0
- Standard qvm-pci attach --persistent --no-strict-reset for both GPU functions
- Driver and firmware load cleanly inside the qube; ATOM BIOS fetched; SMU “initialized successfully”
- Tried amdgpu.ppfeaturemask=0xffffffff: no effect on GFXOFF state
- Tried amdgpu.gfxoff=0 kernel parameter: kernel reported “unknown parameter ignored” (no longer present in current amdgpu)
- Tried amdgpu.ppfeaturemask=0xfff7ffff to specifically clear GFXOFF bit 19: firmware returned mask 0x71ffffff (override)
- Tried debugfs runtime disable (echo 0 | sudo tee /sys/kernel/debug/dri/0/amdgpu_gfxoff): no effect; reads return EINVAL
- Manual DPM forcing via power_dpm_force_performance_level=manual plus explicit level writes to pp_dpm_sclk: sclk refused to leave level 1 (0 MHz)
- Enabled permissive=true and no-strict-reset=true per device via qvm-pci attach -o: confirmed in libvirt XML as writeFiltering='no'
- Enabled global xen-pciback.permissive=1 and xen-pciback.passthrough=1 in dom0 GRUB: confirmed Y in /sys/module/xen_pciback/parameters/
- Full dom0 reboot to ensure flag propagation through Xen: no change in symptoms
- Switched physical BIOS on dual-BIOS card: different ATOM BIOS string confirmed (113-EXT80009-100 → 113-EXT80052-100), identical SMU behavior, identical 2 FPS
Hypothesis
The SMU firmware on Navi 31 enforces GFXOFF as mandatory regardless of driver ppfeaturemask requests. In a normal bare-metal environment, the SMU coordinates with the host platform via PSP authentication and Runtime PM signaling to dynamically engage and disengage GFXOFF as workload demands.
In Xen passthrough:
- The “System can’t access extended configuration space” warning indicates the bridge chain restricts ECAM access to PCIe registers above offset 0xFF, where SMU power coordination registers reside.
- The “Runtime PM not available” message indicates the kernel cannot fully signal power state transitions to the SMU.
- The PSP authentication chain that AMD designed into RDNA3 security cannot complete because the guest cannot reach the host platform registers it expects.
With incomplete authentication, the SMU defaults to maximally conservative behavior: keep shader engines in GFXOFF, refuse to honor unauthenticated driver requests to disengage. Memory clock and other domains that don’t fall under this security regime continue to function.
This is consistent with broader VFIO community reports of RDNA3 (RX 7000 desktop series) being unsuitable for passthrough on KVM as well as Xen, though I haven’t found a definitive Qubes-specific report.
Questions for the community
- Has anyone successfully passed through any RDNA3 desktop GPU (RX 7700 XT, 7800 XT, 7900 XT/XTX) to a Qubes guest with functional shader clocks under load?
- Are there Xen-level configurations or patches that expose extended PCI configuration space access more permissively than the current global xen-pciback.permissive=1 and per-device permissive=true flags?
- Is there a known path to modify the Navi 31 vBIOS powerplay table to clear the GFXOFF feature bit, accounting for the signature checks on these cards?
- Are there per-VM stubdom or libxl configurations that might expose additional privileged register access for AMD GPUs specifically?
- Has anyone successfully used the amdgpu mobile RDNA3 variants (RX 7700S in Framework 16) in Qubes passthrough? Mobile parts may have different SMU policies.
Willing to test
Happy to provide additional logs, test alternative kernel parameters, try patched modules, or contribute structured HCL data. The data above represents around 30 hours of investigation; the rabbit hole is well-mapped but the wall hasn’t moved.
Environment
- Will attach qubes-hcl-report output as a follow-up
- Can provide full dmesg, journalctl, and lspci -vvv for the guest if useful
- Full conversation history with detailed diagnostic reasoning available on request
Interested, but too ignorant to contribute meaningfully, I had a look in my dom0.
At least for amdgpu_gfxoff, trying to cat it just blocks, but using xxd -l1 -p gives the expected output. OTOH, the other amdgpu_gfxoff* files give read(?) errors.
Not sure what the read interface is supposed to be, but I got the tip here:
https://docs.kernel.org/gpu/amdgpu/thermal.html
In dom0, acceleration otherwise seems to work, with a pp_dpm_sclk line of S: 0Mhz appearing when idle and disappearing when I run glxgears, at which point sclk goes to a higher value.
perf = high seems to make no difference vs default auto.
(different behaviour from yours could be because it’s dom0 not PT, different gpu, or because I’ve got a current-testing kernel running, but it seems the xxd/cat might affect your findings)
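For retesting along those lines, the raw byte can also be read with od, which is usually present even where xxd is not. A minimal sketch, assuming the debugfs files return raw bytes rather than text as the xxd tip suggests; first_byte is a name made up here, and the meaning of the value is left to the kernel documentation linked above.

```shell
#!/bin/sh
# Print the first byte of a file as an unsigned decimal number.
first_byte() {
    od -An -tu1 -N1 "$1" | tr -d ' \n'
}

# Example use (as root, inside the qube):
#   first_byte /sys/kernel/debug/dri/0/amdgpu_gfxoff
#   first_byte /sys/kernel/debug/dri/0/amdgpu_gfxoff_status
```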
uninteresting..
Mine is a cheap AMD GPU, which I abandoned for PT because of the famous reset bug.