Running Bazzite in a qube with gpu passthrough

wsmith115 · May 9, 2026, 5:39pm

I need a hand. I’ve got GPU passthrough working in a different qube based on the fedora-42 template, verified by running ollama and radeontop, but for some reason my Bazzite standalone qube refuses to use the GPU despite detecting it. I’m not running the two qubes at the same time. I’ve tried rebooting and then opening Bazzite in case the LLM qube didn’t release it properly. lspci detects the card, but no software will actually start utilizing it. The qube is HVM with UEFI. Any help would be appreciated.

crispy_levitator · May 9, 2026, 10:52pm

Hi,

I believe this is because fedora/bazzite dropped x11 (you need this).

I tried Bazzite as well as a bunch of others. My gaming qube is CachyOS with X11, and it works great.

wayland support for qubes is a work in progress

crispy_levitator · May 9, 2026, 11:14pm

Also to get it to use the gpu install the ‘prime-run’ package, and then run your program, eg ‘prime-run steam’

wsmith115 · May 11, 2026, 2:19pm

I forgot to mention it’s an AMD GPU. That changes things slightly because, if I remember right, prime-run is for NVIDIA.

frank_sullivan · May 11, 2026, 8:12pm

If Bazzite is only using llvmpipe even though it sees your GPU, the emulated virtual adapter is probably enumerated as /dev/dri/card0 and your passthrough GPU is /dev/dri/card1.

I was able to get Bazzite to use my passthrough GPU by flipping the order in the KWIN_DRM_DEVICES variable.

Create /etc/profile.d/passthrough-gpu.sh containing:

export KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0

then make it executable with

sudo chmod +x /etc/profile.d/passthrough-gpu.sh

Log out of your KDE session and back in again. I’m not sure what the equivalent would be for Gnome.
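
Before hard-coding the order, it may help to confirm which node belongs to which driver. A minimal sketch, assuming the usual sysfs layout under /sys/class/drm; the function name list_drm_cards and its optional root argument are made up for illustration and are not part of any tool:

```shell
#!/bin/sh
# Print "<device node> <kernel driver>" for every DRM card node.
# The optional argument (a sysfs root) exists only so the function
# can be pointed at a different directory; it defaults to the real one.
list_drm_cards() {
    root="${1:-/sys/class/drm}"
    for path in "$root"/card[0-9]*; do
        name="${path##*/}"
        # Skip connector entries such as card0-HDMI-A-1.
        case "$name" in *-*) continue ;; esac
        # Skip nodes without a bound driver (also covers an unmatched glob).
        [ -e "$path/device/driver" ] || continue
        driver="$(basename "$(readlink -f "$path/device/driver")")"
        printf '/dev/dri/%s %s\n' "$name" "$driver"
    done
}

list_drm_cards
```

Whichever node reports amdgpu is the one to put first in KWIN_DRM_DEVICES.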

wsmith115 · May 12, 2026, 7:19pm

Tried cachyOS and couldn’t get steam to boot at all so I went back to Bazzite because I can get Barony playable even without GPU.
Tried KWIN_DRM_DEVICES: glxinfo now shows my passthrough GPU as the OpenGL renderer, but the desktop became a slideshow. Ran mangohud Barony (a small Linux-native game) and got fewer FPS than when it was running on pure CPU, with GPU usage showing 0~1%.
It seems to pretty consistently enumerate the AMD card as card0.
If I use gamescope --mangoapp -- glxgears I’m getting 10 FPS, but it says it’s selected the AMD card (7900 XTX).
Passing through my card for compute was way easier lol

wsmith115 · May 19, 2026, 2:53am

AMD RX 7900 XTX (Navi 31) GPU passthrough: SMU enforces GFXOFF regardless of driver/hypervisor configuration

Summary

GPU passthrough of an AMD RX 7900 XTX to a Bazzite HVM qube completes successfully at the driver and firmware level — amdgpu loads, ATOM BIOS fetched, SMU reports successful initialization, Vulkan device selection works. However, the GPU’s shader clock domain (sclk) remains parked at 0 MHz regardless of workload or DPM force requests, producing roughly 2 FPS on basic gamescope -- glxgears tests. Memory clock (mclk) ramps normally; only the shader domain is affected.

After exhausting kernel-side, driver-side, and Xen-side mitigations, the evidence converges on the SMU firmware enforcing GFXOFF as mandatory and refusing to honor driver requests to disengage it — likely due to incomplete PSP authentication and restricted extended-configuration-space access in the Xen passthrough environment.

Posting in the hope that someone has cracked this on RDNA3, or to add concrete documentation to the body of community knowledge if not.

Hardware

CPU: AMD, 16 threads, ~64GB system RAM
GPU: PowerColor RX 7900 XTX (Navi 31)
- PCI ID: 1002:744C, subsystem 148C:2422
- Dual-BIOS (both BIOSes tested, identical behavior)
- Host BDF: 0000:0e:00.0 (display) and 0000:0e:00.1 (audio)
Storage: Dual NVMe drives

The GPU sits four hops down from root complex per qvm-pci list:

dom0:00_03.1-00_00.0-00_00.0-00_00.0  Display
dom0:00_03.1-00_00.0-00_00.0-00_00.1  Audio

Three PCIe bridges between root complex and endpoint.

Software

Host: Qubes OS 4.x with current updates
Guest qube: gameqube, HVM mode, OVMF/UEFI, stubdom-linux emulator
- 16GB RAM, 6 vCPUs
- Bazzite (rpm-ostree-based Fedora) as guest OS
GPU driver: in-tree amdgpu

Dom0 configuration

/etc/default/grub:

GRUB_CMDLINE_LINUX="... rd.driver.pre=xen-pciback xen-pciback.hide=(0000:0e:00.0)(0000:0e:00.1) xen-pciback.passthrough=1 xen-pciback.permissive=1"

Global pciback state confirmed:

$ cat /sys/module/xen_pciback/parameters/permissive
Y
$ cat /sys/module/xen_pciback/parameters/passthrough
Y

Per-device attachment with both flags:

$ qvm-pci list gameqube
BACKEND:DEVID                         DESCRIPTION                USED BY
dom0:00_03.1-00_00.0-00_00.0-00_00.0  Navi 31 Display            gameqube (attached: permissive=true, no-strict-reset=true)
dom0:00_03.1-00_00.0-00_00.0-00_00.1  Navi 31 HDMI/DP Audio      gameqube (attached: permissive=true, no-strict-reset=true)

Libvirt XML confirms writeFiltering='no' and nostrictreset='yes' on both <hostdev> entries.

Symptoms

Inside the qube:

lspci shows the GPU; amdgpu is the kernel driver in use
Card binds, firmware loads, displays are detected, Vulkan device selection works
VRAM allocation static at ~108 MB regardless of workload
gamescope -- glxgears produces 2-16 FPS with mangohud reporting GPU 0%, CPU 99%
Same behavior under Wayland (Bazzite KDE) and X11 wrappers
MESA_VK_DEVICE_SELECT confirms RADV is selected, not lavapipe

Diagnostic output

amdgpu initialization (dmesg, abbreviated)

amdgpu 0000:00:07.0: amdgpu: initializing kernel modesetting
  (IP DISCOVERY 0x1002:0x744C 0x148C:0x2422 0xC8)
amdgpu 0000:00:07.0: amdgpu: detected ip blocks: psp_v13_0, smu_v13_0_0,
  gfx_v11_0, sdma_v6_0, vcn_v4_0, jpeg_v4_0, mes_v11_0  (and others)
amdgpu 0000:00:07.0: amdgpu: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000
amdgpu 0000:00:07.0: amdgpu: Fetched VBIOS from ROM
amdgpu: ATOM BIOS: 113-EXT80009-100
amdgpu 0000:00:07.0: amdgpu: System can't access extended configuration space, please check!!
amdgpu 0000:00:07.0: amdgpu: VRAM: 24560M ... GART: 512M
amdgpu 0000:00:07.0: amdgpu: smu driver if version = 0x0000003d,
  smu fw if version = 0x00000040, smu fw version = 0x004e8300 (78.131.0)
amdgpu 0000:00:07.0: amdgpu: SMU driver if version not matched
amdgpu 0000:00:07.0: amdgpu: SMU is initialized successfully!
amdgpu 0000:00:07.0: amdgpu: [drm] Display Core v3.2.359 initialized on DCN 3.2
amdgpu 0000:00:07.0: amdgpu: Runtime PM not available

Three notable messages:

“System can’t access extended configuration space” — PCIe ECAM access restricted by the bridge chain
“SMU driver if version not matched” (warning, not fatal — SMU still initializes)
“Runtime PM not available” — kernel cannot fully manage GPU power state transitions
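
To pull just these three messages out of a longer log, something like the following could be used. A minimal sketch; the function name amdgpu_passthrough_warnings is hypothetical, and the patterns are copied from the dmesg excerpt above:

```shell
#!/bin/sh
# Keep only the amdgpu lines that match the three warnings listed above.
# Reads a kernel log on stdin, so it works with dmesg or a saved file.
amdgpu_passthrough_warnings() {
    grep -E "amdgpu.*(extended configuration space|if version not matched|Runtime PM not available)"
}

# Example with lines taken from the log above; the middle one is dropped.
amdgpu_passthrough_warnings <<'EOF'
amdgpu 0000:00:07.0: amdgpu: SMU driver if version not matched
amdgpu 0000:00:07.0: amdgpu: SMU is initialized successfully!
amdgpu 0000:00:07.0: amdgpu: Runtime PM not available
EOF
```

On a live guest the input would come from sudo dmesg piped into the function.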

Clock domain behavior

Forcing power_dpm_force_performance_level=high ramps memory clock correctly but shader clock refuses:

$ echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level

$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz
1: 0Mhz *      <- active level, stuck
2: 2526Mhz

$ cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 96Mhz
1: 456Mhz
2: 772Mhz
3: 1249Mhz *   <- memory clock honored

Manual DPM forcing also refused:

$ echo manual | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level
$ echo 2 | sudo tee /sys/class/drm/card0/device/pp_dpm_sclk
$ echo 3 | sudo tee /sys/class/drm/card0/device/pp_dpm_mclk
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz
1: 0Mhz *      <- still parked despite explicit level-2 request
2: 2526Mhz

The 0 MHz reading is consistent with GFXOFF holding the shader engines in a power-gated state.
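
For anyone wanting to log this over time, the active level can be extracted from the sysfs text with a one-liner. A minimal sketch, assuming the "N: <clock>Mhz *" format shown above; the function name active_clock_mhz is made up for illustration:

```shell
#!/bin/sh
# Print the clock (in MHz) of the level marked with '*' in
# pp_dpm_sclk / pp_dpm_mclk style output read from stdin.
active_clock_mhz() {
    awk '/\*/ { sub(/[Mm][Hh][Zz]$/, "", $2); print $2; exit }'
}

# Sample taken from the output above; on a live system, redirect from
# /sys/class/drm/card0/device/pp_dpm_sclk instead.
active_clock_mhz <<'EOF'
0: 500Mhz
1: 0Mhz *
2: 2526Mhz
EOF
# prints 0
```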

pp_features mask, before and after kernel parameter

After applying amdgpu.ppfeaturemask=0xfff7ffff (intent: clear GFXOFF bit 19):

$ cat /proc/cmdline | tr ' ' '\n' | grep amdgpu
amdgpu.ppfeaturemask=0xfff7ffff

$ cat /sys/class/drm/card0/device/pp_features
features high: 0x0003ebb8 low: 0x71ffffff
...
17. GFX_ULV    (17) : enabled
19. GFXOFF     (19) : enabled    <- firmware kept GFXOFF set despite kernel request
20. BACO       (20) : enabled

The firmware returns 0x71ffffff regardless of the requested mask, indicating bit 19 (GFXOFF) is firmware-locked and cannot be cleared from the driver side.
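
The mask arithmetic behind this test can be double-checked with plain shell arithmetic. A minimal sketch; the helper names clear_bit and bit_is_set are made up for illustration:

```shell
#!/bin/sh
# Clear one bit in a 32-bit mask and print the result as hex.
clear_bit() {
    printf '0x%08x\n' $(( $1 & ~(1 << $2) & 0xffffffff ))
}

# Report whether a given bit is set in a mask (prints 1 or 0).
bit_is_set() {
    echo $(( ($1 >> $2) & 1 ))
}

clear_bit 0xffffffff 19    # prints 0xfff7ffff, the mask used on the kernel command line
bit_is_set 0x71ffffff 19   # prints 1, bit 19 of the "low" word reported by pp_features
```

One caveat worth checking: index 19 comes from the pp_features listing, which numbers SMU features. The ppfeaturemask module parameter is interpreted with the driver's own PP_FEATURE_MASK enum, where GFXOFF appears to be PP_GFXOFF_MASK = 0x8000 (bit 15). If so, clear_bit 0xffffffff 15 (0xffff7fff) would be the value to try, and the 0xfff7ffff run may not have targeted GFXOFF at all.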

GFXOFF debugfs interface

The debugfs entries exist but return errors on read:

$ ls /sys/kernel/debug/dri/0/ | grep gfxoff
amdgpu_gfxoff
amdgpu_gfxoff_count
amdgpu_gfxoff_residency
amdgpu_gfxoff_status

$ sudo cat /sys/kernel/debug/dri/0/amdgpu_gfxoff_status
cat: amdgpu_gfxoff_status: Invalid argument

All gfxoff-related debugfs reads return EINVAL, suggesting the SMU is not responding cleanly to runtime state queries.

Workload test

$ gamescope -- glxgears &
$ watch -n 0.5 cat /sys/class/drm/card0/device/pp_dpm_sclk
# Active level remains "1: 0Mhz *" throughout
# mangohud overlay: GPU 0%, CPU 99%, ~2 FPS
# VRAM allocation static at 108MB

What I tried, in order

Hidden GPU in dom0 GRUB via xen-pciback.hide
Confirmed pciback as kernel driver in use in dom0
Standard qvm-pci attach --persistent --no-strict-reset for both GPU functions
Driver and firmware load cleanly inside the qube; ATOM BIOS fetched; SMU “initialized successfully”
Tried amdgpu.ppfeaturemask=0xffffffff — no effect on GFXOFF state
Tried amdgpu.gfxoff=0 kernel parameter — kernel reported “unknown parameter ignored” (no longer present in current amdgpu)
Tried amdgpu.ppfeaturemask=0xfff7ffff to specifically clear GFXOFF bit 19 — firmware returned mask 0x71ffffff (override)
Tried debugfs runtime disable: echo 0 | sudo tee /sys/kernel/debug/dri/0/amdgpu_gfxoff — no effect; reads return EINVAL
Manual DPM forcing via power_dpm_force_performance_level=manual plus explicit level writes to pp_dpm_sclk — sclk refused to leave level 1 (0 MHz)
Enabled permissive=true and no-strict-reset=true per device via qvm-pci attach -o — confirmed in libvirt XML as writeFiltering='no'
Enabled global xen-pciback.permissive=1 and xen-pciback.passthrough=1 in dom0 GRUB — confirmed Y in /sys/module/xen_pciback/parameters/
Full dom0 reboot to ensure flag propagation through Xen — no change in symptoms
Switched physical BIOS on dual-BIOS card — different ATOM BIOS string confirmed (113-EXT80009-100 → 113-EXT80052-100), identical SMU behavior, identical 2 FPS

Hypothesis

The SMU firmware on Navi 31 enforces GFXOFF as mandatory regardless of driver ppfeaturemask requests. In a normal bare-metal environment, the SMU coordinates with the host platform via PSP authentication and Runtime PM signaling to dynamically engage and disengage GFXOFF as workload demands.

In Xen passthrough:

The System can't access extended configuration space warning indicates the bridge chain restricts ECAM access to PCIe registers above offset 0xFF, where SMU power coordination registers reside.
The Runtime PM not available message indicates the kernel cannot fully signal power state transitions to the SMU.
The PSP authentication chain that AMD designed into RDNA3 security cannot complete because the guest cannot reach the host platform registers it expects.

With incomplete authentication, the SMU defaults to maximally conservative behavior: keep shader engines in GFXOFF, refuse to honor unauthenticated driver requests to disengage. Memory clock and other domains that don’t fall under this security regime continue to function.
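
The extended-configuration-space part of this can be checked directly from inside the guest: the sysfs config file for a PCI device is 256 bytes when only conventional config space is reachable and 4096 bytes when extended config space is reachable. A minimal sketch; the function name config_space_kind is hypothetical, and 0000:00:07.0 is the guest-side address from the dmesg output above:

```shell
#!/bin/sh
# Classify a PCI config-space size (in bytes) as exposed by sysfs.
config_space_kind() {
    case "$1" in
        256)  echo "conventional only" ;;
        4096) echo "extended" ;;
        *)    echo "unexpected size: $1" ;;
    esac
}

# Guest-side address of the GPU; adjust if it enumerates elsewhere.
cfg=/sys/bus/pci/devices/0000:00:07.0/config
if [ -r "$cfg" ]; then
    # stat -c %s is the GNU coreutils form; it reports the attribute size.
    config_space_kind "$(stat -c %s "$cfg")"
fi
```

This only shows whether the guest can see registers above offset 0xFF. It says nothing about why, or about what the SMU does with that.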

This is consistent with broader VFIO community reports of RDNA3 (RX 7000 desktop series) being unsuitable for passthrough on KVM as well as Xen, though I haven’t found a definitive Qubes-specific report.

Questions for the community

Has anyone successfully passed through any RDNA3 desktop GPU (RX 7700 XT, 7800 XT, 7900 XT/XTX) to a Qubes guest with functional shader clocks under load?
Are there Xen-level configurations or patches that expose extended PCI configuration space access more permissively than the current global xen-pciback.permissive=1 and per-device permissive=true flags?
Is there a known path to modify the Navi 31 vBIOS powerplay table to clear the GFXOFF feature bit, accounting for the signature checks on these cards?
Are there per-VM stubdom or libxl configurations that might expose additional privileged register access for AMD GPUs specifically?
Has anyone successfully used the amdgpu mobile RDNA3 variants (RX 7700S in Framework 16) in Qubes passthrough? Mobile parts may have different SMU policies.

Willing to test

Happy to provide additional logs, test alternative kernel parameters, try patched modules, or contribute structured HCL data. The data above represents around 30 hours of investigation; the rabbit hole is well-mapped but the wall hasn’t moved.

Environment

Will attach qubes-hcl-report output as a follow-up
Can provide full dmesg, journalctl, and lspci -vvv for the guest if useful
Full conversation history with detailed diagnostic reasoning available on request

phceac · May 22, 2026, 10:52am

Interested, but too ignorant to contribute meaningfully, I had a look in my dom0.

At least for amdgpu_gfxoff, trying to cat it just blocks, but using xxd -l1 -p gives the expected output. OTOH, the other amdgpu_gfxoff* entries give read(?) errors.

Not sure what the read interface is supposed to be, but I got the tip here:
https://docs.kernel.org/gpu/amdgpu/thermal.html

In dom0, acceleration all seems to work otherwise, with a pp_dpm_sclk line of S: 0Mhz appearing when idle and disappearing when I run glxgears, at which point sclk goes to a higher value.

perf = high seems to make no difference vs default auto.

(different behaviour from yours could be because it’s dom0 not PT, different gpu, or because I’ve got a current-testing kernel running, but it seems the xxd/cat might affect your findings)
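
If I read that kernel page right, amdgpu_gfxoff_status is meant to be read as a raw byte, with 0 to 3 as the state. A minimal sketch of decoding it; the function name decode_gfxoff_status is made up, and the state names are paraphrased from the linked documentation:

```shell
#!/bin/sh
# Map the first byte of amdgpu_gfxoff_status (as printed by `xxd -l1 -p`)
# to the states listed in the kernel's amdgpu thermal documentation.
decode_gfxoff_status() {
    case "$1" in
        00) echo "in GFXOFF" ;;
        01) echo "transitioning out of GFXOFF" ;;
        02) echo "not in GFXOFF" ;;
        03) echo "transitioning into GFXOFF" ;;
        *)  echo "unknown status: $1" ;;
    esac
}

# Needs root and a mounted debugfs; does nothing where the file is absent.
f=/sys/kernel/debug/dri/0/amdgpu_gfxoff_status
if [ -r "$f" ] && command -v xxd >/dev/null 2>&1; then
    decode_gfxoff_status "$(xxd -l1 -p "$f")"
fi
```

If the read still fails with xxd on the 7900 XTX, the block above just prints an "unknown status" line, which would at least rule out cat as the cause.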

uninteresting..

Mine is a cheap AMD GPU, which I abandoned for PT because of the famous reset bug.