Issues with GPU passthrough / amdgpu

Hi,

I’m struggling with setting up GPU passthrough on a new machine with a discrete GPU from AMD (RX 6800 XT, not sure the exact model matters much).

Some time ago I managed to set up GPU passthrough on an older laptop with an Nvidia Quadro T1000 Mobile, mostly by following NVIDIA GPU passthrough into Linux HVMs for CUDA applications (and the linked post by neowutran). So I tried doing the same thing on the new workstation, but I ran into some issues that I’m not sure how to solve :frowning:

Note: My goal is to use the GPU for OpenCL stuff - I’m fine with a VM accessible only through a text console; perhaps that might simplify things a bit.
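
(For context, my rough success check inside the VM would be something like the following - assuming an OpenCL runtime such as ROCm is also installed; the package names are just a guess:)

sudo dnf install clinfo                           # plus an OpenCL runtime/ICD, e.g. ROCm
clinfo | grep -iE 'platform name|device name'     # the card should show up here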

Anyway here’s what I did (the main/interesting parts, I don’t want to repeat all the details mentioned in the linked posts):

  1. list the PCI devices
dom0:06_00.0  VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]                                           gpu_vm
dom0:06_00.1  Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT]  gpu_vm
  2. add them to GRUB_CMDLINE_LINUX, rebuild grub.cfg and reboot

GRUB_CMDLINE_LINUX="… rd.qubes.hide_pci=06:00.0,06:00.1"

I did check (by pressing “e” in GRUB during boot) that the option is actually there.
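
For reference, rebuilding grub.cfg in dom0 is the usual grub2-mkconfig call; the output path depends on the install type (the line below assumes UEFI, a legacy BIOS install uses /boot/grub2/grub.cfg instead):

grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg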

  3. modify the stubroot, as described in one of the linked guides (AFAICS this is not strictly necessary, since my VM has less than 3.5GB of RAM, but I did it anyway)

  4. create a standalone VM “gpu_vm” with ~3GB of RAM / a couple CPUs, install Fedora 37

  5. attach the PCI devices to the VM

qvm-pci attach -o permissive=True -o no-strict-reset=True --persistent gpu_vm dom0:06_00.0
qvm-pci attach -o permissive=True -o no-strict-reset=True --persistent gpu_vm dom0:06_00.1

(I did try this with/without the permissive/no-strict-reset flags, also from the VM settings.)
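
(As a quick sanity check, the plain qvm-pci listing should now show both functions assigned to the VM, e.g.:)

qvm-pci | grep gpu_vm    # both 06_00.0 and 06_00.1 should be listed with gpu_vm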

  6. (re)start the VM

The gpu_vm VM usually starts fine, and things seem to work to some extent - I can log in, install and run radeontop, etc.

But there are various issues:

  1. radeontop reports 100% utilization on everything, which is clearly nonsense - nothing is running

  2. there seem to be strange errors in dmesg (in the VM)

  3. when I restart the VM, it usually fails with a message:
Qube gpu_vm has failed to start: internal error: Unknown PCI header type '127' for device '0000:06:00.1'

and libxl-driver.log says

2022-12-31 18:20:20.608+0000: libxl: libxl_pci.c:1484:libxl__device_pci_reset: write to /sys/bus/pci/devices/0000:06:00.0/reset returned -1: Inappropriate ioctl for device
2022-12-31 18:20:20.670+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:06:00.1
2022-12-31 18:26:30.857+0000: libxl: libxl_pci.c:1484:libxl__device_pci_reset: write to /sys/bus/pci/devices/0000:06:00.0/reset returned -1: Inappropriate ioctl for device
2022-12-31 18:26:30.986+0000: libxl: libxl_pci.c:1489:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:06:00.1

I did try different combinations of no-strict-reset/permissive, and also not attaching the audio part (which shouldn’t be needed for OpenCL, I guess), but no luck.
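
Concretely, the “without the audio part” variant was roughly this (from memory, so the exact invocation may be slightly off):

qvm-pci detach gpu_vm dom0:06_00.1                  # drop the audio function
qvm-pci attach --persistent gpu_vm dom0:06_00.0     # video only, no permissive/no-strict-reset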

I’m not sure what’s wrong or what to look for, so I was trying various things. For example, I noticed that even after hiding the PCI device, dmesg in dom0 still says:

[    4.179504] [drm] amdgpu kernel modesetting enabled.
[    4.179554] amdgpu: CRAT table not found
[    4.179555] amdgpu: Virtual CRAT table created for CPU
[    4.179561] amdgpu: Topology: Add CPU node

Which seems strange, so I tried blocking the amdgpu module by adding modprobe.blacklist=amdgpu to the kernel command line. The module disappeared from dmesg, but that didn’t resolve the issue.
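
For the record, that was added next to the hide parameter in the same GRUB variable, i.e. something like:

GRUB_CMDLINE_LINUX="… rd.qubes.hide_pci=06:00.0,06:00.1 modprobe.blacklist=amdgpu"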

I also tried installing kernel-latest (6.0.12), but again no difference :frowning:

So, what am I doing wrong?

Could be related to the kernel issue I am having (Ryzen 7000 serie - #55 by neowutran)
You could check if the LTS kernel 5.4 works for you, but maybe your GPU is too recent for that kernel.
Also, you should try to use a kernel provided by the VM and not by Qubes.

I can try a 5.4 kernel, but where do I get an LTS kernel this old? I only see 5.15.x in Qubes (both for kernel and kernel-qubes-vm).

What do you mean by using a kernel provided by the VM? I found this (Managing qube kernels | Qubes OS), but it’s unclear to me how that applies to a standalone / HVM - I mean, that is already running its own kernel, no?

FWIW I was googling a bit, and the error

Unknown PCI header type '127' for device '0000:06:00.1'

seems like a symptom of the AMD GPU reset bug. There was even a module (GitHub - gnif/vendor-reset: Linux kernel vendor specific hardware reset module for sequences that are too complex/complicated to land in pci_quirks.c) fixing that - but my understanding is that this should be fixed on recent GPUs, so maybe the issue is elsewhere (VM config) but with the same symptom.

Yes, that is what I mean.
By default, every qube type (HVM or not) uses a kernel provided by Qubes OS.
You can check in the Qubes Manager GUI (click on your gpu_vm → Settings → Advanced → Kernel → make sure it is set to “(provided by qube)” to use the kernel provided by your VM).
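
The command-line equivalent should be clearing the kernel preference, if I am not mistaken:

qvm-prefs gpu_vm kernel ''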

And to try the 5.4 LTS, it depends on your “gpu_vm” operating system. It may be provided by a package, or you may need to compile it yourself.

OK, I did some experiments with kernel versions. Unfortunately, no success so far :frowning:

I realized the VM was using a separate kernel all along, because one of the things I did when creating the VM was

qvm-prefs gpu_vm kernel ''

Fedora generally provides packages only for fairly recent kernels, so I had to build custom ones - I started with 5.4.228, as suggested (using the .config from the Fedora SRPM for a matching version).
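
Roughly, the build went something like this (simplified; the .config path is just a placeholder for the config extracted from the Fedora SRPM):

curl -LO https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.4.228.tar.xz
tar xf linux-5.4.228.tar.xz && cd linux-5.4.228
cp ~/kernel-5.4-fedora.config .config    # .config taken from the Fedora SRPM
make olddefconfig                        # fill in new/missing options with defaults
make -j"$(nproc)"
sudo make modules_install install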

The good thing is that it doesn’t fail - there are no errors in dmesg, and the VM can be restarted without the PCI error:

internal error: Unknown PCI header type '127' for device '0000:06:00.1'

The bad thing is that this seems to be because the kernel doesn’t support the GPU and thus doesn’t initialize it. So, no GPU in the VM.

So I tried increasing the kernel version - first to 5.10.x and then 5.15.x, which are the LTS versions available at kernel.org. 5.10 has the same result as 5.4 (no support, no failure), while 5.15 starts showing the issues described above (dmesg errors, PCI header type failure).

Ultimately, it seems the version where things start breaking is 5.14.

Not sure what else to try …

I don’t know if it will work, but I would try:

  • Check your motherboard BIOS - is there a newer upgrade available?
  • Check your IOMMU groups - is your GPU alone in its own group?

The MB has the newest BIOS available - updating it was the first thing I did ~two weeks ago.

Not sure about IOMMU groups, though. There’s nothing in /sys/kernel/iommu_groups, i.e. output of

tree /sys/kernel/iommu_groups

is empty. Not sure why, but I see the same thing on my laptop with the Nvidia T1000, and I’ve been able to set up passthrough there.

However, xl dmesg says this:

(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed

so the IOMMU itself does seem to be enabled.

Xen does not make this information available; you need to boot into some other Linux distro (or a live system) and run that command there.
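
(The usual way to dump them from a normal Linux boot is a small loop over /sys/kernel/iommu_groups, something like:)

# list each IOMMU group and the PCI devices it contains
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU Group ${g##*/}"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done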

D’oh! I completely missed that bit, and I was really confused about why I didn’t see any info about IOMMU groups. I’ll check from Fedora once I get to the machine again.

I finally had time to look at the iommu groups, and with iommu=1 intel_iommu=on it looks like this (combination of lspci and /sys/kernel/iommu_groups):

IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers [8086:4668] (rev 02)
IOMMU Group 10 00:1c.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #5 [8086:7abc] (rev 11)
IOMMU Group 11 00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:7a86] (rev 11)
IOMMU Group 11 00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
IOMMU Group 11 00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
IOMMU Group 11 00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
IOMMU Group 12 01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
IOMMU Group 13 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU Group 14 04:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
IOMMU Group 15 05:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 16 06:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400/6500 XT/6500M] [1002:743f] (rev c1)
IOMMU Group 17 06:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
IOMMU Group 1 00:02.0 VGA compatible controller [0300]: Intel Corporation AlderLake-S GT1 [8086:4680] (rev 0c)
IOMMU Group 2 00:06.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 [8086:464d] (rev 02)
IOMMU Group 3 00:08.0 System peripheral [0880]: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator [8086:464f] (rev 02)
IOMMU Group 4 00:0a.0 Signal processing controller [1180]: Intel Corporation Platform Monitoring Technology [8086:467d] (rev 01)
IOMMU Group 5 00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
IOMMU Group 5 00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
IOMMU Group 6 00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S PCH HECI Controller #1 [8086:7ae8] (rev 11)
IOMMU Group 7 00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
IOMMU Group 8 00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 [8086:7ab8] (rev 11)
IOMMU Group 9 00:1c.2 PCI bridge [0604]: Intel Corporation Device [8086:7aba] (rev 11)

So the GPU ends up in two separate IOMMU groups - 16 (video) and 17 (audio).

The only thing I can think of now is that maybe it matters which PCIe slot is used - there are two on the MB: (a) 4.0 x16 and (b) 3.0 x4. At the moment the GPU is in (b) - 3.0 x4 is what this particular GPU uses/needs, and the idea was to maybe buy a better GPU and stick it into the 4.0 x16 slot.

But maybe it matters anyway, even though there are separate IOMMU groups … while checking the MB manual again, I noticed the 4.0 x16 slot is described as “from CPU” while the 3.0 x4 slot as “from B660 chipset”.

Noise, probably

Sorry for the probable noise, but for example I passed through my audio card to sys-audio with

xen-pciback.hide=(00:1b.0)

I never knew what the difference was compared to rd.qubes.hide_pci - I asked at some point, but got no feedback.

I gave xen-pciback.hide a try, replacing

rd.qubes.hide_pci=06:00.0,06:00.1

with

xen-pciback.hide=(06:00.0)(06:00.1)

but the issues/behavior when starting the VM seem exactly the same.

I tried moving the GPU to a different PCIe slot, but strangely enough Qubes fails to start :frowning:

I’ve removed the rd.qubes.hide_pci stuff from grub, and I’ve also detached the PCI devices from the VM. After entering the LUKS password the system starts to boot and gets to

Starting Light Display Manager...
Starting Hold until boot process finishes up...

And then it just reboots. Not sure what’s going on.

FWIW I’ve booted into Fedora (from Live USB), and there everything works fine - including the GPU (moved to the other PCIe slot). I can even run radeontop and that works fine too.

FWIW I did try with the GPU in the other PCIe slot - I ended up doing a fresh Qubes install on a different SSD instead of investigating why the first install failed to start after moving the GPU. But the results are exactly the same. So that wasn’t it, apparently.

I had the same issues when I tried to hide a PCI device that I was using, or one that was not the GPU I wanted to pass through.
You could also try to test a Windows VM, just in case.

What do you mean by “PCI that I was using”? Per the lspci output, it’s the two devices for the GPU, and there are no other discrete GPUs in the system. So how could it not be the device I want to pass through?

And how could I be using the device? I mean, I’ve hidden it using rd.qubes.hide_pci, I’m using the integrated GPU, etc. Or could the IOMMU be borked in some way, making the “isolation” imperfect in some sense? (Possibly a silly idea - this is entirely outside my area of expertise.)

I might give Windows a try. The problem is I haven’t used Windows for ages and I don’t even know if there is an installer I could use. I certainly don’t have any product keys, or whatever they’re called now.

However, now that you mention it, I noticed I still see this in the dom0:

[root@dom0 ~]# lsmod | grep amdgpu
amdgpu              10661888  0
drm_ttm_helper         16384  1 amdgpu
iommu_v2               24576  1 amdgpu
ttm                    94208  3 amdgpu,drm_ttm_helper,i915
gpu_sched              49152  1 amdgpu
drm_buddy              20480  2 amdgpu,i915
drm_display_helper    184320  2 amdgpu,i915

[root@dom0 ~]# dmesg | grep amd
[    5.046178] [drm] amdgpu kernel modesetting enabled.
[    5.046255] amdgpu: CRAT table not found
[    5.046257] amdgpu: Virtual CRAT table created for CPU
[    5.046267] amdgpu: Topology: Add CPU node

Isn’t that a bit weird, considering the GPU should be hidden using rd.qubes.hide_pci?

Kernel modules: amdgpu
Kernel driver in use: pciback
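
That is the relevant part of the lspci -k output in dom0 - “Kernel driver in use: pciback” is what matters; the amdgpu module being loaded is harmless as long as it is not the driver actually bound to the device:

lspci -ks 06:00.0    # should report "Kernel driver in use: pciback" for a hidden device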

What do you mean by “PCI that I was using”? Per the lspci output, it’s the two devices for the GPU, and there are no other discrete GPUs in the system. So how could it not be the device I want to pass through?

I am currently doing a lot of tests related to GPU passthrough, with multiple computers.
One time I plugged the secondary GPU into the other PCIe slot, so the “rd.qubes.hide_pci” parameter was targeting the primary GPU, resulting in this error:

Starting Light Display Manager...
Starting Hold until boot process finishes up...
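
A quick way to catch that after moving the card is to re-check its current bus address before trusting the hide parameter, for example:

lspci -nn -d 1002:    # list AMD/ATI (vendor 1002) devices with their current bus addresses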

For Windows VMs, I like using this, it is nicely done: GitHub - elliotkillick/qvm-create-windows-qube: Spin up new Windows qubes quickly, effortlessly and securely on Qubes OS

This looks great! Will try, thanks.

Kernel modules: amdgpu
Kernel driver in use: pciback

Yes, this is what I see for both GPU devices (amdgpu for the video function, snd_hda_intel for the audio one).

I am currently doing a lot of tests related to GPU passthrough, with multiple computers.
One time I plugged the secondary GPU into the other PCIe slot, so the “rd.qubes.hide_pci” parameter was targeting the primary GPU, resulting in this error:
Starting Light Display Manager…
Starting Hold until boot process finishes up…

But I did remove the rd.qubes.hide_pci stuff from the grub config (by editing during boot). So I guess the PCI device was stored in some additional place,