AMD iGPU passthrough attempt

My current status (with the setup described in these salt recipes) shows in the VM logs:

[2021-10-06 22:38:28] [    3.292678] [drm] BIOS signature incorrect 0 0
[2021-10-06 22:38:28] [    3.292699] amdgpu 0000:00:05.0: Invalid PCI ROM data signature: expecting 0x52494350, got 0xcb03aa55
[2021-10-06 22:38:28] [    3.342064] [drm] BIOS signature incorrect 0 0
[2021-10-06 22:38:28] [    3.342169] [drm:amdgpu_get_bios [amdgpu]] *ERROR* Unable to locate a BIOS ROM
[2021-10-06 22:38:28] [    3.342209] amdgpu 0000:00:05.0: amdgpu: Fatal error during GPU init
[2021-10-06 22:38:28] [    3.342284] amdgpu 0000:00:05.0: amdgpu: amdgpu: finishing device.

The stubdom log shows:

[2021-10-06 21:29:01] pcifront pci-0: Installing PCI frontend
[2021-10-06 21:29:01] xen:swiotlb_xen: Warning: only able to allocate 4 MB for software IO TLB
[2021-10-06 21:29:01] software IO TLB: mapped [mem 0x04c00000-0x05000000] (4MB)
[2021-10-06 21:29:01] written 110 bytes to vchan
[2021-10-06 21:29:01] pcifront pci-0: Creating PCI Frontend Bus 0000:00
[2021-10-06 21:29:01] pcifront pci-0: PCI host bridge to bus 0000:00
[2021-10-06 21:29:01] pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
[2021-10-06 21:29:01] pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffffffff]
[2021-10-06 21:29:01] pci_bus 0000:00: root bus resource [bus 00-ff]
[2021-10-06 21:29:01] pci 0000:00:00.0: [1002:1636] type 00 class 0x030000
[2021-10-06 21:29:01] pci 0000:00:00.0: reg 0x10: [mem 0xb0000000-0xbfffffff 64bit pref]
[2021-10-06 21:29:01] pci 0000:00:00.0: reg 0x18: [mem 0xc0000000-0xc01fffff 64bit pref]
[2021-10-06 21:29:01] pci 0000:00:00.0: reg 0x20: [io  0xe000-0xe0ff]
[2021-10-06 21:29:01] pci 0000:00:00.0: reg 0x24: [mem 0xfe400000-0xfe47ffff]
[2021-10-06 21:29:01] pci 0000:00:00.0: reg 0x30: [mem 0x000c0000-0x000dffff pref]
[2021-10-06 21:29:01] pcifront pci-0: claiming resource 0000:00:00.0/0
[2021-10-06 21:29:01] pcifront pci-0: claiming resource 0000:00:00.0/2
[2021-10-06 21:29:01] pcifront pci-0: claiming resource 0000:00:00.0/4
[2021-10-06 21:29:01] pcifront pci-0: claiming resource 0000:00:00.0/5
[2021-10-06 21:29:01] pcifront pci-0: claiming resource 0000:00:00.0/6
[2021-10-06 21:29:01] pci 0000:00:00.0: can't claim BAR 6 [mem 0x000c0000-0x000dffff pref]: address conflict with Reserved [mem 0x000a0000-0x000fffff]
[2021-10-06 21:29:01] pcifront pci-0: Could not claim resource 0000:00:00.0/6! Device offline. Try using e820_host=1 in the guest config.

Could it be that the ROM should be accessible from BAR 6 ?

Is the Try using e820_host=1 in the guest config suggestion useful for us ? From the code it looks like the pci-e820-host feature defaults to 1 already (but then, seeing the libvirt config for sys-gui-gpu would help to confirm where we stand).
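For what it's worth, the address conflict reported by pcifront is plain range arithmetic. Here is a minimal Python sketch that uses only the two ranges printed in the stubdom log; the helper names are mine and do not come from any kernel or Xen code:

```python
def overlaps(a, b):
    """True if two inclusive (start, end) address ranges share at least one byte."""
    return a[0] <= b[1] and b[0] <= a[1]


def contained(inner, outer):
    """True if the inclusive range `inner` lies entirely inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Ranges copied from the stubdom log (inclusive bounds).
ROM_BAR = (0x000C0000, 0x000DFFFF)    # BAR 6, the expansion ROM
RESERVED = (0x000A0000, 0x000FFFFF)   # region the stubdom kernel marks Reserved

ROM_BAR_SIZE = ROM_BAR[1] - ROM_BAR[0] + 1

print("overlap:  ", overlaps(ROM_BAR, RESERVED))
print("contained:", contained(ROM_BAR, RESERVED))
print("BAR size: ", hex(ROM_BAR_SIZE))
```

The ROM BAR sits entirely inside the Reserved region, so there is no way to claim it without changing how that region is described to the stubdom kernel.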

Since the ROM apparently can’t be read from the GPU, I booted on a Debian Live stick, and was able to extract it from /sys (though I’m not yet sure it is a pristine ROM image and not a shadow RAM copy that would have been patched, eg. by the EFI driver). It does have the proper signature where the kernel’s pci/rom.c is looking for it. To use this ROM I tried this patch to the pci.xml template:

--- pci.xml.orig	2021-10-05 00:47:56.599213557 +0200
+++ pci.xml	2021-10-06 21:48:24.315969520 +0200
@@ -12,6 +12,9 @@
             slot="0x{{ device.device }}"
             function="0x{{ device.function }}" />
     </source>
+{% if options.get('vga-rom', False) %}
+    <rom bar="on" file="{{ options.get('vga-rom', '') }}" />
+{% endif %}
 </hostdev>
 
 {# vim : set ft=jinja ts=4 sts=4 sw=4 et : #}

… and hacked by hand a vga-rom option in qubes.xml, with:

<option name='vga-rom'>/path/to/renoir.rom</option>

It looks like the <rom> element is indeed parsed (if I place it in a wrong place, eg. inside <source> I do see an error through journalctl)… but that does not change the system’s behaviour.
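To make the intended placement explicit, here is a small Python sketch that builds the XML the patched template is expected to produce. It uses ElementTree rather than Jinja, the attributes on `<hostdev>` are an assumption about what the unpatched template emits, and `render_hostdev` is a hypothetical helper, not Qubes code:

```python
import xml.etree.ElementTree as ET


def render_hostdev(bus, slot, function, options):
    """Build a <hostdev> element; add <rom> only when 'vga-rom' is set."""
    # Assumption: the real template emits a managed PCI hostdev like this.
    hostdev = ET.Element("hostdev", {"type": "pci", "managed": "yes"})
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "address", {
        "bus": f"0x{bus}",
        "slot": f"0x{slot}",
        "function": f"0x{function}",
    })
    rom_path = options.get("vga-rom")
    if rom_path:
        # <rom> is a sibling of <source>, not a child of it.
        ET.SubElement(hostdev, "rom", {"bar": "on", "file": rom_path})
    return hostdev


elem = render_hostdev("07", "00", "0", {"vga-rom": "/path/to/renoir.rom"})
print(ET.tostring(elem, encoding="unicode"))
```

This only shows what libvirt gets to parse; it says nothing about whether the file ever reaches the device model.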

Anyone with a clue ?

I’m pretty sure the option+file is not transferred to the qemu in stubdomain. And indeed the “rom” option is documented as QEMU/KVM only.

This is about the stubdom’s address space, not target domain’s one. So, the e820_host=1 should be added to the stubdom’s “config”. I quote “config”, because it doesn’t really exist, it gets dynamically created, and e820_host setting is not there.
If you are ok with rebuilding xen(-libs) package, you can add libxl_defbool_set(&dm_config->b_info.u.pv.e820_host, true) somewhere there.

Take a look also at https://github.com/QubesOS/qubes-vmm-xen-stubdom-linux/pull/29/files - it’s about (I think) a very similar issue for Intel graphics.

Ah, the case of the file should have been pretty clear, in fact :smiley:

That sounds interesting, indeed. This may shed some light on a fact for which I didn’t have an explanation yet: this video card in Qubes does not expose its ROM through sysfs, although the Debian Live kernel does, and the dom0 kernel does see it:

[    1.201712] pci 0000:07:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
...
[    3.640623] amdgpu 0000:07:00.0: amdgpu: Fetched VBIOS from ROM BAR
...
[  105.927621] ACPI: video: Video Device [VGA] (multi-head: yes  rom: no  post: no)

(which BTW confirms BAR 6 points to the ROM, and the stubdom error does explain the amdgpu one)

On Debian I get:

[    1.244578] ACPI: Video Device [VGA] (multi-head: yes  rom: no  post: no)
...
[    2.370854] amdgpu 0000:07:00.0: amdgpu: Fetched VBIOS from VFCT

(whereas for the dGPU it does fetch the VBIOS from ROM BAR, claiming ACPI VFCT table present but broken. On Qubes I blacklisted it for now to avoid a boot loop, so no comparison here)

Not really sure why we have a difference here (there is anyway no explicit mention of a reason to expose the ROM in sysfs or not), and what the impact is. Probably useful to dig, as it could be the key to get (next) the dGPU to work…

Seems worth it :slight_smile:

Added this (to current 4.14.2, so the impacted file is in a different place, but that func looked identical enough), and then, seeing no change, added a LOGED() trace, which ends up in libxl-driver.log, showing it gets applied to domains 1, 3, 5 (5 does not appear once sys-gui-gpu is disabled, so I guess it’s the one). But well, pcifront does not seem to check if we did that before making the suggestion.

Following the lead of the behaviour difference from the Debian kernel with VBIOS fetching, I see that VFCT is skipped if it does not appear in ACPI table. Sure enough it is not reported by the 5.13.13 kernel in dom0 (in “standard” boot, not sys-gui-gpu) … but it did appear in a 5.10.47 log captured on Sep 11th (which was a “standard” boot too).
So I tried to boot that 5.10.47 kernel with sys-gui-gpu and “normal” modes, and sadly it gets no VFCT table either. Has there been a change in Xen since Sep 11th, which would account for that ? At first sight, vmm-xen last changed on Aug 25th, but then maybe it was still only in testing ? There were some AMD/IOMMU and ACPI related changes for XSA-378… I could try to revert those patches.

One more idea: see https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.14/patch-libxl-automatically-enable-gfx_passthru-if-IGD-is-as.patch. It enables “gfx_passthru” option for Intel graphics. It (among other things) grants stubdom access to 0xa0000-0xc0000 address ranges - see libxl__grant_vga_iomem_permission() function in libxl_pci.c file. Maybe a similar thing is needed for AMD too?
A quick and dirty way to test this hypothesis would be applying a patch like this:
https://gist.github.com/marmarek/3b65652bbfc58615d2b880643f24d93a
(totally untested, things may explode, don’t blame me for velociraptors attack)

This seems to be against mainline Xen, Qubes already has a patch that handles vbios at 0xc0000, so I felt that part was not needed and only applied the remaining 2. The libxl log does show gfx_passthru and gfx_passthru_kind set to igd, and the dm log still reports the address conflict.

Reading the libxl_dm code, it looks like it does nothing special from gfx_passthru, and with this patch we neutralize all it’s doing for igd. On qemu side I can’t find any special flag for non-igd GPU. I feel something missing.

Reading the gfx_passthru stuff in xl.cfg.5, I understand that when a GPU can be passed through without it, it will be secondary in the VM - I’d think that will not work as expected for sys-gui-gpu, so it would have to be set whatever the GPU vendor/model, right ?

Trying to step back a little, IIUC the core issue is the GPU driver in the guest not having access to the VBIOS ROM. It has several ways of accessing it, among which it prefers ACPI ATRM and VFCT tables (but then the guest gets a Xen-forged ACPI table), and reading the ROM BAR (which is thus our focus here).

Now qemu seems to have several ways of exposing the vbios through the ROM BAR. Notably, pci_assign_dev_load_option_rom(), which is one of the focuses of the IGD patch you mentioned, goes reading it from /sys; using the romfile= parameter seems to be an alternative, and since we’re talking about code in xen_pt_load_rom.c it may have a chance to be useful even outside the KVM case.

It looks like we have several families of solutions, including:

  • full mmap-like passthrough, causing reads of the ROM BAR in the guest to result in reads of the ROM BAR in dom0 (which is what we’re trying to do right now)
  • providing the ROM data to qemu, so it can emulate the ROM BAR (which is possibly simpler than using passthrough, and could provide a viable long-term solution), where we have 2 distinct problems:
    1. get the VBIOS ROM data, with 2 options:
      • get it from /sys in dom0 (which in my particular case could be linked to ACPI VFCT not being visible in dom0, which looks like a problem we could overcome, as the Qubes kernel was seeing this VFCT table a couple of weeks ago)
      • get it from a file
    2. make it available to qemu in stubdom
      • provide to qemu with romfile=
      • get qemu to see the rom in /sys, which I’m not sure would provide any advantage over romfile=

A low-hanging fruit, which would provide a fallback for the (apparently many) cases where reading the ROM requires more work, would be to pass romfile= from a file, and we could fix the more difficult problem from there.
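A rough Python sketch of that fallback order, only to make the idea concrete. It assumes the kernel's usual sysfs `rom` attribute (write `1` to enable reads, `0` to disable again) and would need root on a real node. The function names and the demo paths are hypothetical, and handing the data over to qemu in the stubdom is not covered here:

```python
import tempfile
from pathlib import Path


def read_sysfs_rom(rom_node: Path) -> bytes:
    """Read a PCI ROM through a sysfs 'rom' attribute (needs root on a real node)."""
    rom_node.write_text("1")          # enable ROM reads
    try:
        return rom_node.read_bytes()
    finally:
        rom_node.write_text("0")      # always disable again


def load_vbios(rom_node, fallback_file):
    """Try sysfs first; fall back to a previously dumped ROM file."""
    rom_node = Path(rom_node)
    if rom_node.exists():
        try:
            data = read_sysfs_rom(rom_node)
            if data:
                return data, "sysfs"
        except OSError:
            pass                      # unreadable ROM: use the file instead
    return Path(fallback_file).read_bytes(), "file"


# Demo with placeholder paths only, so nothing touches the real /sys.
with tempfile.TemporaryDirectory() as tmp:
    dumped = Path(tmp) / "renoir.rom"
    dumped.write_bytes(b"\x55\xaa" + bytes(30))
    missing_node = Path(tmp) / "no-such-device" / "rom"
    data, origin = load_vbios(missing_node, dumped)
    print(origin, len(data))
```

On a real system the first argument would be something like `/sys/bus/pci/devices/0000:07:00.0/rom`, which is exactly the node that is missing in my dom0 at the moment.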

Diving more in the qemu hw/xen code, the pci_assign_dev_load_option_rom() code will only run, through get_vgabios(), if igd-passthru has been set, so I’m trying without your patch commenting it out.

Unfortunately when not set, even though an error seems to be sent through QAPI, none gets to stderr and we can’t see this in the logs, so I’m adding a couple of XEN_PT_LOG calls there.

Iterating on this to ascertain which path in qemu is actually taken is quite painful though, with all stubdom being rebuilt on each make vmm-xen-stubdom-linux-dom0 (like everything gets rebuilt when asking make qubes-dom0). Isn’t there a simple “rebuild only changed stuff” feature in the builder ?

qubes-builder doesn’t give you this option, but to iterate quickly, you can easily clone https://github.com/qubesos/qubes-vmm-xen-stubdom-linux directly and build from there (see README).

Hm, time flies and I still did not find enough of it to do all the tests I had in mind for this answer… so it may feel a bit incomplete…

By hacking the xen_pt_realize() test that checks for a hardcoded PFN for the IGD, which prevents access to xen_pt_setup_vga() for anything not on 0000:00:02.0 (apparently compared with the PFN in the stubdom, where my iGPU is on 0000:00:00.0 - and I’m wondering why an IGP would get such special treatment that it would not appear as 0000:00:00.0 too in a stubdom), I can see qemu (expectedly) failing to get the vbios from sysfs, and then happily copying it from memory, getting to the Legacy VBIOS registered trace from xen_pt_direct_vbios_copy().

I find that slightly disturbing, after the can't claim BAR 6 message - but then, it (surprisingly?) does not bother to check for any magic number (nor does the /sys/ code path, though in this case modern kernels do their own checks, IIRC).

As for the if(dev->romfile in pci_assign_dev_load_option_rom() I cannot see how it could result in the relevant pci_register_bar() call. So I went forward with hardcoding my video rom in the code for a test… and it turns out the amdgpu driver still prints the same Invalid PCI ROM data signature (with the same got 0xcb03aa55 which in memory spells starting with 0x55 0xaa … which happens to be the 2-byte magic for the BIOS ROM … which I find disturbing but could not make anything of it for now).
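The byte-order reading of the two values in that message can be checked in a few lines of Python. The constants are copied from the amdgpu log line, and x86 being little-endian is the only assumption:

```python
import struct

EXPECTED = 0x52494350   # value the kernel expects
GOT = 0xCB03AA55        # value the kernel actually read


def dword_in_memory(value: int) -> bytes:
    """Bytes of a 32-bit value as laid out in little-endian memory."""
    return struct.pack("<I", value)


print(dword_in_memory(EXPECTED))       # b'PCIR'
print(dword_in_memory(GOT).hex(" "))   # 55 aa 03 cb
```

So the expected dword is the ASCII string PCIR, and what was read instead starts with the 0x55 0xaa ROM magic.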

To make sure of what gets read in /dev/mem I added a check for the 0x55 0xaa magic number, and it indeed catches what appears not to be a bios rom, starting with 0x0000 - obviously I’ll have to double-check this, dump more memory, and see how this results in amdgpu finding out that signature.
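For checking dumps offline, here is a small Python sketch of both checks, following the standard PCI expansion ROM layout: the 0x55 0xaa magic at offset 0, then a 16-bit little-endian pointer at offset 0x18 to a data structure starting with PCIR. The function name and the synthetic test images are mine. This is not the qemu or kernel code:

```python
import struct


def check_option_rom(image: bytes) -> str:
    """Classify a buffer using the two signatures of a PCI expansion ROM."""
    if len(image) < 0x1A:
        return "too short"
    if image[0:2] != b"\x55\xaa":
        return "bad ROM signature"
    # Offset 0x18 holds a 16-bit pointer to the PCI data structure.
    (pcir_offset,) = struct.unpack_from("<H", image, 0x18)
    if pcir_offset + 4 > len(image):
        return "PCIR pointer out of range"
    if image[pcir_offset:pcir_offset + 4] != b"PCIR":
        return "bad PCI data signature"
    return "ok"


# Synthetic images, only to exercise each branch.
good = bytearray(0x40)
good[0:2] = b"\x55\xaa"
struct.pack_into("<H", good, 0x18, 0x20)
good[0x20:0x24] = b"PCIR"

zeros = bytes(0x40)                     # like the 0x0000 read from /dev/mem
no_pcir = b"\x55\xaa" + bytes(0x3E)     # magic present, pointer is zero

for name, img in (("good", good), ("zeros", zeros), ("no_pcir", no_pcir)):
    print(name, "->", check_option_rom(bytes(img)))
```

Running this on the image extracted under Debian and on whatever gets copied from memory in the stubdom should at least tell which of the two signatures is the one going wrong.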

Slow progress, and I again won’t have any time for this until next weekend :disappointed:

Well, the README does not mention make full, whereas the images generated by make all do not appear to be used (at least the xen.xml template references the “full” version). Maybe this README would benefit from a bit more info ?

Also, building such packages separately, although it avoids a full rebuild of everything, requires installing specific qubes devel packages, which ideally are only installed in a chroot to make sure they don’t pollute - or in a separate VM, but having separate temporary VMs to build each such package separately is starting to be heavy.
Maybe I’ll end up resuming my experiments with ISAR first :slight_smile: … sooo many nested projects and soo little time :frowning:

It is used by default. The reference to “full” version you’ve found is an alternative path (overriding the default) that is used only for very specific configs (with USB or audio passthrough via stubdom).