AMD iGPU passthrough attempt

yann · November 15, 2021, 8:58pm

As a first PoC I started with compiling my extracted ROM as static data…

… but to get it loaded at all I also had to revert this patch hunk, which assumes that previous code creates a proper shadow copy, which is probably not the case here (or is it?).

Now my stubdom seems to expose a VGA device with rombar, showing…

[2021-11-14 16:47:43] [00:05.0] xen_pt_realize: Assigning real physical device 07:00.0 to devfn 0x28
[2021-11-14 16:47:43] [00:05.0] xen_pt_realize:  real_device = 0000:07:00.0
[2021-11-14 16:47:43] [00:05.0] xen_pt_realize: Assigning VGA (passthru=1)...
[2021-11-14 16:47:43] [00:05.0] xen_pt_setup_vga: Legacy VBIOS imported
[2021-11-14 16:47:43] [00:05.0] xen_pt_register_regions: IO region 0 registered (size=0x10000000 base_addr=0xb0000000 type: 0x4)
[2021-11-14 16:47:43] [00:05.0] xen_pt_register_regions: IO region 2 registered (size=0x00200000 base_addr=0xc0000000 type: 0x4)
[2021-11-14 16:47:43] [00:05.0] xen_pt_register_regions: IO region 4 registered (size=0x00000100 base_addr=0x0000e000 type: 0x1)
[2021-11-14 16:47:43] [00:05.0] xen_pt_register_regions: IO region 5 registered (size=0x00080000 base_addr=0xfe400000 type: 0)
[2021-11-14 16:47:43] [00:05.0] xen_pt_register_regions: Expansion ROM registered (size=0x00020000 base_addr=0x000c0000)
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x0010 mismatch! Emulated=0x0000, host=0xb000000c, syncing to 0xb000000c.
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x0018 mismatch! Emulated=0x0000, host=0xc000000c, syncing to 0xc000000c.
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x0020 mismatch! Emulated=0x0000, host=0xe001, syncing to 0xe001.
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x0024 mismatch! Emulated=0x0000, host=0xfe400000, syncing to 0xfe400000.
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x0030 mismatch! Emulated=0x0000, host=0xc0002, syncing to 0x0002.
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x0052 mismatch! Emulated=0x0000, host=0x0003, syncing to 0x0003.
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x00a2 mismatch! Emulated=0x0000, host=0x0084, syncing to 0x0080.
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x0068 mismatch! Emulated=0x0000, host=0x8fa1, syncing to 0x8fa1.
[2021-11-14 16:47:43] [00:05.0] xen_pt_config_reg_init: Offset 0x0076 mismatch! Emulated=0x0000, host=0x1104, syncing to 0x1104.
[2021-11-14 16:47:43] [00:05.0] xen_pt_pci_intx: intx=1
[2021-11-14 16:47:43] [00:05.0] xen_pt_realize: Real physical device 07:00.0 registered successfully

… but that does not seem to impress sys-gui-gpu’s amdgpu driver at all; it still complains:

[2021-11-14 16:47:47] [    2.656523] amdgpu: Topology: Add CPU node
[2021-11-14 16:47:47] [    2.656616] amdgpu 0000:00:05.0: vgaarb: deactivate vga console
[2021-11-14 16:47:47] [    2.657625] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1636 0x1462:0x12AC 0xC6).
[2021-11-14 16:47:47] [    2.657651] amdgpu 0000:00:05.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[2021-11-14 16:47:47] [    2.657678] [drm] register mmio base: 0xF1200000
[2021-11-14 16:47:47] [    2.657688] [drm] register mmio size: 524288
[2021-11-14 16:47:47] [    2.658964] [drm] add ip block number 0 <soc15_common>
[2021-11-14 16:47:47] [    2.658977] [drm] add ip block number 1 <gmc_v9_0>
[2021-11-14 16:47:47] [    2.658987] [drm] add ip block number 2 <vega10_ih>
[2021-11-14 16:47:47] [    2.658998] [drm] add ip block number 3 <psp>
[2021-11-14 16:47:47] [    2.659008] [drm] add ip block number 4 <smu>
[2021-11-14 16:47:47] [    2.659018] [drm] add ip block number 5 <gfx_v9_0>
[2021-11-14 16:47:47] [    2.659028] [drm] add ip block number 6 <sdma_v4_0>
[2021-11-14 16:47:47] [    2.659039] [drm] add ip block number 7 <dm>
[2021-11-14 16:47:47] [    2.659049] [drm] add ip block number 8 <vcn_v2_0>
[2021-11-14 16:47:47] [    2.659059] [drm] add ip block number 9 <jpeg_v2_0>
[2021-11-14 16:47:47] [    2.701134] [drm] BIOS signature incorrect 0 0
[2021-11-14 16:47:47] [    2.701152] amdgpu 0000:00:05.0: Invalid PCI ROM data signature: expecting 0x52494350, got 0xcb03aa55
[2021-11-14 16:47:47] [    2.742791] [drm] BIOS signature incorrect 0 0
[2021-11-14 16:47:47] [    2.742881] [drm:amdgpu_get_bios [amdgpu]] *ERROR* Unable to locate a BIOS ROM
[2021-11-14 16:47:47] [    2.742898] amdgpu 0000:00:05.0: amdgpu: Fatal error during GPU init
[2021-11-14 16:47:47] [    2.742911] amdgpu 0000:00:05.0: amdgpu: amdgpu: finishing device.

… so it may well be that this ROM is still not provided to the VM where the driver is looking for it (I’m specifically double-checking that this 0x55 0xaa BIOS magic is there)
@marmarek, will gladly accept more ideas at this point
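
For anyone wanting to run the same check on a ROM dump, here is a minimal sketch of the two signature tests as I understand the PCI expansion ROM layout: `55 aa` at offset 0, and a 16-bit little-endian pointer at offset 0x18 to a data structure starting with "PCIR" (that is the 0x52494350 the kernel expects). The function name `check_option_rom` and the result keys are made up for illustration. Note that 0xcb03aa55 read as little-endian is the byte sequence `55 aa 03 cb`, so what the driver found where it expected "PCIR" looks like the start of a ROM header; that might mean the pointer at 0x18 was read as zero, but that is only a guess.

```python
import struct


def check_option_rom(image: bytes) -> dict:
    """Check the header and PCIR signatures of a legacy PCI option ROM image.

    Hypothetical helper for illustration; it only looks at the first image
    in the dump and does not verify checksums or image length.
    """
    result = {
        "rom_magic": False,    # 0x55 0xaa at offset 0
        "pcir_offset": None,   # 16-bit LE pointer stored at offset 0x18
        "pcir_magic": False,   # "PCIR" at that offset
        "vendor_id": None,
        "device_id": None,
    }
    if len(image) < 0x1A:
        return result  # too short to even hold the pointer field

    result["rom_magic"] = image[0:2] == b"\x55\xaa"

    (pcir_offset,) = struct.unpack_from("<H", image, 0x18)
    result["pcir_offset"] = pcir_offset

    # The PCI data structure starts with the signature, followed by the
    # vendor ID and device ID (both 16-bit little-endian).
    if pcir_offset + 8 <= len(image):
        if image[pcir_offset:pcir_offset + 4] == b"PCIR":
            result["pcir_magic"] = True
            vendor, device = struct.unpack_from("<HH", image, pcir_offset + 4)
            result["vendor_id"] = vendor
            result["device_id"] = device
    return result
```

It can be fed either the extracted ROM file or, inside the VM, the contents of the device's sysfs `rom` file (which has to be enabled by writing 1 to it before reading). For this card I would expect `vendor_id` 0x1002 and `device_id` 0x1636 if the image is intact.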

I have doubts (dating from the Qubes 4.0.4 era) that the default 5.4 VM kernel can properly support this hardware anyway, and since that really seems to be the most recent VM kernel around, I also tried letting sys-gui-gpu boot the fc33-provided 5.14 kernel (through qvm-prefs sys-gui-gpu kernel ""). In that case the amdgpu driver does not even seem to be loaded, and sys-gui-gpu does not get far enough for the Qubes agent to start, so it soon gets killed. According to the kernel logs the reason is the missing blkfront driver: obviously it cannot start this way without an enhanced initramfs.
Is there really no way to tell dracut not to omit any kernel hardware module? I can’t believe it, but no such thing appears to be documented.

For reference:

Edit: I’ve started to doubt whether the fc33 ramdisk is correctly generated at all; it should include the proper Xen block drivers, right? And a small step back allowed me to see that it was kernel-latest-qubes-vm I was really looking for, though it does not help with the PCI ROM. Back to digging.