Hm, time flies and I still did not find enough of it to do every tests I had in mind for this answer… so it may feel a bit incomplete…
By hacking the xen_pt_realize() test that checks for a hardcoded PFN for the IGD, preventing access to xen_pt_setup_vga() to anything not on 0000:00.02.0(apparently compared with the PFN in the stubdom, where my iGPU is on 0000:00.00.0 - and I’m wondering why an IGP would get such special treatment that it would not appear as 0000:00.00.0 too in a stubdom), and I can see qemu (expectedly) failing to get the vbios from sysfs, and then happily copying it from memory, getting to the Legacy VBIOS registered trace from xen_pt_direct_vbios_copy().
I find that slightly disturbing, after the can't claim BAR 6 message - but then, it’s (surprisingly?) does not bother to check for any magic number (nor does the /sys/ code path, though in this case modern kernels do their own checks, IIRC).
As for the if(dev->romfile in pci_assign_dev_load_option_rom() I cannot see how it could result in the relevant pci_register_bar() call. So I went forward with hardcoding my video rom in the code for a test… and it turns out the amdgpu driver still prints the same Invalid PCI ROM data signature (with the same got 0xcb03aa55 which in memory spells starting with 0x55 0xaa … which happens to be the 2-byte magic for the BIOS ROM … which I find disturbing but could not make anything of it for now).
To make sure of what gets read in /dev/mem I added a check for the 0x55 0xaa magic number, and it indeed catches what appears not to be a bios rom, starting with 0x0000 - obviously I’ll have to double-check this, dump more memory, and see how this results in amgdpu finding out that signature.
Slow progress, and I again won’t have any time for this until next weekend
Well, the README does not tell about make full, whereas the images generated by make all do not appear to be used (at least the xen.xml template references the “full” version). Maybe this README would benefit from a bit more info ?
Also, building such packages separately, although it avoids full rebuild of everything, requires to install specific qubes devel packages, which ideally are only installed in a chroot to make sure they don’t pollute - or in a separate VM, but having separate temporary VMs to build each such package separately is starting to be heavy.
Maybe I’ll end up resuming my experiments with ISAR first … sooo many nested projects and soo little time
It is used by default. The reference to “full” version you’ve found is an alternative path (overriding the default) that is used only for very specific configs (with USB or audio passthrough via stubdom).
Finally I settled with disabling the build of the full version to cut build time in half… and enable parallel stubdom builds to divide it further by vcpus. 3 minutes to build started to make iterations reasonable again.
… but that does not seem to impress sys-gui-gpu's amdgpu driver, at all, it still claims:
[2021-11-14 16:47:47] [ 2.656523] amdgpu: Topology: Add CPU node
[2021-11-14 16:47:47] [ 2.656616] amdgpu 0000:00:05.0: vgaarb: deactivate vga console
[2021-11-14 16:47:47] [ 2.657625] [drm] initializing kernel modesetting (RENOIR 0x1002:0x1636 0x1462:0x12AC 0xC6).
[2021-11-14 16:47:47] [ 2.657651] amdgpu 0000:00:05.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[2021-11-14 16:47:47] [ 2.657678] [drm] register mmio base: 0xF1200000
[2021-11-14 16:47:47] [ 2.657688] [drm] register mmio size: 524288
[2021-11-14 16:47:47] [ 2.658964] [drm] add ip block number 0 <soc15_common>
[2021-11-14 16:47:47] [ 2.658977] [drm] add ip block number 1 <gmc_v9_0>
[2021-11-14 16:47:47] [ 2.658987] [drm] add ip block number 2 <vega10_ih>
[2021-11-14 16:47:47] [ 2.658998] [drm] add ip block number 3 <psp>
[2021-11-14 16:47:47] [ 2.659008] [drm] add ip block number 4 <smu>
[2021-11-14 16:47:47] [ 2.659018] [drm] add ip block number 5 <gfx_v9_0>
[2021-11-14 16:47:47] [ 2.659028] [drm] add ip block number 6 <sdma_v4_0>
[2021-11-14 16:47:47] [ 2.659039] [drm] add ip block number 7 <dm>
[2021-11-14 16:47:47] [ 2.659049] [drm] add ip block number 8 <vcn_v2_0>
[2021-11-14 16:47:47] [ 2.659059] [drm] add ip block number 9 <jpeg_v2_0>
[2021-11-14 16:47:47] [ 2.701134] [drm] BIOS signature incorrect 0 0
[2021-11-14 16:47:47] [ 2.701152] amdgpu 0000:00:05.0: Invalid PCI ROM data signature: expecting 0x52494350, got 0xcb03aa55
[2021-11-14 16:47:47] [ 2.742791] [drm] BIOS signature incorrect 0 0
[2021-11-14 16:47:47] [ 2.742881] [drm:amdgpu_get_bios [amdgpu]] *ERROR* Unable to locate a BIOS ROM
[2021-11-14 16:47:47] [ 2.742898] amdgpu 0000:00:05.0: amdgpu: Fatal error during GPU init
[2021-11-14 16:47:47] [ 2.742911] amdgpu 0000:00:05.0: amdgpu: amdgpu: finishing device.
… so it may well be that this ROM is still not provided to the VM where the driver is looking for it (I’m specifically double-checking that this 0x55 0xaa BIOS magic is there) @marmarek, will gladly accept more ideas at this point
As I’m having doubts (from Qubes 4.0.4 era) that the 5.4 default VM kernel would be able to properly support this hardware anyway, and since that really seems to be the most recent VM kernel around, I also tried to let sys-gui-gpu boot the fc33-provided 5.14 kernel (through qvm-prefs sys-gui-gpu kernel ""). In that case, the amdgpu driver does not even seem to be loaded, and sys-gui-gpu does not appear to start well enough for the Qubes agent to start, and it gets killed soon – the reason from kernel logs being lack of blkfront driver, obviously it cannot start this way without an enhanced initramfs.
Is there really no way to tell dracut not to omit any kernel hardware module ? I can’t believe it but no such thing apepars to be documented
Edit: I’ve started to doubt whether the fc33 ramdisk is indeed correctly generated at all, it should include the proper xen block drivers, right ? And a small step back allowed me to see it was kernel-latest-qubes-vm I was really looking for – though it does not help with the PCI ROM. Back to digging
It looks like all checks for e820_host in libxl_x86.c are in fact conditionned by b_info->type == LIBXL_DOMAIN_TYPE_PV, that would explain it has no impact on a HVM. According to the commit introducing e820_host that’s just how it is, “being a PV guest” is a prerequisite.
That said, it would seem that dom0 gets the host e820 map, and as I understand it that 0x000c0000-0x000dffff range does lie in the same reserved region (well, except if dom0 does not get the real BIOS e820 map – but hypervisor.log does not seem to dump it either, and this message during the series review seems to imply that dom0 indeed shows the host’s e820):
Thinking twice about it: I thought it would be expected to see the PCI devices physical addresses protected from the OS by being declared in reserved regions. However, if we compare the ranges of the different BARs:
… we can see that the 0xfe400000-0xfe47ffff range of BAR 5 is indeed intersecting with the 0x00000000fd000000-0x00000000ffffffff reserved region, but the BAR 0 and BAR 2 ranges fall in an “undeclared” gap between 2 reserved regions. And that difference does not result in different handling of those 3 BARs, whose resources are apparently all successfully claimed.
@marmarek, do you see why BAR 5 would not be detected as a conflict by request_resource_conflict() ? The most obvious would be that the stubdom’s memory map would not match the dom0 one (which I guess the host_e820 trick would correct if it was supported for HVM), but then the stubdom kernel is very quiet and does not report its view of the map. Its kconfig does not show a change in default loglevel, and its cmdline is reported as empty, there’s definitely no quiet flag there. How then is it so quiet ?
Back to the other end of the problem, namely getting to understand why amdgpu is still unable to access the expansion ROM exposed by the stubdom qemu…
Note that I’m starting to consider a new path, which would be to teach amdgpu to load a ROM directly from within the guest (eg. using request_firmware) as a way to advance the PoC, since obviously I’m not getting there as fast as I’d like with the current approaches. Before attempting this, though, there are still a few things that puzzle me and could possibly hint to something:
One thing that had been there before my eyes from the start but had not stood out until now:
I’m not really clear yet why we have those mismatches to start with, but whereas virtually all of them result in sync’ing the emulated addresses to the host’s, the first of those expansion-ROM-related ones is not, with 0xc0002 (in the range which is causing those headaches) becoming 0x0002.
A second thing is that qemu claims to expose the BARs for the device at addresses:
That is, the 0xaa55 expansion ROM signature is really there, but it’s the only value that looks right. At 0x16 where we should have the offset to the VBIOS signature, we get 0x0000, which explains the strange-looking signature causing the extraction to abort. I feel the biggest question here would be, why do we have the first 2 bytes correct, if the rest is just junk ?
I’m especially wondering if there would not be a link with my first question above, as the junk here starts at what should be physical address 0xc0002. Could it be that ROM shadowing gets broken because of this ?
So here I am with a small PoC commit doing precisely this. And indeed there is some good news: the driver does load the my VBIOS ROM and appears to like it, but soon things turn out not to be so fine (at first sight unrelated with VBIOS) with…
a strange-looking MTRR write failure
some trouble with the PSP firmware failing to load, triggering the termination of the amdgpu driver
… and then dereferencing a bad pointer (bug in the error path?) sends the kernel to panic, and possibly inducing a qemu segfault
… which result in unresponsive Qubes and requires hard poweroff
The kernel crash seems not too deep: the IPs are initialized in order, it is a PSP init failure that causes to stop and cleanup, the crash appears in vcn_v2_0_sw_fini dereferencing a fw_shared_cpu_addr pointer initialized during VCN init. When the fault occurs the pointer is non-NULL, could be a use-after-free ?
Quite some things to investigate and try next:
check if that bug still happens in 5.15/5.16rc; if still there use this occasion to play with KASAN – but it may not be that nuch of a blocker if I can…
… avoid use of PSP (move away _ta and _asd firmwares, or use module params ip_block_mask or fw_load_type)
check whether the suspect-looking points in former post have an impact here
renaming firmware files: they’re not optional, that causes early psp IP init failure (early enough that no vcn init/fini is run, thus no panic, but no help)
option amdgpu fw_load_type=1 in /etc/modprobe.d/ (supposed to force firmware load to go through smu instead of psp) seems to be ignored (and in fact the code shows it is ignored, only 0 can change anything)
option amdgpu ip_block_mask=0xfff7 to disable the PSP, OTOH, does have an impact: the psp is not initialized (though several components still claim they’ll use it) changes the error and proceeds into the kernel panic path:
[2021-11-25 23:30:22] [ 3.855687] [drm] sw_init of IP block <vega10_ih>...
[2021-11-25 23:30:22] [ 3.856832] [drm] sw_init of IP block <smu>...
[2021-11-25 23:30:22] [ 3.856864] [drm] sw_init of IP block <gfx_v9_0>...
[2021-11-25 23:30:22] [ 3.865352] [drm] sw_init of IP block <sdma_v4_0>...
[2021-11-25 23:30:22] [ 3.865439] [drm] sw_init of IP block <dm>...
[2021-11-25 23:30:22] [ 3.865880] [drm] Loading DMUB firmware via PSP: version=0x01010019
[2021-11-25 23:30:22] [ 3.865905] [drm] sw_init of IP block <vcn_v2_0>...
[2021-11-25 23:30:22] [ 3.868761] [drm] Found VCN firmware Version ENC: 1.14 DEC: 5 VEP: 0 Revision: 20
[2021-11-25 23:30:22] [ 3.868804] amdgpu 0000:00:05.0: amdgpu: Will use PSP to load VCN firmware
[2021-11-25 23:30:22] [ 3.936773] [drm] sw_init of IP block <jpeg_v2_0>...
[2021-11-25 23:30:22] [ 3.940481] amdgpu 0000:00:05.0: amdgpu: SMU is initialized successfully!
[2021-11-25 23:30:22] [ 3.943960] [drm] kiq ring mec 2 pipe 1 q 0
[2021-11-25 23:30:22] [ 4.106258] input: dom0: Power Button as /devices/virtual/input/input7
[2021-11-25 23:30:22] [ 4.109534] input: dom0: Power Button as /devices/virtual/input/input8
[2021-11-25 23:30:22] [ 4.109748] input: dom0: Video Bus as /devices/virtual/input/input9
[2021-11-25 23:30:22] [ 4.109877] input: dom0: AT Translated Set 2 keyboard as /devices/virtual/input/input10
[2021-11-25 23:30:22] [ 4.110764] input: dom0: ELAN2203:00 04F3:30AA Mouse as /devices/virtual/input/input11
[2021-11-25 23:30:22] [ 4.131382] amdgpu 0000:00:05.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[2021-11-25 23:30:22] [ 4.131566] [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
[2021-11-25 23:30:22] [ 4.131761] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v9_0> failed -110
[2021-11-25 23:30:22] [ 4.131953] amdgpu 0000:00:05.0: amdgpu: amdgpu_device_ip_init failed
[2021-11-25 23:30:22] [ 4.131968] amdgpu 0000:00:05.0: amdgpu: Fatal error during GPU init
[2021-11-25 23:30:22] [ 4.145153] input: dom0: ETPS/2 Elantech Touchpad as /devices/virtual/input/input12
[2021-11-25 23:30:22] [ 4.149031] input: dom0: ELAN2203:00 04F3:30AA Touchpad as /devices/virtual/input/input13
[2021-11-25 23:30:22] [ 4.160053] input: dom0: Sleep Button as /devices/virtual/input/input14
[2021-11-25 23:30:22] [ 4.243266] amdgpu 0000:00:05.0: amdgpu: amdgpu: finishing device.
[2021-11-25 23:30:22] [ 4.256416] amdgpu: probe of 0000:00:05.0 failed with error -110
[2021-11-25 23:30:22] [ 4.256443] [drm] sw_fini of IP block <jpeg_v2_0>...
[2021-11-25 23:30:22] [ 4.256466] [drm] sw_fini of IP block <vcn_v2_0>...
[2021-11-25 23:30:22] [ 4.256482] BUG: unable to handle page fault for address: ffffbaa420cdf000
Since the kernel panic (which induces a qemu crash and forces me to powerdown) is linked to VCN, let’s check what happens when we disable this non-essential IP (and the equally non-essential jpeg one while I’m at it), with amdgpu.ip_block_mask=0xff. More IPs get finalized, and we then hit a new one:
Progress has been slow, and happening mostly on an amd-gfx thread. Only today did I see the guest amdgpu driver start up for the first time - although this is a big step, but there are still a couple of glitches getting in the way of video output.
With a bit of luck, Santa may be only slightly late with this christmas present
Damn this post and the linked/related ones are a great way to understand how things work under the hood ! ^^
Just a noob remark, have you tried by blacklisting amdgpu in dom0 and assigning the device to xen-pciback ? I read nowhere that you tried it.
This would prevent dom0 and/or the driver from doing nasty things with your GPU before PT-ing !
Below is my working method for a RX580, maybe that works for you too ?
Some notes before
I know the RX580 is not a iGPU, and I’m using it in a Ryzen desktop CPU (Ryzen 1700X), and there are many things I don’t know, but this method may be of help to others
the RX580 card has no FLR, is on the primary x16 PCI slot, so it’s used for displaying BIOS POST and early kernel messages, then xen-pciback seizes it, and the display switches to my other GPU, fortunately an Nvidia (so no driver conflict).
those instructions are for a Debian-based dom0, please carefully adapt. I just started Qubes, so I don’t know the correct paths and don’t wanna say 5h!t ! ^^
the RX580 must NEVER leave the pci-assignable pool, or hell will fall on you.
1. Modules config
First ensure that /etc/modules or modprobe.d/ contains this
(PS: it’s already done on Qubes, in /etc/sysconfig/modules/qubes-dom0.modules)
In /etc/modprobe.d/atigpu-blacklist.conf (for Qubes /etc/sysconfig/modules/atigpu-blacklist.conf seems the right place)
As you also have an AMD dGPU, I think you need an extra step to reload the driver once the domU containing the iGPU is started, but I’ve not tested it : my setup uses a Nvidia GPU for dom0, so it’s easier.
2. initramfs config
Create a new script like /usr/share/initramfs-tools/scripts/init-top/zload_xen-pciback, and don’t forget to chmod +x zload_xen-pciback, it’s a sh script.
PS: no idea where this script should be in Qubes !
I couldn’t find a lot of info about pci-stub, do you have docs/pointers please ? The Xen wiki recommends using pciback as it works for PV and HVM, so I’ve always thought it was more up-to-date.
And sorry to insist, but have you tried the initramfs+udev method above ? I’ve found it working better for my (dedicated) AMD GPU (and other devices loaded early like additional SATA controllers).
From what I understand, usual blacklisting/modprobing happens too late for some devices, as xen-pciback/pci-stub is only loaded after the device module is loaded.
Btw, thanks for sharing your tests, don’t stop, I learn a lot even if not understanding most parts ^^
After thinking about it some more, I was wondering if it might have something to do with the fact that the passthrough adapter is hidden with both xen-pciback and rd.qubes, even though it is the primary boot adapter. Qubes does end up using the secondary adapter I have in the system, but the log still says the efifb is assigned to the primary card (despite also having video=efifb:off on the kernel cmdline).
I am getting all the same messages about the adapter that appear in this issue. The Video Shadow copy of the ROM is created in the address range as well.
If there is no news, would it be something that could be kicked up to the xen developers?