That must be an ASIC-specific issue then: I see no such issue with the RENOIR. However, I still have my NAVI14 dGPU (RX 5500 M) disabled because of a boot loop too.
Since the kernel panic (which induces a qemu crash and forces me to power down) is linked to VCN, let’s check what happens when we disable this non-essential IP (and the equally non-essential jpeg one while I’m at it), with amdgpu.ip_block_mask=0xff. More IPs get finalized, and we then hit a new one:
This one seems to talk about a GPU-memory management issue. Guess I’ll stop here chasing those downstream crashes, at least this one doesn’t crash qemu and spares me some reboots.
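For readers unfamiliar with the amdgpu.ip_block_mask parameter used above: it is a bitmask in which bit N enables IP block number N, so clearing a bit disables that block. Here is a minimal Python sketch to decode such a mask. The function names are made up for illustration, and the block count of 10 is only an assumption: which index corresponds to which IP (VCN, jpeg, ...) depends on the ASIC and kernel version, so check the driver's own log on your machine rather than trusting these numbers.

```python
def enabled_ip_blocks(mask, n_blocks):
    """Return the indices of the IP blocks whose bit is set in mask."""
    return [i for i in range(n_blocks) if mask & (1 << i)]


def disabled_ip_blocks(mask, n_blocks):
    """Return the indices of the IP blocks whose bit is cleared in mask."""
    return [i for i in range(n_blocks) if not mask & (1 << i)]


if __name__ == "__main__":
    # Assumption: the ASIC exposes 10 IP blocks (hypothetical count).
    mask = 0xFF
    print("enabled: ", enabled_ip_blocks(mask, 10))
    print("disabled:", disabled_ip_blocks(mask, 10))
```

With a mask of 0xff and 10 blocks, indices 0 to 7 stay enabled and the last two are skipped.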
Progress has been slow, and has happened mostly on an amd-gfx thread. Only today did I see the guest amdgpu driver start up for the first time. This is a big step, but there are still a couple of glitches getting in the way of video output.
With a bit of luck, Santa may be only slightly late with this Christmas present.
Damn this post and the linked/related ones are a great way to understand how things work under the hood ! ^^
Just a noob remark: have you tried blacklisting amdgpu in dom0 and assigning the device to xen-pciback ? I haven’t read anywhere that you tried it.
This would prevent dom0 and/or the driver from doing nasty things with your GPU before PT-ing !
Below is my working method for a RX580, maybe that works for you too ?
Some notes before
I know the RX580 is not an iGPU, and I’m using it with a Ryzen desktop CPU (Ryzen 1700X), and there are many things I don’t know, but this method may be of help to others
the RX580 card has no FLR, is on the primary x16 PCI slot, so it’s used for displaying BIOS POST and early kernel messages, then xen-pciback seizes it, and the display switches to my other GPU, fortunately an Nvidia (so no driver conflict).
these instructions are for a Debian-based dom0, please adapt them carefully. I just started with Qubes, so I don’t know the correct paths and don’t wanna say 5h!t ! ^^
the RX580 must NEVER leave the pci-assignable pool, or hell will fall on you.
Steps
1. Modules config
First ensure that /etc/modules or modprobe.d/ contains this
(PS: it’s already done on Qubes, in /etc/sysconfig/modules/qubes-dom0.modules)
xen-pciback
In /etc/modprobe.d/atigpu-blacklist.conf (for Qubes /etc/sysconfig/modules/atigpu-blacklist.conf seems the right place)
blacklist amdgpu
As you also have an AMD dGPU, I think you need an extra step to reload the driver once the domU containing the iGPU is started, but I’ve not tested it: my setup uses an Nvidia GPU for dom0, so it’s easier.
2. initramfs config
Create a new script like /usr/share/initramfs-tools/scripts/init-top/zload_xen-pciback, and don’t forget to chmod +x zload_xen-pciback, it’s a sh script.
PS: no idea where this script should be in Qubes !
In /usr/share/initramfs-tools/scripts/init-top/udev
PS: no idea where this script should be in Qubes !
# change
PREREQS=""
# to
PREREQS="zload_xen-pciback"
Last thing, don’t forget to regenerate your initramfs (this too I dunno how to do on Qubes/Fedora).
To correctly adapt the paths to Qubes, read the “credit link” below. In short, in Debian, initramfs scripts in /usr take precedence over initramfs scripts in /etc.
3. End credits ^^
Voilà, I hope it works for you !
For more detailed explanations of how and why it works, and the credit for inspiration, check this link.
I couldn’t find a lot of info about pci-stub, do you have docs/pointers please ? The Xen wiki recommends using pciback as it works for PV and HVM, so I’ve always thought it was more up-to-date.
And sorry to insist, but have you tried the initramfs+udev method above ? I’ve found it working better for my (dedicated) AMD GPU (and other devices loaded early like additional SATA controllers).
From what I understand, usual blacklisting/modprobing happens too late for some devices, as xen-pciback/pci-stub is only loaded after the device module is loaded.
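One way to check whether the ordering problem described above actually bit you is to look at which driver ended up bound to the device after boot. This is a minimal Python sketch, assuming the usual sysfs layout where /sys/bus/pci/devices/<BDF>/driver is a symlink to the bound driver. The helper name and the PCI address in the example are hypothetical, and the exact driver name to expect (pciback here) is an assumption to verify on your system.

```python
import os
import sys


def bound_driver(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to PCI device bdf, or None."""
    link = os.path.join(sysfs_root, bdf, "driver")
    # No "driver" symlink means no driver is currently bound to the device.
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Hypothetical address; replace it with your GPU's, as shown by lspci -D.
    bdf = sys.argv[1] if len(sys.argv) > 1 else "0000:0a:00.0"
    print(bdf, "->", bound_driver(bdf))
```

If this prints amdgpu instead of the passthrough driver, the GPU driver won the race and the initramfs approach is worth a try.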
Btw, thanks for sharing your tests, don’t stop, I learn a lot even if not understanding most parts ^^
Just wanted to post here to say I am having a very similar issue to what is being discussed here, and I wanted to find out what the status of this issue is.
After thinking about it some more, I was wondering if it might have something to do with the fact that the passthrough adapter is hidden with both xen-pciback and rd.qubes, even though it is the primary boot adapter. Qubes does end up using the secondary adapter I have in the system, but the log still says the efifb is assigned to the primary card (despite also having video=efifb:off on the kernel cmdline).
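To rule out a typo or a dropped option, it can help to confirm what the running kernel actually received on its command line. A small Python sketch, assuming /proc/cmdline is readable; the parser is deliberately naive (it splits on whitespace and ignores quoted values), and parse_cmdline is just an illustrative name.

```python
def parse_cmdline(text):
    """Split a kernel command line into {key: [values]}; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        # A key may appear several times (e.g. several video= options).
        params.setdefault(key, []).append(value if sep else None)
    return params


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        params = parse_cmdline(f.read())
    print("video options:", params.get("video", []))
```

If efifb:off shows up in the video options and the log still reports efifb on the primary card, the option was received by the kernel but did not have the expected effect.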
I am getting all the same messages about the adapter that appear in this issue. The Video Shadow copy of the ROM is created in the address range as well.
If there is no news, would it be something that could be kicked up to the xen developers?