Virtualised Intel GPU with SR-IOV

I managed to get SR-IOV working on a Framework 12 with the Intel i5-1334U CPU (Raptor Lake).
I installed the strongtz driver in dom0 and in the template (compiling it against the dom0-provided kernel, without installing a kernel in the VM), and it worked well. It seems that regenerating the initramfs should make the driver available to AppVMs, but when I tried that I got mixed results and did not dig deeper.

Things also seem to work even if I add no kernel options to the VM; it looks like the driver automatically detects that it is working with a vGPU.

Finally, I tried enabling memory balancing and it does not seem to crash. However, I do not know how that affects the amount of RAM used for the GPU.

These were also the steps I took: installing in dom0 and then doing the same in the template. I am curious whether @spike-punk will figure out why, in his case, it was not necessary to install the module in the VM template. That would simplify maintenance, as currently you need to either “hardset” the kernel in AppVMs so it does not get overwritten on dom0 updates, or recompile the template module whenever dom0 updates.

Today my laptop froze. The mouse was still moving, but the screen was stuck on the last view after a wakeup. After restarting the machine the vGPUs were no longer available. I realized that sriov_numvfs was set to “0” again. After writing “7” to it the vGPUs were back. This was weird, as I am not sure which process reset this setting. Has anybody faced a similar issue?
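For anyone hitting the same reset, the check-and-restore step can be sketched as a small shell function. This is only a sketch under assumptions: the device path 0000:00:02.0 is the usual address of an Intel iGPU but should be confirmed with lspci, and ensure_vfs, GPU_SYSFS and WANTED_VFS are names made up for this example.

```shell
#!/bin/sh
# Sketch: re-create the virtual functions if sriov_numvfs was reset to 0.
# GPU_SYSFS is an assumption (00:02.0 is the usual iGPU address); check lspci.
GPU_SYSFS="${GPU_SYSFS:-/sys/bus/pci/devices/0000:00:02.0}"
WANTED_VFS="${WANTED_VFS:-7}"

# ensure_vfs DEVICE_DIR COUNT (hypothetical helper)
ensure_vfs() {
    dev="$1"
    wanted="$2"
    current="$(cat "$dev/sriov_numvfs")" || return 1
    if [ "$current" -eq 0 ]; then
        # Only write when no VFs exist; the kernel rejects changing
        # a non-zero count directly.
        echo "$wanted" > "$dev/sriov_numvfs" || return 1
        echo "sriov_numvfs was 0, set to $wanted"
    else
        echo "sriov_numvfs is already $current"
    fi
}

# Run as root in dom0, for example:
#   ensure_vfs "$GPU_SYSFS" "$WANTED_VFS"
```

This does not explain what reset the value; it only puts it back.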

Since the driver is added to the kernel via DKMS, it should automatically recompile. So no need to worry about that.

Since it’s not a mainlined driver, some issues are to be expected. Also, I would expect that, much like with Xen, we (as in Qubes users) are pretty much the only ones using this on mobile devices that go to suspend.
But for me this issue never occurred. It sounds like the driver crashed, but you said you could still move your mouse, which makes this strange. It might be useful to take a look at the dmesg output after it happens. It would be nice if you could post that. Of course, check whether it includes any information that you want to keep private.

For dom0, yes, but for the VM? Assuming that the kernel is not actually installed in the VM itself, I doubt that automatic recompiling takes place. But maybe I am missing something here.

I am unsure whether I will be able to get the dmesg information after a reboot, but I will surely try the next time it happens.
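One way to recover the kernel messages after a reboot, assuming the systemd journal in dom0 is persistent (that is an assumption, not something confirmed in this thread), is to read the previous boot with journalctl -k -b -1 and filter for the GPU-related lines. The filter below is a rough sketch; gpu_log_filter is a hypothetical helper name and the keyword list is only a guess at what is relevant.

```shell
#!/bin/sh
# Sketch: keep only kernel log lines that mention the GPU drivers or SR-IOV.
# \b is a GNU grep extension; it stops "xe" from matching inside "Xen".
gpu_log_filter() {
    grep -iE '\b(i915|xe)\b|sriov|pciback'
}

# Usage in dom0 (as root):
#   journalctl -k -b -1 --no-pager | gpu_log_filter   # previous boot
#   dmesg | gpu_log_filter                            # current boot
```

Review the output for anything private before posting it.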

No, I was missing something. For some reason qubes-kernel-vm-support is installed on my system (it might be that I installed it long ago and forgot). This package is responsible for using your dom0 kernels in VMs. The interesting part is that I didn’t run it to create a new kernel for the VMs. Also, the VM kernel module files in dom0 are older than the ones in the VM, even though they should be the same.
I think what happens is that, because of this package, the modules are loaded directly from dom0 instead of from the vm-kernels folder. But this is only a guess.

I also have this package installed in dom0, but for me the “shared” kernel still was not working. I am not sure what needs to be done to make it work, but it does not work automatically.

Just one more data point: the kernel in dom0 got updated, and SR-IOV stopped working (with some scary-looking messages in the VMs). It turns out that the DKMS module had not been installed in the template for the new kernel, and that was causing the issues.
Regenerating the initramfs with dracut after the update did not get the right drivers into the VMs.
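A quick way to spot this situation in the template is to check whether dkms status lists the module as installed for the running kernel. The helper below is a sketch: dkms_built_for is a made-up name, and it only assumes that each dkms status line contains the kernel version and ends in ": installed" once the module is in place.

```shell
#!/bin/sh
# Sketch: succeed if `dkms status` output (read from stdin) shows a module
# installed for the given kernel version.
dkms_built_for() {
    grep -F -- "$1" | grep -q ': installed'
}

# Usage inside the template (as root):
#   if ! dkms status | dkms_built_for "$(uname -r)"; then
#       echo "DKMS module missing for $(uname -r)"
#       dkms autoinstall --kernelver "$(uname -r)"
#   fi
```

The rebuild needs the matching kernel headers to be available in the template, which may not be the case automatically when the kernel is provided by dom0.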

Now one question: I have noticed that the GPU in the VMs shows 4 GB of video RAM. Is that something that can be changed?

Just a note that I had trouble with this on the Intel Arc B50. For one, it isn’t necessary to use the DKMS driver: Linux 6.19 includes the necessary drivers for SR-IOV. However, I suspect that Xen 4.19 does not support the Intel B50 very well, as unbinding xe from a created virtual function crashes the kernel. Since starting a VM unbinds the device and rebinds it to vfio-pci, the kernel just keeps crashing.

A workaround is to disable automatic probing of virtual functions by xe, via echo 0 | sudo tee /sys/devices/pci0000:00/*/*/*/*/sriov_drivers_autoprobe or by creating the following udev rule:

ACTION=="add", SUBSYSTEM=="pci", ATTR{sriov_totalvfs}=="*", ATTR{sriov_drivers_autoprobe}="0"

(the above rule basically disables autoprobe on every SR-IOV device)
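To check whether autoprobe really is off after applying either variant, the current value can be read back for every PCI device that exposes the attribute. A minimal sketch follows; list_autoprobe is a hypothetical helper, and the rules file name in the comment is only an example.

```shell
#!/bin/sh
# Sketch: print "<PCI address> <autoprobe value>" for each SR-IOV capable
# device. The optional argument overrides the sysfs directory to scan.
list_autoprobe() {
    root="${1:-/sys/bus/pci/devices}"
    for f in "$root"/*/sriov_drivers_autoprobe; do
        [ -e "$f" ] || continue
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# Usage in dom0:
#   list_autoprobe
# If the udev rule was saved as, say, /etc/udev/rules.d/99-sriov-autoprobe.rules
# (example name), reload the rules first with: udevadm control --reload
```

A value of 0 means newly created virtual functions are left without a driver, so xe should not bind to them.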

For iGPUs at least, it’s unified memory, which means that if you assign more memory to your VM, the GPU also has more memory accessible.

The DKMS drivers are based on the drivers directly from Intel. My understanding was that some parts were upstreamed, but not all, and that it wasn’t available for normal users. So your GPU is one of the lucky ones where it works out of the box. Most GPUs that could use SR-IOV would only work with the DKMS driver.

As for the rebinding issue: Qubes should automatically assign the pciback driver to the vGPUs as they appear after booting. Obviously the kernel (and X.org) aren’t happy if a GPU suddenly vanishes. With the rd.qubes.hide_pci kernel boot parameter you can “hide” the vGPUs from dom0 so they won’t be assigned the xe driver.

Unfortunately, xe autobinds to the created virtual functions, at least when the physical function is used by the host. The rd.qubes.hide_pci kernel argument only works if you do not intend to use the physical function on the host, as the virtual functions have not been created yet at that point.

You can specify the PCI addresses of the virtual functions. That works even if they are not present at boot time.
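As a sketch of what that parameter could look like, the helper below builds the value for a physical function whose virtual functions appear as functions 1..N of the same slot. That layout is an assumption (it matches an iGPU at 00:02.0 with up to 7 VFs, but other cards may place their VFs elsewhere), and hide_pci_arg is a made-up name.

```shell
#!/bin/sh
# Sketch: print an rd.qubes.hide_pci argument for VFs 1..N of one PCI slot.
# hide_pci_arg BUS:SLOT COUNT (hypothetical helper; assumes VF addresses
# are BUS:SLOT.1 .. BUS:SLOT.COUNT)
hide_pci_arg() {
    pf_slot="$1"
    nvfs="$2"
    list=""
    i=1
    while [ "$i" -le "$nvfs" ]; do
        list="${list:+$list,}$pf_slot.$i"
        i=$((i + 1))
    done
    printf 'rd.qubes.hide_pci=%s\n' "$list"
}

# Example: hide_pci_arg 00:02 7
# The result would be appended to the kernel command line in dom0.
```

Compare the generated addresses against lspci after the VFs have been created, and keep a known-good boot entry around while testing.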

In my experience, if the PCI address does not exist, the boot fails and the computer restarts. That said, I shared the above information to help others use the Intel Arc Pro B50 with QubesOS when it is also used as the host’s GPU.

Edit: I also wrote a guide detailing my process: Virtualized hardware acceleration on QubesOS using Intel Arc B50 - Ayakael


I can confirm that with the NC NV56, skipping the DKMS driver part does not work (writing to sriov_numvfs returns a write error if you specify any number other than 0).

It is true that it has unified memory, so I assume the memory formally assigned to the iGPU is not critical. It would seem that there are “steps”. I assigned 24 GB to the VM, and then the iGPU says it has ~11 GB of VRAM. When I had assigned 16 GB it said 4 GB. So I guess that number scales with the RAM available in the VM.

That is a little bit strange. I have 8 GB of RAM assigned to the VM and the GPU reports 8 GB of RAM available, so the steps do not seem to make sense. Do you have memory balancing on? At least for me, when I enable it, the VM only uses the minimal amount of memory.