NVIDIA GPU passthrough into Linux HVMs for CUDA applications

deeplow · January 22, 2024, 6:31pm

The symptoms look similar to what many of us are experiencing in

Please join us on that topic as it seems that the issue arises in a part that’s not specific to this guide.

Kykyx · April 27, 2024, 11:16pm

Can you please clarify some steps: in the end, did you get X autoconfigured, or did you configure it manually?
Is there any way to autoconfigure X with my custom setup of GPUs and monitors?

I have installed the drivers and CUDA (nvidia-smi works fine), but it looks like I now need to configure Xorg correctly. Can someone show me how to correctly add or modify a second device/screen/monitor/server in Xorg? Can I just add a second device, a screen, and a monitor? If so, where can I find the parameters of my second monitor?

in my gpu-linux vm’s lspci:
00:03.0 VGA compatible controller: Device 1234:1111 (rev 02)
kernel-driver in use: bochs-drm
kernel-modules: bochs
00:06.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070 Ti] (rev a1)
kernel-driver in use: nvidia
kernel-modules: nouveau, nvidia-drm, nvidia

Is the first one the virtual VGA device from the host system?

xrandr -q
Screen 0: minimum 64 x 64, current 1920 x 1200, maximum 32767 x 32767
DUMMY0 connected primary 1920x1200+0+0 0mm x 0mm
QB1920x1200 41.50*+
DUMMY1 disconnected
…
Do I have two screens here, or is it the same one? I can’t see anything on the monitor plugged into the GPU that is passed through to the VM.

my /etc/X11/xorg.conf.d/10-nvidia.conf:

Section "OutputClass"
Identifier "nvidia"
MatchDriver "nvidia-drm"
Driver "nvidia"
Option "AllowEmptyInitialConfiguration"
Option "PrimaryGPU" "yes"
Option "SLI" "auto"
Option "BaseMosaic" "on"
EndSection

Section "OutputClass"
Identifier "intel"
MatchDriver "i915"
Driver "modesetting"
EndSection

my /etc/X11/xorg-qubes.conf
Section "Module"
Load "fb"
Load "glamoregl"
EndSection

Section "ServerLayout"
Identifier "Default Layout"
Screen 0 "Screen0" 0 0
InputDevice "qubesdev"
EndSection

Section "Device"
Identifier "Videocard0"
Driver "dummyqbs"
VideoRam 17101
Option "GUIDomID" "0"
EndSection

Section "Monitor"
Identifier "Monitor0"
HorizSync 49-50
VertRefresh 41-42
Modeline "QB1920x1200" 96 1920 1921 …
EndSection

Section "Screen"
Identifier "Screen0"
Device "Videocard0"
Monitor "Monitor0"
DefaultDepth 24
Subsection "Display"
Viewport 0 0
Depth 24
Modes "QB1920x1200"
EndSubSection
EndSection

Section "InputDevice"
Identifier "qubesdev"
Driver "qubes"
EndSection

Fahrstuhl · April 28, 2024, 11:06am

In the end I switched to an Arch Linux VM built using Qubes builder and containing the Qubes windowing system. Getting the driver installed on Fedora was annoying, and the drivers in Debian are very old.

I use a custom Xorg config named dedicated_gpu_X.conf:

Section "Device"
	Driver "nvidia"
	Identifier "Nvidia"
	BusID "PCI:0:8:0"
EndSection

and start a second X server with DISPLAY=:1 startx -- -config dedicated_gpu_X.conf

On every boot I need to check that the PCI address of the GPU shown by lspci still matches the BusID in the X config.
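
A rough sketch of how that per-boot check could be scripted, in case it helps. lspci prints the address in hexadecimal (bus:device.function), while the Xorg BusID uses decimal numbers, so the two need converting before they can be compared. The function names lspci_to_busid and config_busid are made up for this example, and it assumes the config contains a single BusID line.

```shell
#!/bin/sh
# Convert an lspci-style address (hex "bus:device.function", optionally
# prefixed with a PCI domain such as "0000:") into Xorg's decimal BusID form.
# Hypothetical helper, not part of any Qubes or Xorg tooling.
lspci_to_busid() (
    addr=$1
    # Drop an optional PCI domain prefix.
    case $addr in
        *:*:*) addr=${addr#*:} ;;
    esac
    # Reject anything that is not "xx:xx.x".
    case $addr in
        [0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-7]) ;;
        *) echo "unrecognized PCI address: $1" >&2; exit 1 ;;
    esac
    bus=${addr%%:*}
    rest=${addr#*:}
    dev=${rest%%.*}
    fn=${rest#*.}
    # printf parses the "0x" prefix as hexadecimal and prints decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "$fn"
)

# Print the first BusID value found in the given Xorg config file.
config_busid() (
    sed -n 's/^[[:space:]]*BusID[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
)

# Example use inside the VM (assumes one NVIDIA VGA device is passed through):
#   gpu_addr=$(lspci | awk '/VGA/ && /NVIDIA/ {print $1; exit}')
#   [ "$(lspci_to_busid "$gpu_addr")" = "$(config_busid dedicated_gpu_X.conf)" ] \
#       || echo "BusID in the X config no longer matches lspci"
```

Keeping the conversion in its own function means the comparison can be run before startx without touching the config file itself.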

Kykyx · April 28, 2024, 9:17pm

Can you please provide a guide for the Arch VM (or maybe just the quick steps that need to be done)? It looks like three weeks have passed since I started on these guides, and I still only have a partial result because the Xorg setup isn’t easy for me right now.

Fahrstuhl · April 29, 2024, 1:26pm

For Qubes 4.1 I think I followed this guide: 'archlinux-minimal' template
For Qubes 4.2 I haven’t yet managed to get qubes-builder v2 working, but I found that there is an archlinux template in the qubes-templates-community-testing repositories, as shown by qvm-template list in dom0.

As for installing the NVIDIA driver, I only installed the nvidia-open-dkms package; I don’t think I did any further configuration.

arkfox · November 2, 2024, 8:48pm

I followed this guide and it worked flawlessly for a while, achieving high FPS. I’m not sure whether an update caused it or something else, but I only get about 10 fps in games now. I confirmed that the drivers are loaded with lspci -v and confirmed that the games are using the GPU with nvidia-smi.

Not sure if anyone has addressed this, but there is also an issue with this setup (at least on my NVIDIA GPU) where the VM takes ~5-10 minutes to shut down and return the PCI device to dom0.

Edit: Might be related to Vulkan issues with nvidia. Running vkcube returns Selected GPU 0: llvmpipe (LLVM 18.1.8, 256 bits), type: Cpu.
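
One thing that could be checked here, as a sketch only: when vkcube falls back to llvmpipe, it may be that the Vulkan loader is not finding an NVIDIA ICD manifest. The helper below just lists manifests with "nvidia" in the file name. The name find_nvidia_icd is made up for this example, the two default directories are the usual loader search paths, and the assumption that the driver package installs a manifest with "nvidia" in its name may not hold on every distribution.

```shell
#!/bin/sh
# List Vulkan ICD manifests whose file name mentions "nvidia".
# Hypothetical helper; pass directories as arguments to override the defaults.
find_nvidia_icd() (
    # Fall back to the usual loader search directories when none are given.
    [ "$#" -gt 0 ] || set -- /usr/share/vulkan/icd.d /etc/vulkan/icd.d
    for d in "$@"; do
        for f in "$d"/*nvidia*.json; do
            # An unmatched glob stays literal, so check the file exists.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
)

# Example use (assumes vkcube is installed in the VM):
#   icd=$(find_nvidia_icd | head -n 1)
#   if [ -n "$icd" ]; then VK_ICD_FILENAMES="$icd" vkcube; fi
```

If the function prints nothing, the manifest is probably missing and the problem is more likely in the driver installation than in the game; if it prints a path and vkcube still picks llvmpipe, the cause is somewhere else.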