Hey qubesters. I recently acquired a new rig with a powerful nvidia GPU, and spent the last few days setting it up. I wanted this GPU attached to a qube via PCI passthrough, so that I could run local AI programs like Stable Diffusion and llama.cpp. This guide distills a few days of tinkering into a short and sweet step-by-step process which worked for my machine. The hardest part was figuring out how to install the right drivers.
The end result of this guide is a sys-gpu VM which can run Cuda or Vulkan compute applications. Possibly graphics applications too, but I’m mostly interested in compute, so I didn’t thoroughly test 3D rendering or anything like that.
Disclaimer: I’ve heard that GPU passthrough is a cantankerous beast to set up. On my hardware it was relatively simple to get going once my Qubes installation was functional, but your mileage may vary. Other guides abound.
Prerequisites
You need two GPUs for this to work.
One will be used by dom0 to render your qubes desktop environment. The other will be attached to your GPU compute VM. Most CPUs have an integrated GPU (iGPU) baked in, but some (certain AMD chips) do not. If you’re shopping for a new computer, double-check that the CPU has “integrated graphics” or an “integrated GPU” built in.
You also need an nvidia graphics card. Mine is an RTX 5060 Ti. I’m sure there are similar steps for AMD cards, but I don’t have one so I can’t speak to that.
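Not sure whether your machine has two GPUs? In a dom0 terminal, lspci lists every display controller. Here’s a sketch of what a two-GPU machine looks like, run against hypothetical sample output so the filter itself is visible (on your machine, just run the lspci pipeline in the first comment):

```shell
# On real hardware: lspci | grep -Ei 'vga|3d|display'
# Below, the same filter runs over hypothetical sample output.
cat > /tmp/dom0-gpus.txt <<'EOF'
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 770
01:00.0 VGA compatible controller: NVIDIA Corporation Device
EOF
grep -Eci 'vga|3d|display' /tmp/dom0-gpus.txt    # prints 2: two GPUs visible
```

If the count is 1, you only have one GPU and this guide’s passthrough approach won’t work as written.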
Steps
At a high level, the procedure is:
- Set up Qubes OS so that the discrete GPU can be safely attached to a qube.
- Create and configure a sys-gpu VM
- Start the sys-gpu VM
- Install nvidia drivers
- Restart sys-gpu
- Test cuda/vulkan compute applications
Step 0: Close Everything and BACK UP
You’re about to do some dangerous stuff that may mess with your qubes OS installation. Make sure to save your work, shut down your qubes, and back them up before proceeding.
Step 1: Set Up Qubes OS
If your qubes OS machine has two GPUs, then qubes only uses one of them to render the graphical desktop environment that you are probably reading this page on right now. The other GPU is left unused.
Running a desktop GUI is pretty light work as far as graphics cards are concerned, so it is completely reasonable to dedicate your iGPU to rendering Qubes OS and dom0, while keeping your larger nvidia dGPU in reserve (so we can use it inside a qube later).
To do this, we need to force dom0 to use a specific one of your GPUs.
For me, this was very easy: On my machine, the iGPU and nvidia GPU each have their own separate sets of HDMI/DisplayPort outputs. When I boot the computer, the OS detects which one has a cable connected, and dom0 uses that GPU only. So for me, it was as simple as plugging the HDMI cord into my iGPU and NOT into the nvidia GPU. If this is the case for you, turn off your PC, plug the HDMI cord into the iGPU, and proceed to step 2.
If your machine has only a single video output port for both GPUs, then you’ll need to tell the linux kernel to hide the nvidia GPU from dom0, thus forcing dom0 to use the iGPU to render Qubes OS.
How to manually hide your dGPU from dom0

- Open a new dom0 shell
- Find your dGPU’s PCI identifier string:

lspci | grep VGA | grep -i nvidia

Example output:

01:00.0 VGA compatible controller: NVIDIA Corporation Device

The PCI identifier is the first string; in this case 01:00.0 is the PCI device ID.

gpu_pci_id="01:00.0"

- Edit /etc/default/grub in dom0 and add an rd.qubes.hide_pci flag to your linux kernel command line parameters:

GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rd.qubes.hide_pci=01:00.0"

- Regenerate the grub configuration file:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Here is a one-liner which does all the above:
echo "GRUB_CMDLINE_LINUX=\"\$GRUB_CMDLINE_LINUX rd.qubes.hide_pci=$(lspci | grep VGA | grep -i nvidia | awk '{print $1}')\"" |
sudo tee -a /etc/default/grub &&
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
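If you want to double-check what that awk '{print $1}' stage extracts before touching grub, feed it a sample line shaped like the example output above:

```shell
# The first whitespace-separated field of an lspci line is the PCI bus ID
echo '01:00.0 VGA compatible controller: NVIDIA Corporation Device' | awk '{print $1}'
# prints 01:00.0
```

If this prints anything other than a bus ID like 01:00.0, don’t run the one-liner against your real grub config.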
Now you should reboot your computer. You may encounter booting issues here if you did this wrong. If qubes OS doesn’t boot cleanly, you’ll need to edit your kernel’s command line parameters at boot to remove the rd.qubes.hide_pci=<id> flag.
Restart your computer, and wait until you see a screen offering choices like “Qubes with Xen Hypervisor” and “Advanced options for Qubes”. Press the e key. Now you are editing the kernel command line parameters ephemerally - any changes you make will only affect this boot attempt. Remove rd.qubes.hide_pci=<id>. Now press Ctrl+x to boot. Qubes should start cleanly now. You should go back and figure out what exactly you did wrong in the PCI-hiding procedure - maybe you used the wrong PCI ID by mistake.
To revert your changes and get your system working normally again, just remove that new line from /etc/default/grub and run grub2-mkconfig again.
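If you’d rather script that revert, a sed one-liner can delete the hide_pci line. Here’s a sketch run against a scratch copy so you can see the effect before pointing it at the real /etc/default/grub (and re-running grub2-mkconfig) in dom0:

```shell
# Scratch copy with a hypothetical appended line, mimicking the one-liner above
cat > /tmp/grub.demo <<'EOF'
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rd.qubes.hide_pci=01:00.0"
EOF
# Delete every line that mentions rd.qubes.hide_pci
sed -i '/rd\.qubes\.hide_pci/d' /tmp/grub.demo
cat /tmp/grub.demo
# prints only: GRUB_TIMEOUT=5
```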
Step 2: Create a sys-gpu qube
Create a new Standalone qube named sys-gpu. You can use Debian, Fedora, or whatever you like. Configure it as follows:
- Initial memory: At least 2000 MiB
- Include in memory balancing: NO (uncheck the box)
- Mode: HVM
- Devices: Attach your nvidia GPU device
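If you prefer the dom0 command line to the GUI, the same configuration can be sketched with the qvm-* tools. The template name is a placeholder, and 01_00.0 is the PCI ID from the earlier example - substitute your own:

```shell
# Example dom0 commands; template name and PCI ID are placeholders
qvm-create --class StandaloneVM --template debian-12-xfce --label purple sys-gpu
qvm-prefs sys-gpu virt_mode hvm
qvm-prefs sys-gpu memory 4000
qvm-prefs sys-gpu maxmem 0            # maxmem 0 disables memory balancing
qvm-pci attach --persistent sys-gpu dom0:01_00.0 -o permissive=true
```

The permissive option is commonly needed for GPU passthrough; if your qube works without it, leave it off.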
BEFORE YOU START THE QUBE, make sure you have completed step 0: close all other qubes and save your work. If you made any mistakes during the previous stage, dom0 may still be using the nvidia GPU. If this is the case, then launching sys-gpu will detach the graphics card from dom0 and crash your screen. You will need to force reboot your computer if this happens and return to step 1 (hide your Nvidia GPU from dom0).
Step 3: Start sys-gpu
If you can start sys-gpu without crashing your screen, congrats! You’re done with the hard and dangerous part. Now comes the annoying part.
Step 4: Install Nvidia Drivers
This part was the hardest for me, as there were many conflicting resources, with different guides suggesting different methods. I tried most of them, but only one worked.
Among the many guides I found online for setting up Nvidia drivers on Linux, their methods all fell into three buckets:
- Install drivers using a runfile pulled from Nvidia’s website
- Install drivers from your template OS’s default package manager repositories (e.g. Debian’s non-free repo)
- Install drivers using Nvidia’s package manager repositories
In my case, I was using a Debian 12 template. I tried all three, but only Nvidia’s repository worked for my graphics card. Specifically, it was this documentation from Nvidia that I followed:
The TLDR for Debian to install the open-source drivers:
curl -L -o /tmp/cuda-repo.deb \
https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i /tmp/cuda-repo.deb && rm /tmp/cuda-repo.deb
sudo apt update
sudo apt install -y nvidia-open
Depending on your hardware, you may find the other two methods work for you. The Debian non-free repo’s drivers were too obsolete for my new card - check out the nvidia-detect package in the non-free repo to see if the old drivers work for your card. If they do, maybe you could get by without 3rd party repositories or runfiles. Just do add-apt-repository non-free contrib and apt install nvidia-driver. You may need the software-properties-common package to get the add-apt-repository command.
The Nvidia runfile seemed to work at first, but vulkaninfo wouldn’t recognize the GPU and nvidia-smi reported no devices. IDK what happened there.
Using the nvidia repository, on the other hand, worked seamlessly for my GPU. It also gives easy access to download the cuda toolkit if you please, with a simple apt install cuda, so this is my recommendation.
Step 5: Restart sys-gpu
This is an optional step - I didn’t need it in my case - but I’ve seen many resources say that to fully load the new nvidia kernel module, the machine (or qube, in this case) needs to be restarted.
In some cases, you may have to edit the qube’s settings to set Kernel to “provided by qube”, or else the qube won’t boot with the new kernel module. For me, this only happened when using the Nvidia runfile installer. The drivers from the Nvidia Debian repo worked fine without any other qvm-prefs changes.
Step 6: Test Cuda/Vulkan Apps
At this stage, your sys-gpu qube should be ready for prime time. If you have any cuda or vulkan applications, now is the moment of truth to test them out.
An easy and quick test to check Vulkan is working: use the vulkaninfo tool to see if libvulkan can detect your GPU.
sudo apt install vulkan-tools -y &&
vulkaninfo --summary |
grep -A9999 Devices
This should print out a summary of available vulkan devices. One may be your CPU, but you should definitely see your Nvidia GPU in there too now. If not, you may need to retry the driver installation step and troubleshoot from there. Make sure your nvidia GPU is actually attached and visible inside sys-gpu. To check with lspci:
sudo apt install pciutils && lspci | grep -i nvidia
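If you want to script the Vulkan check rather than eyeball it, grepping for deviceName works on the summary output. A sketch against a hypothetical vulkaninfo --summary excerpt (real output has more fields):

```shell
# Hypothetical excerpt of `vulkaninfo --summary` output
cat > /tmp/vk-summary.txt <<'EOF'
Devices:
========
GPU0:
    deviceName         = NVIDIA GeForce RTX 5060 Ti
    deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
GPU1:
    deviceName         = llvmpipe (LLVM 15.0.6, 256 bits)
    deviceType         = PHYSICAL_DEVICE_TYPE_CPU
EOF
# Pull out just the device names, one per line
grep deviceName /tmp/vk-summary.txt | awk -F'= ' '{print $2}'
```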
The nvidia-smi tool should also be installed alongside your drivers, and it will give you a real-time view of your GPU’s status, including how much GPU memory is being used and by whom.
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5060 Ti On | 00000000:00:07.0 Off | N/A |
| 0% 36C P1 21W / 180W | 10866MiB / 16311MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 16951 C ./build/bin/llama-server 10856MiB |
+-----------------------------------------------------------------------------------------+
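For scripting, nvidia-smi’s --query-gpu CSV mode is much easier to parse than the table above. A sketch using a hypothetical sample line (values shaped like the table’s memory column):

```shell
# Real command: nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Sample output line (hypothetical values, in MiB):
sample='10866, 16311'
used=$(printf '%s' "$sample" | cut -d, -f1 | tr -d ' ')
total=$(printf '%s' "$sample" | cut -d, -f2 | tr -d ' ')
echo "GPU memory in use: ${used} MiB of ${total} MiB"
# prints: GPU memory in use: 10866 MiB of 16311 MiB
```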
You can also try the clinfo package & command. clinfo --list should enumerate your Nvidia GPU as a usable platform and device.
A more interesting real-world test is to use llama.cpp to run open source large language models on your GPU.
sudo apt install -y build-essential cmake libvulkan-dev libcurl4-openssl-dev glslc
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release
sudo make -C build install #optional
More info on how to build llama.cpp from source is available in the project’s documentation.
Now to run llama.cpp and benchmark it against the CPU, using the DeepSeek R1 Qwen3 8B model:
curl -L -o deepseek.gguf https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF/resolve/main/DeepSeek-R1-0528-Qwen3-8B-UD-Q8_K_XL.gguf
llama-bench -m deepseek.gguf
These are the results from my RTX 5060 Ti GPU:
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | Vulkan | 99 | pp512 | 662.81 ± 0.26 |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | Vulkan | 99 | tg128 | 41.47 ± 0.05 |
To compare against your CPU and see how much of a speedup you’re getting, disable the Vulkan installable client driver (ICD), and then run the benchmark again:
VK_ICD_FILENAMES='' llama-bench -m deepseek.gguf
My CPU is pretty beefy, but even still the GPU knocked it out of the park.
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | Vulkan | 99 | pp512 | 89.84 ± 0.40 |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | Vulkan | 99 | tg128 | 5.26 ± 0.09 |
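Comparing the tg128 rows of the two tables, a quick awk one-liner computes the token-generation speedup (numbers taken from my runs above):

```shell
# GPU tg128 (41.47 t/s) divided by CPU tg128 (5.26 t/s)
awk 'BEGIN { printf "GPU is %.2fx faster at token generation\n", 41.47 / 5.26 }'
# prints: GPU is 7.88x faster at token generation
```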
If you don’t care about open source and just want to maximize performance, you can also use llama.cpp’s Cuda backend. Just install 5 GB of extra nonsense from Nvidia’s proprietary Cuda toolkit:
sudo apt install -y cuda
echo 'export PATH="$PATH:/usr/local/cuda/bin"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/lib"' >> ~/.bashrc
…and rebuild llama.cpp (open a new shell first, or source ~/.bashrc, so the new Cuda paths take effect):
cmake -B build -DGGML_CUDA=1
cmake --build build --config Release
./build/bin/llama-bench -m deepseek.gguf
It’s hard to argue with the results though:
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | CUDA | 99 | pp512 | 3348.54 ± 4.62 |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | CUDA | 99 | tg128 | 41.27 ± 0.07 |
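The interesting comparison is against the Vulkan numbers: token generation barely moves (41.27 vs 41.47 t/s), but prompt processing jumps by roughly 5x. Again computable with awk from the tables above:

```shell
# CUDA pp512 (3348.54 t/s) divided by Vulkan pp512 (662.81 t/s)
awk 'BEGIN { printf "CUDA prompt processing is %.2fx faster than Vulkan\n", 3348.54 / 662.81 }'
# prints: CUDA prompt processing is 5.05x faster than Vulkan
```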