Hey qubesters. I recently acquired a new rig with a powerful nvidia GPU, and spent the last few days setting it up. I wanted this GPU attached to a qube via PCI passthrough, so that I could run local AI programs like Stable Diffusion and llama.cpp. This guide distills a few days of tinkering into a short and sweet step-by-step process which worked for my machine. The hardest part was figuring out how to install the right drivers.
The end result of this guide is a sys-gpu VM which can run Cuda or Vulkan compute applications. Possibly graphics applications too, but I’m mostly interested in compute, so I didn’t thoroughly test 3D rendering or anything like that.
Disclaimer: I’ve heard that GPU passthrough is a cantankerous beast to set up. On my hardware it was relatively simple to get going once my Qubes installation was functional, but your mileage may vary. Other guides abound.
Prerequisites
You need two GPUs for this to work.
One will be used by dom0 to render your qubes desktop environment. The other will be attached to your GPU compute VM. Most CPUs have an integrated GPU (iGPU) baked in, but some (certain AMD chips) do not. If you’re shopping for a new computer, double-check that the CPU has “integrated graphics” or an “integrated GPU” built in.
You also need an nvidia graphics card. Mine is an RTX 5060 Ti. I’m sure there are similar steps for AMD cards, but I don’t have one so I can’t speak to that.
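Not sure whether your machine has two GPUs? In a dom0 terminal, lspci lists every display controller. Here’s a sketch of what a two-GPU machine looks like, run against hypothetical sample output so the filter itself is visible (on your machine, just run the lspci pipeline in the first comment):

```shell
# On real hardware: lspci | grep -Ei 'vga|3d|display'
# Below, the same filter runs over hypothetical sample output.
cat > /tmp/dom0-gpus.txt <<'EOF'
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 770
01:00.0 VGA compatible controller: NVIDIA Corporation Device
EOF
grep -Eci 'vga|3d|display' /tmp/dom0-gpus.txt    # prints 2: two GPUs visible
```

If the count is 1, you only have one GPU and this guide’s passthrough approach won’t work as written.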
Steps
At a high level, the procedure is:
- Set up Qubes OS so that the discrete GPU can be safely attached to a qube.
- Create and configure a sys-gpu VM
- Start the sys-gpu VM
- Install nvidia drivers
- Restart sys-gpu
- Test cuda/vulkan compute applications
Step 0: Close Everything and BACK UP
You’re about to do some dangerous stuff that may mess with your qubes OS installation. Make sure to save your work, shut down your qubes, and back them up before proceeding.
Step 1: Set Up Qubes OS
If your qubes OS machine has two GPUs, then qubes only uses one of them to render the graphical desktop environment that you are probably reading this page on right now. The other GPU is left unused.
Running a desktop GUI is pretty light work as far as graphics cards are concerned, so it is completely reasonable to dedicate your iGPU to rendering Qubes OS and dom0, while keeping your larger nvidia dGPU in reserve (so we can use it inside a qube later).
To do this, we need to force dom0 to use a specific one of your GPUs.
For me, this was very easy: On my machine, the iGPU and nvidia GPU each have their own separate sets of HDMI/DisplayPort outputs. When I boot the computer, the OS detects which one has a cable connected, and dom0 uses that GPU only. So for me, it was as simple as plugging the HDMI cord into my iGPU and NOT into the nvidia GPU. If this is the case for you, turn off your PC, plug the HDMI cord into the iGPU, and proceed to step 2.
If your machine has only a single video output port for both GPUs, then you’ll need to tell the linux kernel to hide the nvidia GPU from dom0, thus forcing dom0 to use the iGPU to render Qubes OS.
How to manually hide your dGPU from dom0

- Open a new dom0 shell
- Find your dGPU’s PCI identifier string:

lspci | grep VGA | grep -i nvidia

Example output:

01:00.0 VGA compatible controller: NVIDIA Corporation Device

The PCI identifier is the first string; in this case 01:00.0 is the PCI device ID.

gpu_pci_id="01:00.0"

- Edit /etc/default/grub in dom0 and add an rd.qubes.hide_pci flag to your linux kernel command line parameters:

GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rd.qubes.hide_pci=01:00.0"

- Regenerate the grub configuration file:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Here is a one-liner which does all the above:
echo "GRUB_CMDLINE_LINUX=\"\$GRUB_CMDLINE_LINUX rd.qubes.hide_pci=$(lspci | grep VGA | grep -i nvidia | awk '{print $1}')\"" |
sudo tee -a /etc/default/grub &&
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
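If you want to double-check what that awk '{print $1}' stage extracts before touching grub, feed it a sample line shaped like the example output above:

```shell
# The first whitespace-separated field of an lspci line is the PCI bus ID
echo '01:00.0 VGA compatible controller: NVIDIA Corporation Device' | awk '{print $1}'
# prints 01:00.0
```

If this prints anything other than a bus ID like 01:00.0, don’t run the one-liner against your real grub config.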
Now you should reboot your computer. You may encounter booting issues here if you did this wrong. If qubes OS doesn’t boot cleanly, you’ll need to edit your kernel’s command line parameters at boot to remove the rd.qubes.hide_pci=<id> flag.
Restart your computer, and wait until you see a screen offering choices like “Qubes with Xen Hypervisor” and “Advanced options for Qubes”. Press the e key. Now you are editing the kernel command line parameters ephemerally - any changes you make will only affect this boot attempt. Remove rd.qubes.hide_pci=<id>. Now press Ctrl+x to boot. Qubes should start cleanly now. You should go back and figure out what exactly you did wrong in the PCI-hiding procedure - maybe you used the wrong PCI ID by mistake.
To revert your changes and get your system working normally again, just remove that new line from /etc/default/grub and run grub2-mkconfig again.
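If you’d rather script that revert, a sed one-liner can delete the hide_pci line. Here’s a sketch run against a scratch copy so you can see the effect before pointing it at the real /etc/default/grub (and re-running grub2-mkconfig) in dom0:

```shell
# Scratch copy with a hypothetical appended line, mimicking the one-liner above
cat > /tmp/grub.demo <<'EOF'
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rd.qubes.hide_pci=01:00.0"
EOF
# Delete every line that mentions rd.qubes.hide_pci
sed -i '/rd\.qubes\.hide_pci/d' /tmp/grub.demo
cat /tmp/grub.demo
# prints only: GRUB_TIMEOUT=5
```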
Step 2: Create a sys-gpu qube
Create a new Standalone qube named sys-gpu. You can use Debian, Fedora, or whatever you like. Configure it as follows:
- Initial memory: At least 2000 MiB
- Include in memory balancing: NO (uncheck the box)
- Mode: HVM
- Devices: Attach your nvidia GPU device
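If you prefer the dom0 command line to the GUI, the same configuration can be sketched with the qvm-* tools. The template name is a placeholder, and 01_00.0 is the PCI ID from the earlier example - substitute your own:

```shell
# Example dom0 commands; template name and PCI ID are placeholders
qvm-create --class StandaloneVM --template debian-12-xfce --label purple sys-gpu
qvm-prefs sys-gpu virt_mode hvm
qvm-prefs sys-gpu memory 4000
qvm-prefs sys-gpu maxmem 0            # maxmem 0 disables memory balancing
qvm-pci attach --persistent sys-gpu dom0:01_00.0 -o permissive=true
```

The permissive option is commonly needed for GPU passthrough; if your qube works without it, leave it off.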
BEFORE YOU START THE QUBE, make sure you have completed step 0: close all other qubes and save your work. If you made any mistakes during the previous stage, dom0 may still be using the nvidia GPU. If this is the case, then launching sys-gpu will detach the graphics card from dom0 and crash your screen. You will need to force reboot your computer if this happens and return to step 1 (hide your Nvidia GPU from dom0).
Step 3: Start sys-gpu
If you can start sys-gpu without crashing your screen, congrats! You’re done with the hard and dangerous part. Now comes the annoying part.
Step 4: Install Nvidia Drivers
This part was the hardest for me, as there were many conflicting resources, with different guides suggesting different methods. I tried most of them, but only one worked.
Among the many guides I found online for setting up Nvidia drivers on Linux, their methods all fell into three buckets:
- Install drivers using a runfile pulled from Nvidia’s website
- Install drivers from your template OS’s default package manager repositories (e.g. Debian’s non-free repo)
- Install drivers using Nvidia’s package manager repositories
In my case, I was using a Debian 12 template. I tried all three, but only Nvidia’s repository worked for my graphics card. Specifically, it was this documentation from Nvidia that I followed:
The TLDR for Debian to install the open-source drivers:
curl -L -o /tmp/cuda-repo.deb \
https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i /tmp/cuda-repo.deb && rm /tmp/cuda-repo.deb
sudo apt update
sudo apt install -y nvidia-open
Depending on your hardware, you may find the other two methods work for you. The Debian non-free repo’s drivers were too obsolete for my new card - check out the nvidia-detect package in the non-free repo to see if the old drivers work for your card. If they do, maybe you could get by without 3rd party repositories or runfiles. Just do add-apt-repository non-free contrib and apt install nvidia-driver. You may need the software-properties-common package to get the add-apt-repository command.
The Nvidia runfile seemed to work at first, but vulkaninfo wouldn’t recognize the GPU and nvidia-smi reported no devices. IDK what happened there.
Using the nvidia repository, on the other hand, worked seamlessly for my GPU. It also gives easy access to download the cuda toolkit if you please, with a simple apt install cuda, so this is my recommendation.
Step 5: Restart sys-gpu
This is an optional step - I didn’t need it in my case - but I’ve seen many resources say that to fully load the new nvidia kernel module, the machine (or qube, in this case) needs to be restarted.
In some cases, you may have to edit the qube’s settings to set Kernel to “provided by qube”, or else the qube won’t boot with the new kernel module. For me, this only happened when using the Nvidia runfile installer. The drivers from the Nvidia Debian repo worked fine without any other qvm-prefs changes.
Step 6: Test Cuda/Vulkan Apps
At this stage, your sys-gpu qube should be ready for prime time. If you have any cuda or vulkan applications, now is the moment of truth to test them out.
An easy and quick test to check Vulkan is working: use the vulkaninfo tool to see if libvulkan can detect your GPU.
sudo apt install vulkan-tools -y &&
vulkaninfo --summary |
grep -A9999 Devices
This should print out a summary of available vulkan devices. One may be your CPU, but you should definitely see your Nvidia GPU in there too now. If not, you may need to retry the driver installation step and troubleshoot from there. Make sure your nvidia GPU is actually attached and visible inside sys-gpu. To check with lspci:
sudo apt install pciutils && lspci | grep -i nvidia
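If you want to script the Vulkan check rather than eyeball it, grepping for deviceName works on the summary output. A sketch against a hypothetical vulkaninfo --summary excerpt (real output has more fields):

```shell
# Hypothetical excerpt of `vulkaninfo --summary` output
cat > /tmp/vk-summary.txt <<'EOF'
Devices:
========
GPU0:
    deviceName         = NVIDIA GeForce RTX 5060 Ti
    deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
GPU1:
    deviceName         = llvmpipe (LLVM 15.0.6, 256 bits)
    deviceType         = PHYSICAL_DEVICE_TYPE_CPU
EOF
# Pull out just the device names, one per line
grep deviceName /tmp/vk-summary.txt | awk -F'= ' '{print $2}'
```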
The nvidia-smi tool should also be installed alongside your drivers, and it will give you a real-time view of your GPU’s status, including how much GPU memory is being used and by whom.
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5060 Ti On | 00000000:00:07.0 Off | N/A |
| 0% 36C P1 21W / 180W | 10866MiB / 16311MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 16951 C ./build/bin/llama-server 10856MiB |
+-----------------------------------------------------------------------------------------+
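For scripting, nvidia-smi’s --query-gpu CSV mode is much easier to parse than the table above. A sketch using a hypothetical sample line (values shaped like the table’s memory column):

```shell
# Real command: nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# Sample output line (hypothetical values, in MiB):
sample='10866, 16311'
used=$(printf '%s' "$sample" | cut -d, -f1 | tr -d ' ')
total=$(printf '%s' "$sample" | cut -d, -f2 | tr -d ' ')
echo "GPU memory in use: ${used} MiB of ${total} MiB"
# prints: GPU memory in use: 10866 MiB of 16311 MiB
```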
You can also try the clinfo package & command. clinfo --list should enumerate your Nvidia GPU as a usable platform and device.
A more interesting real-world test is to use llama.cpp to run open source large language models on your GPU.
sudo apt install -y build-essential cmake libvulkan-dev libcurl4-openssl-dev glslc
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release
sudo make -C build install #optional
More info on how to build llama.cpp from source is available in the project’s documentation.
Now to run llama.cpp and benchmark it against the CPU, using the DeepSeek R1 Qwen3 8B model:
curl -L -o deepseek.gguf https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF/resolve/main/DeepSeek-R1-0528-Qwen3-8B-UD-Q8_K_XL.gguf
llama-bench -m deepseek.gguf
These are the results from my RTX 5060 Ti GPU:
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | Vulkan | 99 | pp512 | 662.81 ± 0.26 |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | Vulkan | 99 | tg128 | 41.47 ± 0.05 |
To compare against your CPU and see how much of a speedup you’re getting, disable the Vulkan installable client driver (ICD), and then run the benchmark again:
VK_ICD_FILENAMES='' llama-bench -m deepseek.gguf
My CPU is pretty beefy, but even still the GPU knocked it out of the park.
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | Vulkan | 99 | pp512 | 89.84 ± 0.40 |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | Vulkan | 99 | tg128 | 5.26 ± 0.09 |
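Comparing the tg128 rows of the two tables, a quick awk one-liner computes the token-generation speedup (numbers taken from my runs above):

```shell
# GPU tg128 (41.47 t/s) divided by CPU tg128 (5.26 t/s)
awk 'BEGIN { printf "GPU is %.2fx faster at token generation\n", 41.47 / 5.26 }'
# prints: GPU is 7.88x faster at token generation
```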
If you don’t care about open source and just want to maximize performance, you can also use llama.cpp’s Cuda backend. Just install 5 GB of extra nonsense from Nvidia’s proprietary Cuda toolkit:
sudo apt install -y cuda
echo 'export PATH="$PATH:/usr/local/cuda/bin"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/lib"' >> ~/.bashrc
…and rebuild llama.cpp (open a new shell first, or source ~/.bashrc, so the new Cuda paths take effect):
cmake -B build -DGGML_CUDA=1
cmake --build build --config Release
./build/bin/llama-bench -m deepseek.gguf
It’s hard to argue with the results though:
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | CUDA | 99 | pp512 | 3348.54 ± 4.62 |
| qwen3 8B Q8_0 | 10.08 GiB | 8.19 B | CUDA | 99 | tg128 | 41.27 ± 0.07 |
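The interesting comparison is against the Vulkan numbers: token generation barely moves (41.27 vs 41.47 t/s), but prompt processing jumps by roughly 5x. Again computable with awk from the tables above:

```shell
# CUDA pp512 (3348.54 t/s) divided by Vulkan pp512 (662.81 t/s)
awk 'BEGIN { printf "CUDA prompt processing is %.2fx faster than Vulkan\n", 3348.54 / 662.81 }'
# prints: CUDA prompt processing is 5.05x faster than Vulkan
```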