I’m curious about running local AI chatbot models on Qubes. Does anyone here have experience with that? Any recommendations or best practices?
I’m running Qubes on a pretty powerful desktop, so performance-wise I should be okay. However, I’m unsure how well they will run without NVIDIA’s GPU drivers.
Has anyone tested CPU-only performance? And is it worth the hassle?
Hey,
I can say one thing: CPU performance is fine.
My approach was to install the MSTY.ai application in a separate standalone qube, download the AI model I was interested in, and then disconnect the qube from the internet.
The entire configuration in Msty is easy, but I would certainly get better results if I attached a second graphics card to the VM running MSTY.ai.
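For anyone wanting to reproduce a setup like this, the dom0 side is roughly the following (a sketch, not an exact recipe: the qube name `msty`, the template `debian-12`, and the resource numbers are all examples, adjust to your install):

```shell
# In dom0: create a standalone qube from an existing template
# (names "msty" and "debian-12" are examples)
qvm-create --class StandaloneVM --label purple --template debian-12 msty

# Give it generous memory and cores for CPU-only inference
qvm-prefs msty memory 16000
qvm-prefs msty maxmem 16000
qvm-prefs msty vcpus 8

# After installing Msty and downloading the model inside the qube,
# cut network access entirely by setting no netvm
qvm-prefs msty netvm ''
```

With `netvm` set to none, the qube keeps working offline and the downloaded model can no longer phone home.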
Depends on how many cores you are willing to give it. If you are running large models, don’t have a server-class CPU core count, and have the option to pass through a GPU for acceleration, do it.
There is some rationale for running it on the CPU, though. One example is if you are limited in the number of GPUs (one for the guivm, at least one for compute, and you need another one for something else, but can’t add more): you might tolerate the lower speed, especially if your CPU is capable. On the other hand, this can also be solved by something like seamless GPU passthrough.
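If you do go the passthrough route, attaching a secondary GPU to a compute qube is a one-time dom0 step; a rough sketch (the qube name `llm` and the device address `01_00.0` are examples, find yours with `qvm-pci`):

```shell
# In dom0: list PCI devices to find the GPU's backend:BDF identifier
qvm-pci list

# Attach the GPU persistently to the compute qube;
# no-strict-reset is often needed for consumer GPUs
qvm-pci attach --persistent --option no-strict-reset=true llm dom0:01_00.0
```

Note that disabling strict reset has security implications (the device may retain state across qube restarts), so it belongs on a dedicated compute qube, not a general-purpose one.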
I have one qube with GPU pass-through running Ollama, and my other qubes can connect to the Ollama API using qrexec.
The qrexec binding works like SSH port forwarding: it binds port 11434 to localhost on the qubes that want to use the Ollama API. This makes Ollama straightforward to use from any qube; to applications running in the qube, it looks like Ollama is running on localhost.
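In case it helps others, this kind of binding can be done with the built-in `qubes.ConnectTCP` qrexec service; a sketch, assuming the GPU qube is named `ollama-gpu` and the client qube is `work` (both names, and the policy file name, are examples):

```shell
# In dom0: allow the client qube to reach port 11434 on the Ollama qube
echo 'qubes.ConnectTCP +11434 work @default allow target=ollama-gpu' \
    | sudo tee /etc/qubes/policy.d/30-ollama.policy

# In the client qube: bind the remote port 11434 to localhost:11434
qvm-connect-tcp ::11434

# Applications in the client qube now see Ollama on localhost
curl http://localhost:11434/api/generate \
    -d '{"model": "llama3", "prompt": "hello", "stream": false}'
```

The `@default` target in the policy means the client does not need to know (or be able to choose) which qube actually runs Ollama; dom0 policy decides.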
You can do the same without using a GPU, but the performance will not be great.
I have two 4060 GPUs in my desktop system, one of which is dedicated to only running LLMs. I don’t know what hardware you have, but many AMD motherboards have an extra 4x PCIe slot with CPU-connected lanes; I use that slot for the extra GPU.