Secure AI Inference with Qubes OS: A GPU Passthrough & Ollama Guide

Yes, it works. It is the Hermes Agent that is doing all the work, and I run it in a standalone VM with full root access, another reason why you want to run it in its own VM.

The inference engine does need to have tool calling enabled, and you need a model that supports tool calling.

The Qwen3.6 models are probably the best models you can use locally for tool calling.

2 Likes

Exciting, thank you!

btw, in my tests, Gemma4 worked best among a number of local models I tested (including Qwen3.6).

Currently, it’s using LM Studio, but I’ll switch it to running
llama.cpp and

Just noticed debian 14 has packaged llama.cpp package[1].

Footnotes:
[1] I Challenge Thee

Yes, it works. It is the Hermes Agent that is doing all the work

I am currently testing Goose Agent[1]. Do you think if Hermes Agent is
better?

Footnotes:
[1] GitHub - aaif-goose/goose: an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM · GitHub

I experimented with something like that as well a short while ago. But afer one or two days of updates the nvidia driver segfaulted on start up.
Did you have this problem before?
It might be a good idea to pin it?

I have never used Goose Agent, but Hemes is one of the most popular agent because it works very well.

No, I only install the nvidia cuda driver not the full nvidia driver, and I’ve not had any issues with crashes.

You could pin the driver, or simply not update unless you have a reason to do so.

Tested the guide yesterday. worked fine for two boots.
Started it again today and the driver crashes again.
Didn’t even update. Hope I find time to figure out why.

Edit: Now it works again. Idk. GPU drivers are just broken af.

Do you use the GPU in more than one VM?

In my system using an 4060, the first VM the GPU gets attached to will load the firmware onto the GPU, and the firmware will persist after being detached from the VM. It also persists through reboots, the firmware only resets after the entire system has been full powered down.

If you are using the GPU with multiple VMs, make sure you are using the same driver version in all the VMs, or you might get random crashed depending on what firmware got loaded onto the GPU.

1 Like

That is maybe related!

Edit: It is not. Every second reboot the GPU driver just crashes X.

Fyi managed to get Hermes Agent (without browser addon) working using a debian 13 minimal template + appvm combo by doing the following:

  • In the template install: $ sudo apt install qubes-core-agent-networking thunar qubes-core-agent-thunar gedit curl git nodejs npm ripgrep ffmpeg build-essential python3-dev libffi-dev -y
    • Gedit is optional for editing text files easily instead of using e.g. nano in terminal
    • Need git nodejs npm ripgrep ffmpeg build-essential python3-dev libffi-dev for Hermes Agent (the installer would install those in the appvm, but they would be removed after reboot)
  • Create appVM, in services enabled “crond” service
  • Open normal terminal in appVM, install Hermes: $ curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash -s – --skip-browser
    • Note the --skip-browser flag was for my specific needs where I didnt want the agent doing browser stuff. You might need it, but then you might need to install additional packages in the template
  • Go through all steps according to your needs.
  • Once done, run $ source ~/.bashrc
  • Verify version: $ hermes --version
2 Likes

Can you provide the arguments for llama-server binary ?