Ollama runs smoothly on QubesOS

If you'd like to run an AI model locally, this is how I have been running Ollama in a dedicated appVM. Performance is alright, depending on the size of the chosen model.

Recommended settings for the appVM (via the Qube Manager or from dom0 as shown below):

private storage max size: 80 GB
initial memory: 16000 MB
max memory: what you can spare
VCPUs: 4
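
If you prefer setting this from dom0 instead of the GUI, the following should do the same; "ollama-vm" is just a placeholder name and the maxmem value is only an example.

qvm-volume resize ollama-vm:private 80GiB
qvm-prefs ollama-vm memory 16000
qvm-prefs ollama-vm maxmem 24000   # example value, use what you can spare
qvm-prefs ollama-vm vcpus 4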

In the template:

sudo pacman -Syu
sudo pacman -S ollama
sudo pacman -S docker docker-compose  # optional
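
The rc.local script further down starts ollama manually because the service is left disabled in the template. pacman should not enable it automatically, but you can check and disable it explicitly:

sudo systemctl is-enabled ollama   # should report "disabled"
sudo systemctl disable ollama      # only needed if it was enabled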

In the appVM:

sudo mkdir -p /rw/bind-dirs/var/lib/ollama
sudo mkdir -p /rw/config/qubes-bind-dirs.d
sudo nano /rw/config/qubes-bind-dirs.d/50_user.conf

binds+=( '/var/lib/ollama' )
binds+=( '/var/lib/docker' )
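
Once the appVM has been restarted (see below), you can verify that the bind mounts are in place:

findmnt /var/lib/ollama
findmnt /var/lib/docker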

sudo nano /rw/config/rc.local

#!/bin/sh

# increase swap size: replace the default partitions on the volatile
# volume (/dev/xvdc) with a single 10G swap partition
swapoff /dev/xvdc1
parted -s /dev/xvdc rm 1
parted -s /dev/xvdc rm 3
parted -s /dev/xvdc mkpart primary linux-swap 0% 10G
mkswap /dev/xvdc1
swapon -d /dev/xvdc1

# service is disabled in template
systemctl start ollama

# several AI projects offer docker containers; you could run
# ollama in a docker container instead if you like (see the
# sketch below this script)
# systemctl start docker
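
If you go the container route, a minimal sketch using the official ollama/ollama image could look like this (the named volume ends up under /var/lib/docker, which is bind-mounted above):

sudo docker run -d --name ollama \
  -v ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  ollama/ollama
sudo docker exec -it ollama ollama run llama3.2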

Restart the appVM, then download a language model and run it:

ollama help
ollama pull llama3.2
ollama run llama3.2

ollama on the command line is used similarly to docker. Using run gives you a chat interface in the terminal, and its service also offers an API listening on 127.0.0.1:11434. Have fun and may enough RAM be with you.
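
For a quick test of the API from inside the appVM, you can call the generate endpoint with curl (the model name is whatever you pulled):

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'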

Does this guide use PyTorch models? Something to consider…

To my knowledge, Ollama does not support PyTorch models, only GGUF.

You can convert models from the safetensors format (which, unlike pickle-based PyTorch checkpoints, shouldn't be vulnerable to deserialization of arbitrary Python objects) into GGUF in a disposable VM. Also, you can remove the uplink of the VM running ollama whenever you don't need it.
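
As a rough sketch of that conversion (in a disposable VM, using llama.cpp's converter; the script name and flags can differ between versions), plus the dom0 command to cut the network uplink:

git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt
python llama.cpp/convert_hf_to_gguf.py /path/to/model-dir --outfile model.gguf
# import the result with a Modelfile containing "FROM ./model.gguf":
# ollama create mymodel -f Modelfile

# in dom0, detach networking from the qube (name is a placeholder):
# qvm-prefs ollama-vm netvm ''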

That being said, Ollama might still be vulnerable to binary exploitation, even though its source code is mainly written in Go.