If you would like to run an AI model locally, this is how I have been running ollama in a dedicated appVM. Performance is alright, depending on the size of the chosen model.
Recommended settings for the appVM (set them in the Qube Manager, or from dom0 as shown below):
private storage max size: 80 GB
initial memory: 16000 MB
max memory: what you can spare
VCPUs: 4
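
If you prefer doing this from dom0 rather than the Qube Manager, the same settings can be applied roughly like this (the qube name ollama and the maxmem value of 32000 are only placeholders for whatever fits your machine):

qvm-volume resize ollama:private 80GiB
qvm-prefs ollama memory 16000
qvm-prefs ollama maxmem 32000
qvm-prefs ollama vcpus 4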
In the template:
sudo pacman -Syu
sudo pacman -S ollama
sudo pacman -S docker docker-compose # optional
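
You can sanity-check the install while still in the template; the ollama service stays disabled there, since it gets started per appVM via rc.local further down:

ollama --version
systemctl is-enabled ollama   # expected: disabled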
In the appVM:
sudo mkdir -p /rw/bind-dirs/var/lib/ollama
sudo mkdir -p /rw/config/qubes-bind-dirs.d
sudo nano /rw/config/qubes-bind-dirs.d/50_user.conf
binds+=( '/var/lib/ollama' )
binds+=( '/var/lib/docker' )
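
The /var/lib/docker entry should not need a manual mkdir: qubes-bind-dirs should create any missing directories under /rw/bind-dirs on the next restart. Either way, after restarting the appVM you can check that the binds are in place:

mount | grep -E 'ollama|docker'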
sudo nano /rw/config/rc.local
#!/bin/sh
# increase swap size
swapoff /dev/xvdc1
parted -s /dev/xvdc rm 1
parted -s /dev/xvdc rm 3
parted -s /dev/xvdc mkpart primary linux-swap 0% 10G
mkswap /dev/xvdc1
swapon -d /dev/xvdc1

# service is disabled in the template
systemctl start ollama

# several AI projects offer docker containers, you could
# run ollama in a docker container instead if you like (see below)
# systemctl start docker
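
Note that /rw/config/rc.local is only executed at startup if it has the executable bit set, so set it once; after the next restart you can also confirm that the bigger swap is active:

sudo chmod +x /rw/config/rc.local
swapon --show    # run after a restart; free -h works too

If you would rather run ollama in a docker container, as the commented lines above suggest, something along these lines should work with the official ollama/ollama image (mapping the persistent /var/lib/ollama into the container is just one way to keep the models; the image stores them under /root/.ollama):

sudo systemctl start docker
sudo docker run -d --name ollama \
  -p 127.0.0.1:11434:11434 \
  -v /var/lib/ollama:/root/.ollama \
  ollama/ollama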
Restart the appVM, then download a language model and run it:
ollama help
ollama pull llama3.2
ollama run llama3.2
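
A few more subcommands are useful for keeping an eye on the private volume, since models add up quickly:

ollama list           # downloaded models and their size on disk
ollama ps             # models currently loaded into memory
ollama rm llama3.2    # remove a model you no longer need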
ollama on the command line is used similarly to docker. Using run gives you a chat interface in the terminal, but the service also offers an API listening on 127.0.0.1:11434.
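
For example, you can query that API from within the appVM with curl (reusing the llama3.2 model pulled above):

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'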
Have fun and may enough RAM be with you.