[qubes-users] Ollama (or Llama.cpp) with GPU Passthrough

Dear Qubers,

I would like to ask whether anyone has had any success running Ollama with
GPU passthrough.

I am using an Arch Linux template that works well for dedicated video out
and can play media and games (with some stutter), which is already a great
convenience on Qubes.

However, the main reason I built a passthrough setup was for Ollama /
Llama.cpp.

I've tried strict PCI reset, disabling dynamic memory balancing, and both
the Dasharo and standard BIOS. Everything works fine if I boot Arch
directly instead of Qubes OS.

I'm using ollama-rocm.
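For anyone comparing notes: before blaming Ollama itself, one thing worth checking is whether the ROCm device nodes actually made it into the qube, since ROCm needs /dev/kfd for compute and a render node under /dev/dri. This is just a diagnostic sketch, and the render node name (renderD128) is an assumption; yours may differ:

```shell
# Hypothetical quick check: are the device nodes ROCm needs visible
# inside the qube? /dev/kfd is the compute interface; the render node
# name below (renderD128) is a guess and may differ on your system.
for dev in /dev/kfd /dev/dri/renderD128; do
    if [ -e "$dev" ]; then
        echo "present: $dev"
    else
        echo "missing: $dev"
    fi
done
```

If /dev/kfd is missing, ROCm compute cannot work regardless of what the driver reports, which could explain a hang after tensor loading starts on the CPU side.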

I used the gpu-passthrough guide at:

The GPU is detected, but loading never progresses beyond:

    llm_load_tensors: CPU buffer size = 35.44 MiB

It just hangs forever.

Full dump below.

If anyone has gotten any further or has any thoughts, please let me know!
