My instinct is that the problem shows up when the model is partially offloaded to the CPU, perhaps some kind of I/O problem. I would keep troubleshooting why the GPU is not found when you pass both devices. In case you haven't seen it, there is also the "Secure AI Inference with Qubes OS: A GPU Passthrough & Ollama Guide" thread, which is also for a Debian HVM. You might want to compare the driver installation procedure you used with the commands in that thread.
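
For what it's worth, one quick way to check whether the GPU is visible inside the HVM at all is to look at `lspci -k`, which lists each PCI device along with the kernel driver bound to it. Below is a rough Python sketch of that check. It is my own illustration and not taken from that thread; it assumes `lspci` (from pciutils) is installed in the qube, and `gpu_drivers` is just a name I made up.

```python
import shutil
import subprocess

# PCI class names that lspci prints for graphics devices.
GPU_CLASSES = ("VGA compatible controller", "3D controller", "Display controller")


def gpu_drivers(lspci_k_output):
    """Map each GPU-looking device line to its 'Kernel driver in use' (or None)."""
    found = {}
    current = None
    for line in lspci_k_output.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device entry.
            current = line if any(c in line for c in GPU_CLASSES) else None
            if current is not None:
                found[current] = None
        elif current is not None and "Kernel driver in use:" in line:
            # Indented detail line belonging to the current GPU entry.
            found[current] = line.split(":", 1)[1].strip()
    return found


if __name__ == "__main__":
    if shutil.which("lspci") is None:
        print("lspci not found; install pciutils first")
    else:
        out = subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout
        gpus = gpu_drivers(out)
        if not gpus:
            print("No GPU visible on the PCI bus inside this qube")
        for device, driver in gpus.items():
            print(f"{device}\n  driver in use: {driver or 'none bound'}")
```

If the GPU is not listed at all, the passthrough itself is the likely culprit. If it is listed but has no driver bound, or a different driver than the one you installed, I would look at the driver installation first.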