GPU passthrough causes memory corruption

Hi,
I passed my RTX 5050 Max-Q through to a standalone HVM with 16 GB of memory. The kernel is provided by the qube and memory balancing is off. I installed the nvidia-open drivers as explained here.
I also disabled/blacklisted nouveau. Everything worked perfectly, and nvidia-smi detected the card. I tried running some small (1B) models and they worked. But when I run 4B models or copy large files (about 7 GB), I get segmentation faults, malloc()… top size, a kernel panic, or something similar, and the VM crashes. What is the issue? Can you help?

You should provide more complete hardware information. Can you include the motherboard and all the PCI devices you pass through to the HVM? You should also check the IOMMU grouping of your devices. If the graphics card is not alone in its IOMMU group, you can run into problems. IOMMU groups can be checked from a regular Linux system booted from a USB stick, for example.
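The IOMMU group check suggested above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the standard Linux sysfs layout under /sys/kernel/iommu_groups, and the function names (list_iommu_groups, shared_groups, format_groups) are made up for illustration.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as read from sysfs."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing exposed here: IOMMU disabled, or the kernel does not own it.
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        # Each entry in "devices" is named after a PCI address.
        groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def shared_groups(groups):
    """Keep only the groups that contain more than one device."""
    return {g: devs for g, devs in groups.items() if len(devs) > 1}


def format_groups(groups):
    """Render one line per group, sorted by group number."""
    return "\n".join(f"group {g}: {' '.join(groups[g])}" for g in sorted(groups))
```

On the live system, print(format_groups(shared_groups(list_iommu_groups()))) would list only the groups that hold more than one device, which is where the GPU and its audio function are expected to show up together. The directory is likely to be empty inside dom0, which is presumably why a regular Linux on a USB stick is suggested.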

Thanks for your reply! I have a Gigabyte A16 laptop with an AMD Ryzen 7. The HVM has only the RTX 5050 passed through as a device. There is indeed also an audio device in the same IOMMU group, but when I pass it to the VM as well, nvidia-smi no longer detects my GPU (no device found).
Also, if I try it with the GPU not detected, running a 4B model with ollama still causes a segmentation fault. So I think it is a problem with memory and the HVM.

I’m not sure what you mean by "not detected GPU". In principle, if the GPU is not detected by the HVM, it shouldn’t be able to use it at all. When you run ollama ps, does it show the work loaded on the GPU?

Is the audio device the Nvidia one? I usually pass both devices (graphics and audio) to the HVM. But I have no experience doing that with laptops, and I can imagine they come with more complications than desktops.

When I pass both the GPU and the GPU audio device (which are in the same IOMMU group), nvidia-smi does not detect the GPU at all. Passing only the GPU lets nvidia-smi detect it.

When you run ollama ps, does it say the work is loaded on the CPU or the GPU? I’m interested in this because it could rule out the graphics card as the culprit. An easier test might be to not pass the graphics card at all and see whether ollama still segfaults.
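The CPU/GPU question can also be answered programmatically. The sketch below assumes Ollama's HTTP API is reachable on its default local address and that the /api/ps response carries size and size_vram per loaded model; the helper names are hypothetical. The list is empty unless a model is currently loaded, so it has to be queried while a prompt is running.

```python
import json
from urllib.request import urlopen


def gpu_fraction(model):
    """Fraction of a loaded model held in VRAM (0.0 = CPU only, 1.0 = all on GPU)."""
    size = model.get("size", 0)
    if size <= 0:
        return 0.0
    return model.get("size_vram", 0) / size


def describe(model):
    """One-line summary of where a loaded model lives."""
    frac = gpu_fraction(model)
    if frac >= 1.0:
        placement = "fully on GPU"
    elif frac <= 0.0:
        placement = "CPU only"
    else:
        placement = "split CPU/GPU"
    return f"{model.get('name', '?')}: {frac:.0%} in VRAM ({placement})"


def running_models(base_url="http://127.0.0.1:11434"):
    """Fetch the currently loaded models (assumed default Ollama address)."""
    with urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])
```

Running for m in running_models(): print(describe(m)) inside the HVM while a 4B model is answering would show whether the crash coincides with a partial CPU offload.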

ollama ps currently just gives an empty list of models. systemctl shows that it uses the GPU:

May 20 14:21:17 gpu ollama[1041]: time=2026-05-20T14:21:17.986+02:00 level=INFO source=server.go:433 msg="starting runner" cmd="/usr/local/bin/ollam>
May 20 14:21:18 gpu ollama[1041]: time=2026-05-20T14:21:18.456+02:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To >
May 20 14:21:18 gpu ollama[1041]: time=2026-05-20T14:21:18.457+02:00 level=INFO source=server.go:433 msg="starting runner" cmd="/usr/local/bin/ollam>
May 20 14:21:18 gpu ollama[1041]: time=2026-05-20T14:21:18.759+02:00 level=INFO source=server.go:433 msg="starting runner" cmd="/usr/local/bin/ollam>
May 20 14:21:18 gpu ollama[1041]: time=2026-05-20T14:21:18.760+02:00 level=INFO source=server.go:433 msg="starting runner" cmd="/usr/local/bin/ollam>
May 20 14:21:18 gpu ollama[1041]: time=2026-05-20T14:21:18.859+02:00 level=INFO source=model_recommendations.go:177 msg="model recommendations cache>
May 20 14:21:19 gpu ollama[1041]: time=2026-05-20T14:21:19.000+02:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-22074607-a8d3-a9c3>
May 20 14:21:19 gpu ollama[1041]: time=2026-05-20T14:21:19.000+02:00 level=INFO source=routes.go:1914 msg="vram-based default context" total_vram="8>

Without the GPU there are no segfaults.

My instinct says you hit the problem when the model gets partially offloaded to the CPU. Some kind of I/O problem, perhaps? I think I would keep troubleshooting why the GPU is not found when you pass both devices. You may or may not have noticed that there is also the thread "Secure AI Inference with Qubes OS: A GPU Passthrough & Ollama Guide", which is also for a Debian HVM. You might want to compare the driver installation procedure you used with the commands in that thread.

I tried that as well. I already get segmentation faults during the installation of the Nvidia drivers.

In my experience, that sounds exactly like a PCI passthrough issue. Have you hidden the Nvidia audio device as well as the VGA device from dom0? I just don’t understand why you would have more issues when passing the audio device, unless it has something to do with the device being a laptop.

I really don’t understand the issue either… this is very weird… Yes, I have hidden both the GPU and the audio device. Passing only the GPU to the VM works almost perfectly. The only problem is that copying large files does not work: after about 2.4 GB of progress the copy is interrupted with an input/output error, and the VM crashes.

When I pass both devices, nvidia-smi does not even find the GPU, and ollama runs in CPU mode. I really appreciate any help.

This is really frustrating: everything else works so nicely, and this one problem prevents me from using my GPU…

I’m sorry I can’t help. I don’t know anything more. I hope someone can help.

I appreciate that you tried!

I found out it has nothing to do with Nvidia! I created a new standalone HVM, copied a large file, and got the same issue! I get a kernel panic:

Kernel panic - not syncing: Attempted to kill init!

UPDATE: One more thing I found out: if I do NOT assign the RTX 5050 to the HVM, everything works. If I assign the GPU, the VM starts throwing segmentation faults and kernel panics (with or without the Nvidia drivers). So merely assigning the GPU (permissive and with no-strict-reset) triggers this.
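To make the comparison between "GPU assigned" and "GPU not assigned" repeatable, a copy-and-verify script along the following lines could be run inside the HVM in both configurations. It is only a sketch with made-up helper names: it writes a file of random data, copies it in chunks, and compares SHA-256 digests. A mismatch or an I/O error would point to corruption independent of ollama and the Nvidia driver.

```python
import hashlib
import os

CHUNK = 4 * 1024 * 1024  # 4 MiB per read/write


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def write_test_file(path, size_bytes):
    """Create a file of exactly size_bytes filled with random data."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n


def copy_and_verify(src, dst):
    """Copy src to dst in chunks, force it to disk, then compare digests."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK):
            fout.write(chunk)
        fout.flush()
        os.fsync(fout.fileno())
    return sha256_of(src) == sha256_of(dst)
```

For example, write_test_file("test.bin", 7 * 1024**3) followed by copy_and_verify("test.bin", "copy.bin") would roughly match the 7 GB copy that fails. Reads may be served from the page cache, so a True result is weaker evidence than a False one or a crash.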

Maybe somebody can help out…