Unable to run LLMs on an AMD GPU with ROCm

I’ve been trying to troubleshoot a problem for a few months now, for what must be hundreds of hours, with nothing to show for it. This is out of my depth and I’m not sure what to do. I would appreciate any help or suggestions!

Setup

  • 2x AMD Instinct MI100
  • Asus WRX80e + Threadripper Pro 3955WX
  • ROCm: all versions tried
  • Linux: all kernels tried
  • HVM passthrough

The issues

  • When running llama.cpp (all versions) with CPU+GPU acceleration, the GPU freezes at 100% utilization, draws 100 W, and never recovers.
  • VRAM seems to clear extremely slowly.
  • Some kernels (5.19) seem to work for small models but freeze when loading larger ones. Running from RAM only (no GPU) works perfectly fine.
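To put numbers on the "stuck at 100%" and "VRAM clears slowly" symptoms, the readings can be logged over time from inside the qube. The following is only a minimal sketch: it assumes the amdgpu driver exposes `gpu_busy_percent`, `mem_info_vram_used` and `mem_info_vram_total` under `/sys/class/drm/card0/device`, and the card index `card0` is a guess that may differ on a machine with two MI100s. The function names are made up for illustration.

```python
import time
from pathlib import Path

# Assumed sysfs directory of the first amdgpu device; the card index may differ.
DEVICE_DIR = Path("/sys/class/drm/card0/device")


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def sample(device_dir=DEVICE_DIR):
    """Take one reading of GPU load (percent) and VRAM use (bytes)."""
    device_dir = Path(device_dir)
    return {
        "busy_percent": read_int(device_dir / "gpu_busy_percent"),
        "vram_used": read_int(device_dir / "mem_info_vram_used"),
        "vram_total": read_int(device_dir / "mem_info_vram_total"),
    }


def format_sample(s):
    """Render one reading as a short log line, using n/a for missing values."""
    def mib(value):
        return "n/a" if value is None else f"{value / 2**20:.0f} MiB"

    busy = "n/a" if s["busy_percent"] is None else f"{s['busy_percent']}%"
    return f"busy={busy} vram={mib(s['vram_used'])}/{mib(s['vram_total'])}"


def monitor(device_dir=DEVICE_DIR, interval=1.0, count=60):
    """Print `count` readings, one every `interval` seconds."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), format_sample(sample(device_dir)))
        time.sleep(interval)
```

Calling `monitor()` in a second terminal while llama.cpp loads a model, and again after killing it, would show whether utilization really stays pinned and how long VRAM takes to drain.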

What I think might be the cause(s)

  • A Xen hypervisor issue with VRAM-to-RAM communication
  • Incompatibility between AMD’s kernel module (amdgpu-dkms) and Xen
  • Qubes OS passthrough hardening?
  • Incorrect Xen/Linux kernel parameters
  • An IOMMU limitation? (iommu=pt has no effect)
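Since kernel parameters and `iommu=pt` are on the list of suspects, it may be worth confirming that the parameters actually reached the running kernel. Below is a small sketch that parses a kernel command line string and picks out anything IOMMU-related. The helper names are invented for illustration, and it deliberately ignores quoted values containing spaces. Note that Xen's own boot options are separate from the Linux kernel's, so this only covers the Linux side.

```python
def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "quiet" map to True. Quoted values that contain
    spaces are not handled in this sketch.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else True
    return params


def iommu_settings(params):
    """Return only the parameters whose names mention the IOMMU."""
    return {k: v for k, v in params.items() if "iommu" in k.lower()}
```

Inside the qube, something like `iommu_settings(parse_cmdline(open("/proc/cmdline").read()))` would show what the guest kernel was really booted with. An empty result would mean the parameter never took effect there.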

This problem is way beyond me and I’m not sure where to start or where to look. I’m hoping Xen 4.19 fixes the problem but I still haven’t been able to figure out a root cause.

Please ask me any questions if you need anything. I feel lost.

Did you find a solution?

It’s likely that the program you are using doesn’t have a workaround for an integrated GPU.

I am not explaining the rest of this well, so bear with my non-expert understanding:

If you use an integrated GPU that is also handling your video output, a program may try to send the whole model to the GPU and use all of it, and the GPU then can’t handle both the regular video output and the full load that was sent to it. (I read this a while ago; the 100 percent may have referred to GPU memory.)

Have you tried LM Studio, which has more parameters for limiting how much data gets sent to the GPU? Does this happen with every program you use to run models?
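For llama.cpp itself, the comparable knob is the `-ngl` (`--n-gpu-layers`) option, which sets how many layers are offloaded to the GPU. A rough sketch of stepping that value up until something breaks is shown below. The binary path, model path and helper names are placeholders, the binary may be called `main` or `llama-cli` depending on the version, and newer builds may need extra flags to stay non-interactive. If the GPU hangs hard enough, the timeout may not be able to kill the process either.

```python
import subprocess


def build_command(binary, model_path, gpu_layers, prompt="Hello", tokens=16):
    """Build a llama.cpp command line that offloads `gpu_layers` layers."""
    return [
        str(binary),
        "-m", str(model_path),
        "-ngl", str(gpu_layers),
        "-p", prompt,
        "-n", str(tokens),
    ]


def try_offload(binary, model_path, gpu_layers, timeout=300):
    """Run one short generation; return "ok", "failed" or "timeout"."""
    cmd = build_command(binary, model_path, gpu_layers)
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # never wait for interactive input
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except OSError:
        # Binary missing or not executable.
        return "failed"
    return "ok" if result.returncode == 0 else "failed"


def sweep(binary, model_path, layer_counts, timeout=300):
    """Try increasing layer counts and stop at the first one that is not ok."""
    results = {}
    for n in layer_counts:
        results[n] = try_offload(binary, model_path, n, timeout)
        if results[n] != "ok":
            break
    return results
```

For example, `sweep("./llama-cli", "model.gguf", [0, 1, 4, 8, 16, 32])` would report the first layer count at which the run fails or times out, which could help separate "any GPU use hangs" from "only large transfers hang".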

It seems OP has two of these accelerators (https://www.amd.com/fr/products/accelerators/instinct/mi100.html) and passes them through to a qube. The qube does not even need to use them for video rendering because it can use the Qubes OS video driver as usual.

You’re right. If it had been that easy, the poster probably would have solved it.

Did the original poster try different programs? Some programs just don’t implement ROCm support correctly, which makes them harder to get working. I wish I knew how LM Studio worked for the poster, since that could help others troubleshoot.