Has anyone been able to run LLMs with an AMD GPU with ROCm?

Rel4967 · August 7, 2024, 11:22pm

I’ve been trying to troubleshoot a problem for a few months now, over what must be hundreds of hours, with nothing to show for it. This is out of my depth and I’m not sure what to do. I would appreciate any help or suggestions!

Setup

2x MI100
Asus WRX80e + Threadripper Pro 3955WX
ROCm (all versions tried)
Linux (all kernels tried)
HVM passthrough

The issues

When running llama.cpp (all versions) with CPU+GPU acceleration, the GPU freezes at 100% utilization, draws 100W, and never recovers.
VRAM seems to clear extremely slowly.
Some kernels (5.19) seem to work for small models but freeze when loading larger ones. Running from RAM only (no GPU offload) works perfectly fine.
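
In case it helps narrow down the freeze, here is a minimal sketch for pulling the amdgpu-related lines out of the kernel log right after a hang. It assumes a Linux guest where the `dmesg` command is available (it may need root); the function names `amdgpu_lines` and `read_kernel_log` are made up for illustration and are not part of ROCm or llama.cpp.

```python
import subprocess
from typing import List


def amdgpu_lines(log_text: str) -> List[str]:
    """Return the log lines that mention amdgpu (case-insensitive)."""
    return [line for line in log_text.splitlines() if "amdgpu" in line.lower()]


def read_kernel_log() -> str:
    """Return the output of dmesg, or an empty string if it cannot be run."""
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except OSError:
        # dmesg is missing or not executable in this environment.
        return ""
    return result.stdout


# Usage (run manually after reproducing the freeze):
#   for line in amdgpu_lines(read_kernel_log()):
#       print(line)
```

Comparing this output between a small model that loads and a larger one that freezes may show where the driver stops responding.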

What I think might be the cause(s)

A Xen hypervisor issue with VRAM-to-RAM communication.
Compatibility between AMD’s kernel module (amdgpu-dkms) and Xen
QubesOS’ passthrough hardening?
Incorrect Xen/Linux kernel parameters
IOMMU limitation? (iommu=pt has no effect)
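
To rule out the kernel parameter question, a rough sketch like the one below can confirm which IOMMU-related options the running Linux kernel actually received. It assumes `/proc/cmdline` is readable in the guest, and it only covers the Linux command line; Xen's own boot options are separate and are not checked here. The helper names (`parse_cmdline`, `iommu_params`, `read_cmdline`) are hypothetical, and the parser is simplified: it splits on whitespace, so quoted values containing spaces are not handled.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(text: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_params(params: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Keep only the parameters whose name contains 'iommu'."""
    return {name: value for name, value in params.items() if "iommu" in name}


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text()


# Usage (on the affected machine):
#   print(iommu_params(parse_cmdline(read_cmdline())))
```

If `iommu=pt` does not show up in the result, the parameter never reached the kernel, which would explain why setting it appears to have no effect.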

This problem is way beyond me and I’m not sure where to start or where to look. I’m hoping Xen 4.19 fixes it, but I still haven’t been able to identify a root cause.

Please ask if you need any more information. I feel lost.