Running Local AI Models on Qubes OS?

It depends on how many cores you are willing to give it. If you are running large models, don't have a server-class core count, and have the option of passing a GPU through to accelerate inference, do it (a rough sketch of the passthrough steps follows).
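A minimal sketch of what attaching a GPU to a compute qube might look like, assuming Qubes 4.1-style `qvm-pci` syntax; the PCI address `01_00.0` and the qube name `ai-compute` are placeholders, and exact options can differ between Qubes releases:

```bash
# In dom0: list PCI devices to find the GPU's address (underscored form, e.g. 01_00.0)
qvm-pci ls

# Hide the GPU from dom0 by adding rd.qubes.hide_pci=01:00.0 to the kernel
# command line in GRUB, regenerating the grub config, and rebooting.

# Attach the GPU to the compute qube ("ai-compute" is a hypothetical name)
qvm-pci attach --persistent --option permissive=true ai-compute dom0:01_00.0
```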

There is still some rationale for running it on the CPU, though. For example, if you are limited in the number of GPUs you can fit (one for the GUI VM, at least one for compute, and another needed for something else, with no way to add more), you might tolerate the lower speed, especially if your CPU is capable; see the sketch below. On the other hand, this can also be addressed by something like seamless GPU passthrough.
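If you do stay on the CPU, a minimal llama.cpp invocation inside the AI qube might look roughly like the following; the model path, qube name, and thread/vCPU counts are placeholders, not values from this thread:

```bash
# In dom0: give the AI qube enough vCPUs for CPU inference ("ai-compute" is hypothetical)
qvm-prefs ai-compute vcpus 16

# Inside the qube: CPU-only inference, with -t matched to the vCPU count
./llama-cli -m ./models/model.gguf -t 16 -p "Hello"

# With a passed-through GPU and a CUDA/ROCm build, offload layers to it instead
./llama-cli -m ./models/model.gguf -ngl 99 -p "Hello"
```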

Stable, but slower.