I am planning a new, dedicated Qubes 4.2 workstation with a primary use case of local AI/ML development. The core architectural goal is to pass a high-VRAM NVIDIA GPU through to a dedicated Standalone HVM for CUDA-intensive tasks (training, fine-tuning, and inference with models like Llama 3 70B via Ollama/vLLM).
To maintain dom0 stability and security, the plan is to use a separate, simple AMD GPU for dom0 and all standard system display tasks.
Before purchasing the components, I would be extremely grateful for a sanity check of the proposed hardware and architecture from experienced users.
Proposed Bill of Materials (BoM):
CPU: AMD Ryzen 9 7950X
Motherboard: ASUS ProArt X670E-Creator WiFi (Chosen for its reported excellent IOMMU groupings)
GPU (for passthrough to the AI HVM): NVIDIA GeForce RTX 4090
GPU (for dom0 display): AMD Radeon RX 6400
Storage 2 (for AI HVM): 4TB NVMe PCIe 4.0 SSD (to be passed through along with the dGPU)
PSU: 1300W 80+ Titanium (e.g., Seasonic PRIME TX-1300)
Proposed Qubes Architecture:
dom0: Will control the system and exclusively use the AMD Radeon RX 6400 for display. The RTX 4090 will be hidden from dom0 at boot via the rd.qubes.hide_pci kernel parameter.
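Concretely, the hiding step should look something like this in dom0 (the BDFs below are placeholders until the hardware is actually in hand):

```bash
# dom0: find the 4090's PCI addresses (the GPU and its HDMI audio function)
lspci | grep -i nvidia

# /etc/default/grub: append the (placeholder) BDFs to the kernel command line
# GRUB_CMDLINE_LINUX="... rd.qubes.hide_pci=01:00.0,01:00.1"

# regenerate the GRUB config (UEFI path shown; legacy BIOS uses /boot/grub2/grub.cfg)
sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
```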
ai-hvm (Standalone HVM): A Fedora-based HVM. The RTX 4090 and the 4TB NVMe SSD will be passed through to this VM. It will house the proprietary NVIDIA drivers, CUDA Toolkit, and the entire AI stack (PyTorch, etc.).
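Device assignment would then be roughly the following (placeholder BDFs again; permissive and no-strict-reset are commonly needed for GPU passthrough, but note they slightly weaken isolation):

```bash
# dom0: list assignable PCI devices, then attach GPU and NVMe to the HVM
qvm-pci
qvm-pci attach --persistent -o permissive=true -o no-strict-reset=true ai-hvm dom0:01_00.0
qvm-pci attach --persistent ai-hvm dom0:02_00.0
```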
Service & App VMs: Standard AppVMs (ai-lab for coding, chat-ui for inference) will be networked through a sys-fw-vpn → sys-vpn → sys-fw chain for security. Communication with the AI models running in ai-hvm will be handled via qrexec services (qvm-connect-tcp) rather than direct networking between AppVMs and the HVM.
Specific Questions:
Motherboard/GPU Compatibility: Has anyone in the community successfully and reliably passed through an RTX 4090 on the ASUS ProArt X670E-Creator platform under Qubes 4.2? Are there any known issues or required workarounds (e.g., specific BIOS versions, ACS patching)?
dom0 GPU Choice: Is the Radeon RX 6400 a solid, “fire-and-forget” choice for dom0 stability, or is there a more battle-tested, community-recommended alternative?
BIOS/UEFI Settings: Beyond the standard IOMMU, SVM, and Above 4G Decoding settings, are there any non-obvious or AMD-specific settings that are critical for stable passthrough on this platform?
General Red Flags: Looking at the overall plan, are there any obvious architectural flaws or potential “gotchas” that I have overlooked?
Thank you in advance for your time and expertise. Any feedback, positive or negative, would be greatly appreciated before I commit to the purchase.
Unrelated to technical feasibility, but are you sure you want to do this on your own device? Are you experienced? The AI landscape evolves fast; building a workstation for training and fine-tuning (and maybe even for inference) isn’t the best idea, IMO.
If you only want a small number of models at a time, then 4TB is enough, but if you want a lot of local models you’ll probably want more storage, especially once you factor in the data needed for training/fine-tuning these monsters (a 100 GB+ dataset is not rare). Llama 3 70B alone already needs 140-280 GB depending on the chosen precision: 70B parameters × 2 bytes (fp16) ≈ 140 GB, × 4 bytes (fp32) ≈ 280 GB.
I don’t know; I would probably check the Hardware compatibility list (HCL) | Qubes OS for known issues with your hardware. If everything looks fine there, GPU passthrough is probably not a problem either.
Sounds reasonable, but your chat UI probably doesn’t need networking.
As much as I love Qubes, I don’t think it is suited for serious AI training/fine-tuning/inference (I don’t think any workstation is suited for it; that’s why a lot of big companies are handing all their money to NVIDIA). Unrelated to Qubes, but if you just want to get started with machine learning, try Google Colab or rent some GPUs; that’s far more cost-effective (renting GPUs is also what I recommend to small and mid-size companies).
I’ve also experienced system-wide lag across all VMs during very high disk I/O within a single VM. I haven’t found any similar reports on the forum or GitHub, which suggests this might be specific to my system, but I still want to mention it.
Thank you for taking the time to write such a detailed and thoughtful response. I genuinely appreciate the perspective.
You raise several valid points. For many use cases, especially short-term projects or for those just starting out, your recommendation to use cloud services like Google Colab or rented GPUs is absolutely the most cost-effective and practical approach. The points about storage requirements and potential I/O lag are also very helpful and something I will factor into the operational planning.
However, for this specific long-term project, the core requirements are different. Full data sovereignty, absolute control over the entire software stack, privacy, and the ability to build a compounding knowledge base on a persistent, offline model are non-negotiable. The goal is to build a sovereign asset, not to complete a task in the most efficient way possible. This shifts the cost-benefit analysis entirely.
Your comment about system-wide I/O lag is particularly interesting, as it strongly validates the plan to pass through the entire NVMe drive directly to the HVM, which should mitigate that very bottleneck by bypassing dom0 for storage I/O.
Thanks again for the input. I’m still very keen to hear if anyone has hands-on experience with the passthrough compatibility of this specific motherboard/GPU combination.
There are better, more stable, and probably even more secure options available through full device compartmentalization. (Not because Qubes is unstable, but for other reasons you surely already know about, since you’ve surely researched the differences between local AI hardware and server AI hardware and the implications and limitations of using consumer hardware for training and fine-tuning, which isn’t exclusively about the GPUs!)
I don’t want to recommend the options here in the Qubes forum, as it would be unrelated to Qubes, but what I can recommend is asking your favorite AI for that or maybe even doing a bit of manual research instead of blindly trusting a statistical model.
If you really want a local workstation for your use case, Qubes may still not be the most secure option; real compartmentalization onto physically separate hardware is probably still more secure for your use case.
Also, you should look into how large the models you train/fine-tune will be; 24 GB of VRAM is very limiting for training/fine-tuning. I’m sure you already know this, but training/fine-tuning is far more computationally heavy than running the model: offloading to RAM might be acceptable for inference, but not for training. You might be able to run a 70B model (relatively slowly) locally, but don’t even think about fine-tuning it at full size or without heavy quantization.
I can’t help you with that, but you might have some luck asking in the Qubes Matrix channel.
LLM and autoregressive-model developer here in the thread.
Qubes OS + Fedora 41 currently works great for all the neural-network-related tasks I’ve needed (and probably yours too). I can offer a few tips from personal experience:
Install the akmod-nvidia-open drivers, not akmod-nvidia. On the proprietary drivers my 4090 often dropped out; on the open ones it works without problems. This is also the official recommendation from NVIDIA: for all purposes, the open version is better than the usual one.
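If your repos carry the package under that name (worth verifying for your own setup), the swap is just:

```bash
# inside the HVM: replace the proprietary kmod with the open one
sudo dnf remove akmod-nvidia
sudo dnf install akmod-nvidia-open
```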
Use the instructions from the forum that are written around Salt. You don’t need to use Salt specifically (I don’t know it, so I didn’t); just follow the instructions and enter the same commands in the terminal. Be sure to delete nvidia.conf at the end.
Large batches, gradients, or models can lead to CUDA failures. This is probably a problem with Qubes OS as an architecture; more details under the /dev/shm point below.
The akmod-nvidia-open build process requires significantly more temporary file space (and time) than usual. Before installing it, remount /tmp with a size of 8 GB; the build usually uses about 5.9 GB, but the extra headroom won’t hurt.
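That’s a single command inside the HVM (not persistent across reboots, which is fine since it’s only needed for the build):

```bash
# grow the /tmp tmpfs before installing/rebuilding the driver
sudo mount -o remount,size=8G /tmp
```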
Large models, large batches, or both lead to CUDA crashes due to exhaustion of /dev/shm (possibly related to the video card, Qubes, or something else). This is also fixed by remounting /dev/shm with more space; for a 12B-parameter model in fp32, 10 GB of shm is enough for me.
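Same remount pattern for shm; the 10 GB here matches my 12B fp32 case, so size it to your own models:

```bash
# enlarge /dev/shm before big training runs
sudo mount -o remount,size=10G /dev/shm
```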
In my tests on Qubes OS with Fedora 41 + the open NVIDIA akmod, performance on all CUDA-related tasks is HIGHER than on Win11 running natively on the same PC.
P.S.: don’t forget to add nouveau to the GRUB blacklist. After a kernel update, Fedora 41 does not automatically update GRUB to use the new kernel, but akmods does build the NVIDIA modules against it. So every time you update the kernel you need to enlarge /tmp again, call grub2-mkconfig, and make sure nouveau stays blacklisted.
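So the post-kernel-update routine looks roughly like this (paths as on my Fedora 41 HVM; the blacklist entries live in /etc/default/grub):

```bash
sudo mount -o remount,size=8G /tmp            # headroom for the akmod rebuild
# keep nouveau blacklisted on the kernel command line:
# GRUB_CMDLINE_LINUX="... rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # pick up the new kernel
```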
P.P.S.: before shutting down the HVM or detaching its GPU, don’t forget to save the data in all your qubes. Sometimes after the HVM shuts down everything just freezes, absolutely everything: the screen sticks at the current moment and doesn’t react to anything.
Thank you for the detailed, practical feedback. This is extremely helpful.
Your specific points regarding akmod-nvidia-open over the proprietary drivers, and the need to resize /tmp and /dev/shm for large model stability, are crucial insights that I will be incorporating directly into my build process. The warning about potential system freezes on HVM shutdown is also a very useful operational note.
It’s encouraging to see confirmation that this configuration is viable on recent Qubes/Fedora versions with a 4090.
I appreciate you taking the time to share your experience.
I always have at least one qube running with passthrough, often two, and I’ve never had any issues with stability.
For reference, I’m on Debian 12 with a 9950X and the MSI X670E Tomahawk motherboard, running two NVIDIA 4060s for passthrough; dom0 uses the integrated graphics. One GPU is passed to my Ollama qube, a standalone AppVM that starts automatically at system boot. The other GPU is used with HVMs for playing video games, running Windows, etc.
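In dom0 that part of the setup is just a couple of commands (the qube name and BDF below are illustrative, not my exact values):

```bash
# dom0: auto-start the Ollama qube at boot and hand it its GPU
qvm-prefs ollama autostart True
qvm-pci attach --persistent -o permissive=true ollama dom0:01_00.0
```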