However, like others, I couldn’t run the GPU VM with more than a few GB of memory.
The trick was to set the max-ram-below-4g variable to 2G instead of 3.5G.
I now have a VM with a 4090 GPU and 32GB of RAM. nvidia-smi works and free -m shows all my memory.
Guest info:
I used an Arch install ISO image for debugging. The startup menu has an option for PCI device detection. (I tried the PCI device detection in the Debian installer ISO, but it didn’t work.)
Arch is useful because it is easy to install the OS in BIOS mode, which QubesOS needs (guests can’t use UEFI). Next I want to try making an HVM TemplateVM that integrates better with QubesOS.
Actually, I would recommend adding these as notes to @neowutran’s guides (which are also available in Community Guides). It’s easier if we have a canonical place to point to for documentation rather than scattered notes.
Do you have a “resizable BAR” setting available in your BIOS?
If you do, it is currently enabled, right?
If it is enabled, could you try disabling it, switching back to 3.5G instead of 2G, and checking whether that works for you?
For me, an HVM created without a dGPU assigned now fails to boot, with the error “No bootable device” in the debug window.
It worked before I changed the patch from 3.5G to 2G, and no Windows HVM needed the stubdom patched, only xen.xml.
I’ll try patching the stubdom as well and report back.
Meanwhile, could someone confirm this by simply creating an HVM without a dGPU?
EDIT: On the other hand, I don’t see how the patching could affect such an HVM, since the name of my HVM didn’t start with “gpu_” and it worked perfectly. I tried renaming it to start with “gpu_”, but to no avail.
Now, if this doesn’t bring confusion, I don’t know what would.
The only thing left to check is to rename it to start with “gpu_” and then patch the stubdom, but what the heck happened in the meantime? Why would this ever happen to a VM with no device assigned?
The patch is only needed for HVMs with GPU passthrough. For other kinds of HVM, it makes them unbootable if you assign more than roughly max-ram-below-4g of RAM.
(That is why I used the “gpu_” naming scheme.)
Keep the stubdom patch and remove the xen.xml patch. Use “gpu_XXXX” names only for HVMs with GPU passthrough.
Thanks. I have two HVMs, one with a dGPU assigned and one without. I need them both. The one without worked with 6GB of RAM. The one with worked with 2048MB of RAM only. Regardless of what was patched.
After switching from 3.5G to 2G in the patch, it’s the reverse.
I am confused. If I have an HVM without GPU passthrough but with another PCI device passed through, what should I do to run it with 8GiB of memory and without the “No bootable device” error?
To avoid this issue, the “max-ram-below-4g” stubdom patch (and only the stubdom patch method) is applied only if the qube name starts with “gpu_”; see Create a Gaming HVM.
So if you use the xen.xml patch, qubes whose names start with “gpu_” are not treated any differently.
If you want to use HVMs with a large amount of RAM, both with and without GPU passthrough, you need this distinction for qubes named “gpu_” …
If you have time, the xen.xml patch could be modified to apply only if the qube name starts with “gpu_”. That would be better, because the stubdom patch method and the xen.xml patch method would then behave the same way regarding the “gpu_” naming.
The comment on GitHub mentioning the “No bootable device” error could be linked and copy-pasted to make this more explicit.
If someone has a lot of time, it might be possible to decide automatically whether to apply the patch, by working out how to grep the list of PCI devices passed to the VM for something like NVIDIA/AMD/VGA/…
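To make the two ideas above concrete, here is a rough sketch of both decision rules in Python. It is not how the stubdom or xen.xml patch is implemented: the function names are made up for illustration, and the input to `patch_by_devices` is assumed to be a list of lspci-style description lines for the devices assigned to the qube (how to obtain that list is exactly the part that still needs researching).

```python
import re

# Naming convention discussed in this thread.
GPU_NAME_PREFIX = "gpu_"

# Keywords from the suggestion above, matched as whole words, ignoring case.
GPU_KEYWORDS = re.compile(r"\b(NVIDIA|AMD|VGA)\b", re.IGNORECASE)


def patch_by_name(qube_name: str) -> bool:
    """Naming rule: patch only qubes whose name starts with "gpu_"."""
    return qube_name.startswith(GPU_NAME_PREFIX)


def patch_by_devices(pci_descriptions: list[str]) -> bool:
    """Automatic rule: patch if any assigned PCI device looks like a GPU."""
    return any(GPU_KEYWORDS.search(line) for line in pci_descriptions)


print(patch_by_name("gpu_archlinux"))  # True
print(patch_by_name("work-win10"))     # False
print(patch_by_devices(["01:00.0 VGA compatible controller: NVIDIA Corporation"]))  # True
print(patch_by_devices(["00:14.0 USB controller: Intel Corporation"]))              # False
```

Keyword matching like this is crude. For example, “AMD” also shows up in the vendor string of AMD devices that are not GPUs, so the automatic rule can give false positives. The name prefix rule is dumber but predictable.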
I’d rather not paste all that lspci output. Are you looking for resizable BAR choices?
My BIOS does not have anything about BAR, but it does have a setting for TOLUD. I leave this setting at “dynamic”, because I read somewhere that choosing a value could cause problems booting.
I’m looking for the size values on lines like these:
Region 0: Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at c200 [size=128]
(My guess: if the largest size on those lines is 256M, then setting max-ram-below-4g to 3.5G should work; if the largest size is bigger than 256M, then setting max-ram-below-4g to 2G should work.)
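The guess above could be checked with a small script like the following. It is only a sketch: it parses `Region` lines in the format shown, takes the largest memory region, and applies the 256M threshold from the guess, which is unconfirmed. The function names are made up for illustration, and treating sizes below 256M the same as exactly 256M is my assumption.

```python
import re

# Multipliers for the size suffixes lspci prints (no suffix means bytes).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Matches e.g. "Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]".
# "I/O ports" regions are skipped on purpose.
_REGION_RE = re.compile(r"Region \d+: Memory at .*\[size=(\d+)([KMGT]?)\]")


def largest_memory_bar(lspci_text: str) -> int:
    """Return the largest memory region size in bytes, or 0 if none is found."""
    sizes = [
        int(number) * _UNITS[unit]
        for number, unit in _REGION_RE.findall(lspci_text)
    ]
    return max(sizes, default=0)


def suggested_max_ram_below_4g(lspci_text: str) -> str:
    """Apply the guess: regions up to 256M -> 3.5G, anything larger -> 2G."""
    threshold = 256 * _UNITS["M"]
    return "3.5G" if largest_memory_bar(lspci_text) <= threshold else "2G"


sample = """\
Region 0: Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at c200 [size=128]
"""
print(suggested_max_ram_below_4g(sample))  # 3.5G
```

Feeding it the `Region` lines for your own GPU would tell you which value the guess predicts, and you can then compare that with what actually boots.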