Quick howto: GPU passthrough with lots of RAM

Users in this community have had a difficult time with GPU passthrough over the years.

I want to share a quick recipe for how I got mine working.

Motherboard chipset: Intel Z790
CPU: Intel Raptor Lake series
GPU: NVIDIA RTX 4090 (24GB VRAM)
QubesOS version: 4.1.2 (R4.1)

I followed the guide by @neowutran.

However, like others, I couldn’t run the GPU VM with more than a few GB of memory.

The trick was to set the max-ram-below-4g variable to 2G instead of 3.5G.
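
Concretely, that setting ends up in the QEMU machine options the stubdom passes to the guest; in the patched xen.xml quoted further down, it appears as (with 2G replacing the guide's original 3.5G):

-machine xenfv,max-ram-below-4g=2G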

I now have a VM with a 4090 GPU and 32GB of RAM. nvidia-smi works and free -m shows all my memory.

Guest info:

I used an Arch install ISO image for debugging. The startup menu has an option for PCI device detection. (I tried the Debian installer ISO's PCI device detection, but it didn't work.)

Arch is useful because it is easy to install the OS in BIOS mode, which QubesOS needs (guests can't use UEFI). Next I want to try making an HVM TemplateVM that integrates better with QubesOS.
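
(If you are not sure which mode a guest actually booted in, a quick check from the guest shell works; this is standard Linux, nothing Qubes-specific, and the directory only exists on UEFI boots:)

[ -d /sys/firmware/efi ] && echo UEFI || echo BIOS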

GG! :slight_smile:
Do you know why “2G” is required?

My guess is that 4GB - 3.5GB = 512MB, which is too little, and increasing that to 4GB - 2GB = 2GB gives the GPU the room it needs.
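
The way I read it (my assumption about how max-ram-below-4g works, not something from the guide): whatever part of the guest's 32-bit address space is not RAM is left as the MMIO hole that the GPU's BARs must fit into, so lowering the setting grows the hole:

4G - max-ram-below-4g = MMIO hole below 4G
4G - 3.5G = 512M  (too small for a large GPU BAR)
4G - 2G   = 2G    (enough room)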

Nice one! Thanks for finding this and sharing about it :+1: :partying_face:

Actually I would recommend adding these as notes on @neowutran’s guides (which are also available on Community Guides). It’s easier if we have a canonical place to refer to for documentation, rather than scattered notes.

Isn’t this the greatest tip ever on Qubes? Thanks for having the idea to mess with the value itself in the first place!

For what it’s worth, it now works for me, and for me it was enough to patch only xen.xml; no other stubdom tweaks needed.

Wow! Is this a major breakthrough for getting tens of thousands of Windows users onto Qubes?

Yes, I’d say! F A B U L O U S !

So, you had to modify xen.xml; adding the QEMU machine option max-ram-below-4g= is not enough? Which value did you use for this option, “2G”?

{% if vm.features.check_with_template('linux-stubdom', True) %}
                        type="stubdom-linux"
                    {% else %}
                        type="stubdom"
                    {% endif %}
                    {% if vm.netvm %}
                      {% if vm.features.check_with_template('linux-stubdom', True) %}
                        cmdline="-qubes-net:client_ip={{ vm.ip -}}
                            ,dns_0={{ vm.dns[0] -}}
                            ,dns_1={{ vm.dns[1] -}}
                            ,gw={{ vm.netvm.gateway -}}
                            ,netmask={{ vm.netmask }} -machine xenfv,max-ram-below-4g=2G"
                      {% else %}
                        cmdline="-net lwip,client_ip={{ vm.ip -}}
                            ,server_ip={{ vm.dns[1] -}}
                            ,dns={{ vm.dns[0] -}}
                            ,gw={{ vm.netvm.gateway -}}
                            ,netmask={{ vm.netmask }} -machine xenfv,max-ram-below-4g=2G"
                      {% endif %}
                    {% else %}
                        cmdline="-machine xenfv,max-ram-below-4g=2G"
                    {% endif %}

… starting from line 165.

No, I didn’t have to. I chose to, because it’s easier, and changing only xen.xml worked. No Qubes reboot required.
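
For anyone wanting to reproduce that, a minimal sketch of what I mean, done in dom0 (the path is where Qubes keeps the libvirt template; the sed assumes you already applied the guide's patch with its original 3.5G value, and "gpu-vm" is a placeholder qube name; back the file up first):

sudo cp /usr/share/qubes/templates/libvirt/xen.xml /usr/share/qubes/templates/libvirt/xen.xml.bak
sudo sed -i 's/max-ram-below-4g=3.5G/max-ram-below-4g=2G/g' /usr/share/qubes/templates/libvirt/xen.xml
qvm-shutdown --wait gpu-vm && qvm-start gpu-vm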

Hi ! I have some questions for you if you have time.

(Possible to Fix 2048MB RAM Limit for NVIDIA GPU Passthrough to HVM? - #38 by neowutran)

For me, an HVM created without a dGPU assigned to it now doesn’t boot; it fails with the error “No bootable device” in the debug window.

It worked before changing the patch from 3.5G to 2G, and my Windows HVM didn’t need the stubdom patch, only xen.xml.

I’ll try patching the stubdom as well and will be back with the info.

Meanwhile, someone could confirm this by simply creating an HVM without a dGPU.

EDIT: On the other hand, I don’t see how the patching could affect such an HVM, since my HVM’s name didn’t start with “gpu_” and it worked perfectly before. I tried renaming it to start with “gpu_”, but to no avail.
Now, if this doesn’t cause confusion, I don’t know what would.

The only thing left to check was to rename it to start with “gpu_” and then patch the stubdom, but what the heck happened in the meantime? Why would this ever happen to a VM with no device assigned?

Nope: renamed it and patched the stubdom, still “No bootable device”…

The patch is only needed for HVMs with GPU passthrough. For other kinds of HVMs, it makes them unbootable if you assign more than ~max-ram-below-4g of RAM.
(That’s the reason I used the “gpu_” naming thing.)
Keep the stubdom patch, remove the xen.xml patch, and use the “gpu_XXXX” naming only for HVMs with GPU passthrough.

And it should work.
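
(If your qube already exists under another name: renaming in Qubes is effectively clone-and-remove, so, with "windows" as a hypothetical qube name:)

qvm-shutdown --wait windows
qvm-clone windows gpu_windows
qvm-remove windows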

Thanks. I have 2 HVMs, one with and one without a dGPU assigned, and I need them both. The one without worked with 6GB of RAM; the one with, with only 2048MB of RAM, regardless of what was patched.
After switching 3.5G to 2G in the patch, it’s the reverse.

I am confused. If I have an HVM without GPU passthrough, but with other PCI devices passed through, what should I do to have it running with 8GiB of memory and no “No bootable device” error?

I will try to explain it in a different way.

The xen.xml patch applies max-ram-below-4g to every HVM, so with it there is no difference between qubes whose names start with “gpu_” and the rest; HVMs without GPU passthrough then become unbootable with large amounts of RAM.

If you want to run HVMs with large amounts of RAM, both with and without GPU passthrough, you need that difference: the stubdom patch applies the option only to qubes whose names start with “gpu_”.

The guide Create a Gaming HVM can be improved:

  • If you have time, the xen.xml patch could be modified to apply only if the qube name starts with “gpu_”. That would be better, because the stubdom patch method and the xen.xml patch method would then behave the same with respect to the “gpu_” naming (see the sketch after this list).
  • The GitHub comment mentioning “No bootable device” could be linked and copy-pasted to make it more explicit.
  • If someone has a lot of time, it might be possible to decide automatically whether to apply the patch, by grepping the list of PCI devices passed to the VM for something like NVIDIA/AMD/VGA/…
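
A minimal sketch of that first point (untested; it assumes the template exposes the qube name as vm.name, which xen.xml already uses elsewhere). The no-netvm branch from the snippet above would become:

{% else %}
    {% if vm.name.startswith('gpu_') %}
        cmdline="-machine xenfv,max-ram-below-4g=2G"
    {% endif %}
{% endif %}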

I’d rather not paste all that lspci output. Are you looking for Resizable BAR choices?

My BIOS does not have anything about BAR, but it does have a setting for TOLUD. I leave this setting at “dynamic”, because I read somewhere that choosing a value could cause problems booting.

I’m looking for the size values in these kinds of lines:

Region 0: Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at c200 [size=128]

(My guess being: if the max size in those lines is 256M, then setting max-ram-below-4g to 3.5G should work; if the max size is bigger than 256M, then setting max-ram-below-4g to 2G should work.)
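
(To pull out just those lines, something like this from dom0 should do; 01:00.0 is a placeholder, so substitute the address shown by lspci | grep -i vga, and sudo because the sizes can be hidden from non-root:)

sudo lspci -vv -s 01:00.0 | grep -i region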

My dGPU has three memory regions: 16M, 32G, 32M.

My iGPU has two: 16M, 256M

I’m not sure if that helps :slight_smile:
