Quick howto: GPU passthrough with lots of RAM

Users in this community have had a difficult time with GPU passthrough over the years.

I want to share a quick recipe for how I got mine working.

Motherboard chipset: Intel Z790
CPU: Intel Raptor Lake series
GPU: NVIDIA RTX 4090 (24GB VRAM)
QubesOS version: 4.1.2 (R4.1)

I followed the guide by @neowutran.

However, like others, I couldn’t run the GPU VM with more than a few GB of memory.

The trick was to set the max-ram-below-4g variable to 2G instead of 3.5G.
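
Concretely, that setting ends up in the QEMU machine options the stubdom passes to the guest; in the patched xen.xml quoted further down, it appears as (with 2G replacing the guide's original 3.5G):

-machine xenfv,max-ram-below-4g=2G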

I now have a VM with a 4090 GPU and 32GB of RAM. nvidia-smi works and free -m shows all my memory.

Guest info:

I used an Arch install ISO image for debugging. The startup menu has an option for PCI device detection. (I tried the Debian installer ISO's PCI device detection, but it didn't work.)

Arch is useful because it is easy to install the OS in BIOS mode, which QubesOS needs (guests can't use UEFI). Next I want to try making an HVM TemplateVM that integrates better with QubesOS.
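
(If you are not sure which mode a guest actually booted in, a quick check from the guest shell works; this is standard Linux, nothing Qubes-specific, and the directory only exists on UEFI boots:)

[ -d /sys/firmware/efi ] && echo UEFI || echo BIOS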

GG! :slight_smile:
Do you know why “2G” is required?

My guess is that 4GB - 3.5GB = 512MB, which is too little, and increasing that to 4GB - 2GB = 2GB gives the GPU the room it needs.
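
The way I read it (my assumption about how max-ram-below-4g works, not something from the guide): whatever part of the guest's 32-bit address space is not RAM is left as the MMIO hole that the GPU's BARs must fit into, so lowering the setting grows the hole:

4G - max-ram-below-4g = MMIO hole below 4G
4G - 3.5G = 512M  (too small for a large GPU BAR)
4G - 2G   = 2G    (enough room)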

Nice one! Thanks for finding this and sharing about it :+1: :partying_face:

Actually I would recommend adding these as notes on @neowutran’s guides (which are also available on Community Guides). It’s easier if we have a canonical place to refer to for documentation, rather than scattered notes.

Isn’t this the greatest tip ever on Qubes? Thanks for having the idea to mess with the value itself in the first place!

For what it’s worth, it now works for me, and for me it was enough to patch only xen.xml; no other stubdom tweaks needed.

Wow! Is this a major breakthrough for getting tens of thousands of Windows users onto Qubes?

Yes, I’d say! F A B U L O U S !

So, you had to modify xen.xml; adding the QEMU machine option max-ram-below-4g= is not enough? Which value did you use for this option, “2G”?

{% if vm.features.check_with_template('linux-stubdom', True) %}
                        type="stubdom-linux"
                    {% else %}
                        type="stubdom"
                    {% endif %}
                    {% if vm.netvm %}
                      {% if vm.features.check_with_template('linux-stubdom', True) %}
                        cmdline="-qubes-net:client_ip={{ vm.ip -}}
                            ,dns_0={{ vm.dns[0] -}}
                            ,dns_1={{ vm.dns[1] -}}
                            ,gw={{ vm.netvm.gateway -}}
                            ,netmask={{ vm.netmask }} -machine xenfv,max-ram-below-4g=2G"
                      {% else %}
                        cmdline="-net lwip,client_ip={{ vm.ip -}}
                            ,server_ip={{ vm.dns[1] -}}
                            ,dns={{ vm.dns[0] -}}
                            ,gw={{ vm.netvm.gateway -}}
                            ,netmask={{ vm.netmask }} -machine xenfv,max-ram-below-4g=2G"
                      {% endif %}
                    {% else %}
                        cmdline="-machine xenfv,max-ram-below-4g=2G"
                    {% endif %}

… starting from line 165.

No, I didn’t have to. I chose to, because it’s easier, and changing only xen.xml worked. No Qubes reboot required.
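
For anyone wanting to reproduce that, a minimal sketch of what I mean, done in dom0 (the path is where Qubes keeps the libvirt template; the sed assumes you already applied the guide's patch with its original 3.5G value, and "gpu-vm" is a placeholder qube name; back the file up first):

sudo cp /usr/share/qubes/templates/libvirt/xen.xml /usr/share/qubes/templates/libvirt/xen.xml.bak
sudo sed -i 's/max-ram-below-4g=3.5G/max-ram-below-4g=2G/g' /usr/share/qubes/templates/libvirt/xen.xml
qvm-shutdown --wait gpu-vm && qvm-start gpu-vm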

Hi ! I have some questions for you if you have time.

(Possible to Fix 2048MB RAM Limit for NVIDIA GPU Passthrough to HVM? - #38 by neowutran)

For me, an HVM created without a dGPU assigned to it now doesn’t boot; it fails with the error “No bootable device” in the debug window.

It worked before changing the patch from 3.5G to 2G, and my Windows HVM didn’t need the stubdom patch, only xen.xml.

I’ll try patching the stubdom as well and will be back with the info.

Meanwhile, someone could confirm this by simply creating an HVM without a dGPU.

EDIT: On the other hand, I don’t see how the patching could affect such an HVM, since my HVM’s name didn’t start with “gpu_” and it worked perfectly before. I tried renaming it to start with “gpu_”, but to no avail.
Now, if this doesn’t cause confusion, I don’t know what would.

The only thing left to check was to rename it to start with “gpu_” and then patch the stubdom, but what the heck happened in the meantime? Why would this ever happen to a VM with no device assigned?

Nope: renamed it and patched the stubdom, still “No bootable device”…

The patch is only needed for HVMs with GPU passthrough. For other kinds of HVMs, it makes them unbootable if you assign more than ~max-ram-below-4g of RAM.
(That’s the reason I used the “gpu_” naming thing.)
Keep the stubdom patch, remove the xen.xml patch, and use the “gpu_XXXX” naming only for HVMs with GPU passthrough.

And it should work.
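
(If your qube already exists under another name: renaming in Qubes is effectively clone-and-remove, so, with "windows" as a hypothetical qube name:)

qvm-shutdown --wait windows
qvm-clone windows gpu_windows
qvm-remove windows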

Thanks. I have 2 HVMs, one with and one without a dGPU assigned, and I need them both. The one without worked with 6GB of RAM; the one with, with only 2048MB of RAM, regardless of what was patched.
After switching 3.5G to 2G in the patch, it’s the reverse.

I am confused. If I have an HVM without GPU passthrough, but with other PCI devices passed through, what should I do to have it running with 8GiB of memory and no “No bootable device” error?

I will try to explain it in a different way.

The xen.xml patch applies max-ram-below-4g to every HVM, so with it there is no difference between qubes whose names start with “gpu_” and the rest; HVMs without GPU passthrough then become unbootable with large amounts of RAM.

If you want to run HVMs with large amounts of RAM, both with and without GPU passthrough, you need that difference: the stubdom patch applies the option only to qubes whose names start with “gpu_”.

The guide Create a Gaming HVM can be improved:

  • If you have time, the xen.xml patch could be modified to apply only if the qube name starts with “gpu_”. That would be better, because the stubdom patch method and the xen.xml patch method would then behave the same with respect to the “gpu_” naming (see the sketch after this list).
  • The GitHub comment mentioning “No bootable device” could be linked and copy-pasted to make it more explicit.
  • If someone has a lot of time, it might be possible to decide automatically whether to apply the patch, by grepping the list of PCI devices passed to the VM for something like NVIDIA/AMD/VGA/…
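
A minimal sketch of that first point (untested; it assumes the template exposes the qube name as vm.name, which xen.xml already uses elsewhere). The no-netvm branch from the snippet above would become:

{% else %}
    {% if vm.name.startswith('gpu_') %}
        cmdline="-machine xenfv,max-ram-below-4g=2G"
    {% endif %}
{% endif %}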

I’d rather not paste all that lspci output. Are you looking for Resizable BAR choices?

My BIOS does not have anything about BAR, but it does have a setting for TOLUD. I leave this setting at “dynamic”, because I read somewhere that choosing a value could cause problems booting.

I’m looking for the size values in these kinds of lines:

Region 0: Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at c200 [size=128]

(My guess being: if the max size in those lines is 256M, then setting max-ram-below-4g to 3.5G should work; if the max size is bigger than 256M, then setting max-ram-below-4g to 2G should work.)
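
(To pull out just those lines, something like this from dom0 should do; 01:00.0 is a placeholder, so substitute the address shown by lspci | grep -i vga, and sudo because the sizes can be hidden from non-root:)

sudo lspci -vv -s 01:00.0 | grep -i region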

My dGPU has three memory regions: 16M, 32G, 32M.

My iGPU has two: 16M, 256M

I’m not sure if that helps :slight_smile:
