Poor GPU performance in Qubes

I’ve created a Windows 10 HVM with my GPU passed through.
When playing games that are GPU heavy, I noticed a huge drop off in performance compared to bare metal and KVM passthrough setups I’ve tried previously on this hardware.

When benchmarking using userbenchmark.com (not the most reliable, but easy) I see a huge issue. While my CPU has dropped off a reasonable 5-6% in performance, my GPU has experienced a massive drop off. Below I’ve attached a picture of my results:

I really notice this drop off in 3D apps and games. I would usually pass it off, but it seems out of the ordinary. Is this sort of performance expected when passing through GPU devices using Qubes? Or am I possibly doing something wrong?

I also notice very poor and stuttery USB mouse performance on my passed-through PCIe USB controller under high load, or when doing weirdly specific things like moving my mouse over animated JS elements in Firefox, which spikes my GPU to 100%. It seems strange.

I’ve also created a Pop! OS Qube and passed through my GPU, and my results were more of the same.

4 Likes

For your USB situation, I would either buy a PCIe USB controller to pass through, or, if your board allows it, check sys-usb’s PCI devices, see how many there are, and experiment to find which controller corresponds to which ports. The easiest way for me was to start Qubes with qubes.skip_autostart, then attach the devices assigned to sys-usb to the HVM one at a time and figure out which set of ports each one takes up.

If it seems workable (enough ports for Host/Guest) pass it through to your VM and remove it from sys-usb. I have a powered USB hub with a physical USB switch, then a second unpowered switch just for my USB soundbar so I can keep the audio on the host and switch it only when necessary.
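To make the controller-mapping experiment concrete, here is a minimal dom0 sketch. The controller BDFs and the VM name gaming-hvm are placeholders (get real IDs from qvm-pci list sys-usb); it only echoes what it would do unless you set DO_IT=1.

```shell
# Sketch of the port-mapping experiment above (run in dom0).
# Controller BDFs and the VM name are placeholders; list real ones
# with `qvm-pci list sys-usb`. Dry run unless DO_IT=1.
controllers="00_14.0 03_00.0"
for dev in $controllers; do
  echo "would move dom0:$dev from sys-usb to gaming-hvm"
  if [ "${DO_IT:-0}" = "1" ]; then
    qvm-pci detach sys-usb "dom0:$dev"
    qvm-pci attach --persistent gaming-hvm "dom0:$dev"
  fi
done
# After each attach, plug a mouse into each physical port to see
# which ports that controller owns (e.g. lsusb inside a Linux guest).
```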


For the rest, I don’t have much help to offer, but I am experiencing something similar. I have tried passing through my NVMe, hoping the problem was just poor I/O with the Xen drivers (though I have not tried installing the host onto this drive yet, just mounting games via it).

CPU pinning didn’t help (I’m on a 5800X, so apparently it isn’t necessary), and I tried different clock timers (hpet disabled, tsc enabled/native). I also tried vcpupin, vcpusched, and emulatorpin, both with SMT enabled and disabled, with no real difference. I pinned CPUs 0-1 to dom0 and the rest to the HVM, and I did notice it only loads down the cores I pinned, so the pinning seems to be working decently, though I have not done core isolation.

Performance in-game for me is roughly half. In Cinebench I got the same single- and multi-core CPU scores as metal, but my GPU score dropped by about 3000 points. I get the same framerates on the Linux distros. The CPU isn’t maxing out or even coming close to it, and watch -n 1 xenpm get-cpufreq-average does show the cores boosting to ~4.6GHz.

On the host I have the CPU governor set to performance, and the High Performance power plan in Windows. No extra virtualization layers or anything, just a bare Win10 with Steam, Cyberpunk, and Starfield. Both VMs only show the CPU frequency as 3.7GHz, but from what I can tell this is normal and the cores do actually appear to be boosting.

2 Likes

Yeah, seems like we are both experiencing a huge drop off in GPU performance.

What I noticed on my PCIe USB controller is that when the GPU is under high load, for example when I uncap my FPS in a game or run a very intensive game, the USB mouse plugged into that controller becomes super stuttery (I’ve tested this on both Linux AND Windows HVMs). My theory is that there is some sort of bandwidth issue preventing passed-through PCIe devices from reaching full performance. Hopefully someone more knowledgeable about the Qubes architecture can explain this; it’s just my high-level understanding of what’s going on.

You will notice that CPU-heavy games run great, but GPU-heavy games are destroyed in performance. One game in particular gets 200-400fps on bare metal, but on Qubes I get 30-80.
However, on a more CPU-bound game I play, there’s a very small, barely noticeable drop off.

I’ve tried benchmarking with different qvm-pci options like no-strict-reset and permissive, but nothing seems to change.
I hope this is something that’s able to be resolved with an easy fix.
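For reference, here is the qvm-pci syntax for setting those options when attaching the device, as a dry-run sketch (the VM name is a placeholder, and the 0c_00.0 BDF is just an example GPU slot):

```shell
# Sketch of attaching a GPU with the options mentioned above (dom0).
# VM name and BDF are placeholders; this only echoes the command.
cmd='qvm-pci attach --persistent --option permissive=true --option no-strict-reset=true gaming-hvm dom0:0c_00.0'
echo "$cmd"
```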

2 Likes

Interesting. I am passing through an existing controller, as my board has 3, so I just robbed one from sys-usb, and I am using a powered USB hub and a switch to switch between 2 ports on the computer; I have not noticed any dropout. The only USB issues I had were actually when using Qubes to attach the USB devices to the VM; then they were super jittery in game (especially The Division 2 on Win10 with the game installed on the block device instead of the NVMe).

On my Nobara HVM I ran sudo lspci -vv and see LnkSta: Speed 16GT/s, Width x16 for my GPU while running the Cyberpunk benchmark, and LnkSta: Speed 2.5GT/s (downgraded), Width x16 when idling at the desktop, which should be normal behavior. Maybe try running that and see if anything looks off with your USB controller; I’m not sure what other commands might help with that, though.
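A minimal sketch of pulling just the negotiated link state out of that output; the sample text below mirrors the idle reading quoted above, while on a real system you would pipe sudo lspci -vv -s <BDF> instead:

```shell
# Extract the negotiated PCIe link state line from lspci output.
# Sample mirrors the idle reading above; on a real system use:
#   sudo lspci -vv -s <BDF> | grep 'LnkSta:'
sample='LnkCap:	Port #0, Speed 16GT/s, Width x16
LnkSta:	Speed 2.5GT/s (downgraded), Width x16 (ok)'
lnksta=$(printf '%s\n' "$sample" | grep 'LnkSta:')
echo "$lnksta"
```

A "(downgraded)" 2.5GT/s link at idle is normal power saving; the thing to check is that it climbs back to the full speed under load.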

Interestingly enough, I did finally hit 99% GPU utilization in one game. Starfield on my Windows 10 HVM actually gives me the exact same performance as bare metal, and the GPU maxes out for the majority of the playtime. I would love to test it on Linux, but the Nvidia 535 drivers are currently preventing the game from loading, and I didn’t have luck with 525. If you’re able to, this might be a good game to test just to see if you can get full GPU utilization in at least one game. I used GPU-Z’s sensors tab to keep an eye on the GPU usage.

It does feel like something is limiting certain games, but it’s difficult to say what. It doesn’t seem CPU-bound from what I have tried so far, and Starfield hitting 99% makes me think it’s not a power or bus lane issue, but I have no idea what else could be causing it.

I have also tested multifunction=on on the PCI device by editing pci.xml and adding it as a flag, but did not notice any difference in performance (I verified it is added by viewing my generated VM’s XML as well).

1 Like

Did a little more digging with nvtop and htop with the game running. You can see it’s reporting as PCIe Gen 4 @ 16x; I’m unsure about the RX and TX rates there, as I never used Linux bare metal for gaming. I don’t see any glaring issues; I really can’t understand where the bottleneck is.

Searching around, I’ve come across many people with a similar issue, but none of the fixes so far have helped or were even applicable. I’ve been at this for a few weeks now, on and off, and would love to be able to ditch the Windows dual boot once and for all.

Cyberpunk 2077 (All high with Medium crowd density, RTX on Medium), 40FPS in Benchmark on HVM, 90 on Windows 10 baremetal.

Stray (settings maxed) gets some frame drops that never occurred on Windows, running in the HVM at 50-110FPS; it still looks like something is holding it back.

4 Likes

I’ve spent the entire day trying to figure out what could cause this. I’ve tried UEFI, qemu-traditional, tweaking my VM’s XML, and many other things. 100 benchmarks later, I think it may just be the Xen architecture holding the GPU back and not a Qubes-specific issue. I really have no idea what the problem is, but I know for sure that KVM works completely fine with VFIO GPU passthrough. Because of that, I’m going to jump ship to a normal distro and just use KVM. :saluting_face: Good luck to you or anyone else trying to tackle this.

2 Likes

Is anyone having this issue on an AMD card? It might be worth looking into opening an issue on the GitHub if multiple people are having this problem.

2 Likes

Report back how it works! I’m really trying to get rid of my Dual boot but if I absolutely have to, I’d much rather do VFIO with a Linux host.

1 Like

Quick update: performance in the HVM is the same all the way down to 2 cores (2 cores with 4 threads still gives the same framerate; 2 cores with 0 threads finally begins to dip under 30).

I turned SMT off in my BIOS, as well as core boosting, went back to Windows 10 bare metal, and reran the benchmark: 80-100 FPS for Cyberpunk with the CPU pinned at 3.8GHz, so I can rule out boosting as the issue. It seemed to be boosting fine with the performance governor set anyway when watching with watch -n 1 xenpm get-cpufreq-average.

Baremetal

Gaming HVM. Interesting to note here is that GPU-Z reports an Idle PerfCap Reason during the benchmark, and VRel (blue) for the rest of it, a very different appearance from the bare metal graph.

CPU usage does look a little lower in the HVM during the benchmark as well; I wonder if there’s some sort of memory bottleneck? Maybe I’ll need to try hugepages.

HVM again, but with settings maxed out this time. The PerfCap Reason is starting to look a little more like bare metal, but there’s still a ways to go.

Not sure where else to go with this at this point; suggestions are greatly appreciated! Things I have done, off the top of my head:

  • Thread pinning
  • 1 and 2 iothreads, as well as iothreadpins
  • emulatorpin set
  • vcpusched for each pinned vcpu with scheduler="rr" and priority="1"
  • Defining topology both with and without SMT enabled
  • Pinning dom0 to CPUs 0 and 1 and turning off every other VM, including sys qubes
  • Setting the clock source to also have timers: hpet present=no; tsc present=yes mode=native (disabled after no change in performance)
  • Adding <feature name='hypervisor' policy='disable'/>; note that this did NOT decrease my performance (I checked that it was no longer reporting as a VM in Task Manager as well) (removed after no change noticed)
  • Adding multifunction=on under <address on the GPU
  • Windows 10 Pro, Nobara, and Garuda Linux all tested, with similar results

Here is a link to my last tested XML for the Win10 HVM; note that my GPU is 0x0c.

2 Likes

Just got a MASSIVE boost in my Windows HVM after messing with xen.xml, getting native frames in Cyberpunk, Starfield, and Hogwarts Legacy now and seeing much higher GPU usage. Here is my current XML.

Going to do a little more testing to try to figure out exactly which option gave me the boost and whether it also works under the Linux guests. I was honestly just shotgunning other people’s VFIO options and got lucky, I think. It is odd to me that it performs well with <feature policy='disable' name='hypervisor'/>; I had thought setting that flag would kill your performance.


Slight exaggeration: not quite bare metal, but I am getting 70FPS on the benchmark vs 80FPS on metal, compared to the 35-40 before.

Think I got the settings somewhat nailed down, for WINDOWS ONLY; hopefully it works on yours if you still have Qubes. The following needs to be added to /usr/share/qubes/templates/libvirt/xen.xml (or however you manage your XMLs; virsh edit did not work for me).

    {% if vm.virt_mode != 'pv' %}
        <cpu mode='host-passthrough'>
         
            <!-- Add this block -->
            {% if vm.features.get('gaming_feature_policies', '0') == '1' -%}
                <feature policy="require" name="invtsc"/>
                <feature policy="disable" name="hypervisor"/>
            {% endif -%}

After doing that, run qvm-features gpu-gaming-win10 gaming_feature_policies 1. Then when you start your HVM, check /etc/libvirt/libxl/gpu-gaming-win10.xml and make sure you see those options added to your HVM’s XML.

<feature policy="disable" name="hypervisor"/> is actually KEY on my system. The hypervisor flag on its own sees no gain, and not having either at all doesn’t change anything, but the combination of the two works. Apparently invtsc and hypervisor together help negate the performance impact of using the hypervisor flag. I am curious why I get abysmal performance both with and without the flag alone, though.

I have tried this under Nobara Linux (KDE + Wayland) but didn’t see an improvement in Cyberpunk or Stray via Lutris with Proton GE. However, I don’t know how well it runs under Lutris on bare metal, so I might try Nobara on metal when I get the NVMe that was supposed to replace my dual boot one, and then start attacking that.

One thing that I will reiterate for anyone coming across this is that you definitely want to pass through an NVMe to keep your games on. The disk I/O for Xen and Windows 10 is only slightly better than a 7200RPM drive, even when running the host on an NVMe, and will cause massive stuttering in certain games.


Update: all of my games that I had on Win10 have been set up and are running well. The only games that I had to do anything with were ones that utilize shader precaching (Hogwarts Legacy, Uncharted 4), as the I/O on the Xen disk just isn’t good enough. The solution for me was to symlink a folder on my NVMe to ProgramData (e.g. mklink /J "C:\ProgramData\Hogwarts Legacy" "D:\Games\Symlinks\ProgramData\Hogwarts Legacy").

After symlinking to the NVMe they run very similarly to metal, if not exactly the same. Without symlinking, Uncharted had lots of stutter and audio stutter all the way through loading and into the game, to the point where it was unplayable.

A better/more permanent solution is to install Windows onto the NVMe rather than using the Qubes-provided disks, but I’m keeping my setup able to dual boot as a fallback for a while, so I’ll likely do that down the line.

Games I tested:

  • System Shock Remake
  • Cyberpunk 2077
  • Hogwarts Legacy
  • Left 4 Dead 2
  • Lethal Company
  • Outer Wilds
  • Starfield
  • Starship Troopers - Terran Command
  • Stray
  • Tales from the Borderlands
  • Uncharted - Legacy of Thieves Collection
  • Warhammer 40,000 Boltgun
4 Likes

I added these tweaks to my XML and confirmed in Task Manager that the hypervisor was no longer reported; however, I am still getting the same benchmark results on my GPU, which is under half. :frowning:
It did increase my CPU performance a bit, though.

1 Like

What are your specs? Sorry, I tried looking but didn’t notice your CPU; I might have missed it, though.

Can you provide your current XML? I provided the path to mine above; it should be similar for yours.

I haven’t tried your benchmark yet, do you have any games that are performing poorly or just the benchmark?

In GPU-Z do you see the PCI lane switch to Gen3/4 @16x under load?

Do you have dom0 cores pinned via grub? I pinned mine to 2 and haven’t tried undoing it yet. I can get the exact setup for you when I’m back home if you need it; it was just 2 arguments I had to add, then building both grubs.
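For anyone wanting to try this before the exact details are posted, a sketch of what the grub side could look like, assuming the two arguments alluded to are Xen’s dom0_max_vcpus and dom0_vcpus_pin (both real Xen command-line options; file paths assume a standard Fedora-based dom0):

```shell
# Sketch (dom0): limit dom0 to 2 vCPUs and pin them, via the Xen
# command line. The two options and the paths are assumptions based
# on a standard Qubes dom0; adjust for your boot setup.
# 1. In /etc/default/grub, extend the Xen options, e.g.:
#    GRUB_CMDLINE_XEN_DEFAULT="... dom0_max_vcpus=2 dom0_vcpus_pin"
# 2. Rebuild both grub configs (legacy BIOS and EFI):
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg
# 3. Reboot, then verify with `xl vcpu-list` that dom0 sits on CPUs 0-1.
```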

Do you have/need to pin your HVM’s cores? I didn’t notice a difference, but the 5800X architecture supposedly doesn’t need it. The 5900X and 5950X, and their respective 7000 series parts, I would imagine do still need it. lstopo should give you an idea of whether the cores are split up and which ones you need to use, but you need to check that before pinning dom0’s cores, as it will only show you the pinned ones afterward.

Do you have all other qubes shut down? I have dom0 and all my sys ones running, but not much else.

Is your CPU governor set to performance, and have you verified that it’s boosting in Qubes?

Is Above 4G Decoding enabled in your BIOS?

You could try some of the other options that appear in my XML pastebin, I just worked back until I found what worked for my system but it could be an entirely different option on yours. I don’t know if the hyperv ones actually do anything on Qubes.

Sorry for the long response just trying to rattle off anything I can think of!

2 Likes

Nevermind, that crappy benchmark software was a false alarm. I am getting much better performance in games; the GPU-bound game I mentioned earlier went from 80fps → 120fps. Awesome job!

I’m curious as to why it doesn’t work on Linux. I’m going to look into extra tweaks as well and see if anything can squeeze out any more performance. I will update this post with anything I find.

As for this, I tried adding an iothread with the following:

<vcpu placement="static">8</vcpu>
<iothreads>1</iothreads>
<cputune>
  <!-- cpu pinning here, one <vcpupin> per vCPU -->
  <vcpupin vcpu="X" cpuset="X"/>
  <vcpupin vcpu="X" cpuset="X"/>
  <vcpupin vcpu="X" cpuset="X"/>

  <iothreadpin iothread="1" cpuset="0-15"/>
</cputune>

And I feel like the disk performance has increased, but I haven’t tried running any benchmarks yet.
EDIT: Nevermind, iothreads are completely useless for the phy disks used by Qubes.
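For anyone filling in those placeholder pins, a small sketch that emits one-to-one <vcpupin> lines for a chosen set of physical CPUs (the pCPU list is just an example layout, keeping 0-1 free for dom0):

```shell
# Generate the <vcpupin> lines for the placeholder section above:
# guest vCPUs 0..N-1 pinned one-to-one onto chosen physical CPUs.
# The pCPU list is an example (dom0 on 0-1, guest on 2-9).
pcpus="2 3 4 5 6 7 8 9"
vcpu=0
for p in $pcpus; do
  printf '    <vcpupin vcpu="%d" cpuset="%d"/>\n' "$vcpu" "$p"
  vcpu=$((vcpu + 1))
done
```

The emitted count should match the <vcpu> element (8 here); paste the output into <cputune>.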

2 Likes

Awesome to hear!

Looking forward to your findings on the Linux part; I would love to move away from Windows as much as possible. I will definitely try the iothreads and see if Uncharted keeps stuttering or if it resolves; I had only tried it with cpuset 0,1, never the full allotment. Thanks!

1 Like

It’s worth mentioning that I fixed this by disabling MSI mode on my USB controllers using this tool.

All that’s needed is to run it as admin, click the checkbox on all USB hubs, and then Apply. After a restart, the issue will be fixed.

1 Like

Thanks to both of you for all the work you’ve done on this!

Did anyone happen to figure out how to fix the issue on Linux? Qubes Windows Tools is currently unrecommended due to security issues, so it’d be nice to get a fix for Linux as well.

I’m playing Granblue Fantasy Versus Rising and Samurai Shodown and I’m seeing the same issue reported here: dropping FPS while the GPU doesn’t exceed 38% utilization. Only 10% of the RAM (24GB) and about 15% of the CPU (6 vCPUs on an i7-13700K, no CPU pinning set up yet) is being used, so neither of those is bottlenecking.

Not sure if it’s helpful, but I’ve noticed there’s only a small difference in FPS when changing graphics settings. Going from Low to Highest, or changing the resolution from highest to lowest, barely makes a difference, which is weird. It’s like there’s some bottleneck, but since everything is underutilized, I’m not sure what’s going on.

1 Like

I have not looked into Linux further at this point, but I have experienced exactly what you’re describing on both Windows (prior to patching xen.xml) and Linux.

My gaming VM is currently down, as upgrading to 4.2 has broken my 3.5G patch, so I can’t boot with more than ~2 gigs of memory. I’ll need to take some time to get that working again first; re-running my usual stubdom patch script does apply the patch, but the VM still won’t boot.

It is odd that that parameter improves performance, as requiring invtsc along with disabling hypervisor is supposed to let you hide the VM without losing as much performance as leaving them out entirely, but in our case it’s required to get any real performance at all.

What CPU do you have? I’m curious if this affects both Intel and AMD or if it’s just one, mine is a Ryzen 5800X.

It does feel like an issue with how the CPU is being virtualized; as you said, there is basically no indicator of a real bottleneck, and we confirmed that for Windows at least. Hopefully the same is true for Linux; we just need to figure out what is different there.

1 Like

I’ve got an Intel 13700k, so it seems to be happening to both Intel and AMD.

It’s definitely something with the virtualization, because I tested those same games I mentioned earlier on bare metal Fedora and I’m getting great performance on maxed out settings with my 4060.

1 Like

Create a Gaming HVM

1 Like

No luck with that, unfortunately; it seems like a number of users are having issues with it on 4.2. I’m going to try downgrading Xen and applying the patch to an older version in a bit if nothing else comes up.

1 Like