Windows 10 Experience on Qubes

Greetings, all.

(My original install/config issues thread is here: Windows 10 Questions - #46 by disp6252 )

I was here back in May working on getting my Windows 10 Qube working on my system and I was largely successful. I have been using it for a good 3 months solid now, and the only real issues I have been having are performance-related. Stability is very good to excellent.

First, let me describe what I had to end up doing.

My system specs:
Xeon E5-2697A v4 2.6GHz 16C/32T
X99 motherboard (MSI X99A GG)
128GB PC28800 DDR4 Memory
Samsung Pro 4TB SSD
AMD Radeon RX550 4GB
AMD Radeon RX6950XT 16GB
Built-in dual E2400 1Gb NICs and 1535 WiFi/Bluetooth adapter
Additional hard drives, both magnetic and SSD.
Realtek ALC1150+ESS ES9018K2M sound
X99 USB 2.0/3.0 & Asmedia ASM1042AE/1142 USB 3.1 Controllers

OS setup:
QubesOS 4.1
Windows 10 Pro 22H2

Qubes is configured to use all threads on the Xeon, which I know is not the most secure possible, due to the known core/thread bleedover issues with this and many other CPUs, but I am aiming to use the hardware I have to the fullest, and the potential attack surface is very small in my usage.

The Windows Qube has a custom config that is managed manually which I have to tweak due to the passthrough and performance settings. Here are the resources it is assigned:

  • 16 Threads (CPUs), with preference of CPUs 16-31. All other Qubes share CPUs 0-15.
  • 64GB of the 128GB of system memory.
  • The Radeon RX 6950XT is assigned directly via PCI passthrough to Windows. The RX550 is left for Qubes to run. There was some boot magic involved to get it to work because the Primary is the 6950XT, and I cannot find a way to change that.
  • The audio codecs are also passed through directly to Windows.
  • Originally I used sys-usb to pass the keyboard/mouse and some ports through to Windows, but this was very laggy, dropped/repeated keystrokes, and caused mouse stuttering, so I opted to pass through the X99 chipset’s USB controllers to Windows and tried to use an external KM switchbox to switch the K/M between the Windows passthrough ports and the Asmedia USB ports, which Qubes still manages. Unfortunately, after 2 different KM switches had massive keystroke hangs and mouse stutter/freeze problems, among other things, I am left with moving cables manually between ports. If anyone has a suggestion for a USB KM switch box that actually works and isn’t e-waste, please let me know. I have tried a Jancane and an Aimos, and they both suck. The Jancane was really bad, and the Aimos almost works, but not well enough.
  • The 4TB SSD is partitioned into two LVM volumes (C: and D: on Windows), and most of the other drives in my system I give to Windows directly to manage, since they had existing data on them from the previous Win8 install I had.
  • I am using one of the E2400 ethernet ports through sys-net and sys-firewall using a VIF adapter to Windows. I am not presently using the other, or the wireless adapter, but it is configured in the sys-net qube.
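
The CPU split in the list above can be sanity-checked with a few lines of Python. This is only a minimal sketch: the helper name `parse_cpu_list` and the range-string format are assumptions made for illustration, not something Qubes or Xen provides. It expands the two ranges from the post and confirms that they do not overlap.

```python
def parse_cpu_list(spec):
    """Expand a string like "0-3,8" into a set of CPU numbers.

    Hypothetical helper for illustration; not part of Qubes or Xen.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {part!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


# Split described in the post: Windows prefers 16-31, other qubes share 0-15.
windows_cpus = parse_cpu_list("16-31")
shared_cpus = parse_cpu_list("0-15")
overlap = windows_cpus & shared_cpus

print(len(windows_cpus), len(shared_cpus), sorted(overlap))
```

An empty overlap only says the intended sets are disjoint. A preference is not a hard pin, so the scheduler may still place vCPUs elsewhere.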

Now, as far as stability is concerned, I have only had 3 hard locks/crashes of the Windows guest in 3 months, and they seem to have been due to problems with the Radeon driver. Since I last updated it, I have had no more issues. Thus, I think stability is very good to excellent. I have had no corruption issues, and no serious Windows errors (outside of the occasional SysMain crash, which is common; I should just disable it anyway). I have no app or game crash issues at all, unlike my previous Win8 install, which is why I finally had to upgrade. I was tired of the constant random app crashes with no explanation.

Now comes the real issue I would like some advice on: performance. While the system runs ok, and most apps/games run without too serious a problem, there are some which stutter and lag horribly. I originally did not have the CPU preferences set in Qubes, and after I set the preferences, it helped a bit. I also disabled CPU parking in Windows. However, I do notice a serious amount of issues related to the mouse movement causing stutters and lag in both just the Windows desktop and in games. Some games are impacted worse than others; some are almost smooth as glass, while others are barely playable with 500ms-2s stutters when there is significant mouse movement. I also get a lot of sound stutters watching YouTube or listening to MP3s, even without moving the mouse. Running LatencyMon shows the typical culprits of ndis.sys and dxgkrnl.sys, along with other parts of the device driver stack (Wdf01000.sys, HDAudBus.sys, etc). I also notice that just scrubbing the mouse around the screen with nothing running but the desktop, I get 7-10% CPU usage increase across most cores. The mouse is a Razer Naga 2014, and it is using just the driver, not the Synapse cloud bloatware crap. I’ve also set it at the lowest report rate of 125/s.

I installed the recent Windows Cumulative rollup update at the same time I did a Qubes update where it updated Xen. After restarting, the stuttering appears to have gotten significantly worse. I am expecting that the Windows update may have clobbered some of the registry performance tweaks I did, but it is also possible that the Qubes Xen update might have changed something for the worse along with it.

I also happened to recently run a benchmark suite from PassMark to see how well (or how badly) my system performance was compared to similar systems. To put it mildly, it was REALLY bad. While I expect some performance hit from running Windows as a Guest under a Linux-based Hypervisor, I expected it to be reasonable, like perhaps 10% at most. Based on the benchmark results, it is about 70-80% slower, which explains a few things.

Basically, at this point, I would like to involve the community here in a project to investigate and improve on Windows performance while running on Qubes. I plan on doing a lot more tweaking to try and get the performance up, because it is atrociously bad in my experience at the moment. I want to share what I’ve done and what I end up doing, but I would like to target getting 70-90+% of the performance of a native Windows 10 system. Given how much hardware I have passed through to Windows, which is completely dedicated to it, not managed by Qubes/Xen outside of the absolute minimum, I would expect better performance. Thus, I would think there is still plenty of room for tweaking and adjusting things to get to that performance goal.
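
To put the benchmark gap and the target side by side, here is a rough back-of-the-envelope calculation in Python. It assumes that "70-80% slower" means the benchmark scores are 70-80% lower than comparable native systems, which is one reading of the numbers and not something the PassMark results confirm. The function names are made up for this sketch.

```python
def relative_performance(percent_slower):
    """Fraction of native performance left after a given slowdown."""
    return 1.0 - percent_slower / 100.0


def required_speedup(current_fraction, target_fraction):
    """Factor by which performance must improve to reach the target."""
    return target_fraction / current_fraction


# Slowdowns and targets quoted in the post (70-80% slower, 70-90% target).
for slower in (70, 80):
    current = relative_performance(slower)
    for target in (0.70, 0.90):
        factor = required_speedup(current, target)
        print(f"{slower}% slower -> {target:.0%} of native needs {factor:.2f}x")
```

Under that reading, reaching the stated target means roughly a 2.3x to 4.5x improvement over the current benchmark results.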

I figure that a lot of it has to do with Windows. It doesn’t like not owning all the hardware, but I also expect that there are optimizations that can be made on Xen and qemu to improve things. It would seem also that some of my issues are driver/interrupt-related. ISR and DPC latency are regularly through the roof, especially when networking is involved. I expect that this may be due to the Xen PV drivers doing a lot of unnecessary blocking or other shenanigans, but there aren’t a lot of options with them. You either use them, or you don’t; they are largely black boxes config-wise.

As always, any advice or suggestions are welcome. I am sold on continuing to use Xen and Qubes as my HV setup. It largely works, and is dependable.

Some questions: when you are using this Windows HVM, how many screens are you using?
Do you use a dedicated screen? Do you also see/use the virtual screen QubesOS is creating for you?
Did you try disabling it in the Windows display settings?

Hey Neo. :) I’ll answer these individually.

When are you using this Windows HVM?

All the time.

How many screens are you using?

Two at present, both connected to the 6950XT. One of them is also connected to the 550 on a different display port.

Do you use a dedicated screen?

Yes and no. See above. Windows on the 6950XT uses both, and Qubes on the 550 is routed to the secondary input on the second screen.

Do you also see/use the virtual screen QubesOS creates for you?

No. That is why I have a custom config. I disable the Qubes virtual adapter in Windows. Windows never sees it because it doesn’t even exist in my VM setup. Originally, when I was getting things installed, I used it, but once passthrough was working, I disabled it in the config.

Thanks for the reply!

One thing I can sorta determine is that the bad stuttering I have now seems more related to disk access via the LVM volumes. If I have heavy disk access, the sound stutters real bad. I think networking may contribute as well, but it’s a bit more difficult to test.

At this point, I am thinking it may be the Xen PV drivers that are really killing performance. It’s almost like the network drivers are blocking constantly, which makes ISRs and DPCs take far longer than they should. One interesting anomaly is that I have two sets of drives in my Windows explorer, except for the boot drive. They all appear under This PC, but all the additional drives (D, E, F, G) have duplicates that are at the same level as This PC. Either one works fine, but it just seems strange.

I think I might try to reinstall the Xen PV drivers again to see if Windows update might have clobbered them partly. The last time I reinstalled them seemed to help with the stuttering. Currently I have XenBUS, XenVBD, XenVIF, XenNET, and XenIFACE drivers installed… all are v8.2.2.1. I am wondering, has anyone had any luck with the 9.0.0 version?

Also, is there a possibility I am having ISR/DPC conflict issues, for example if interrupts are shared and mouse movement calls a driver chain which includes more than the mouse? I’m not even sure if I can or should be using MSI (message-signaled interrupts).

Benchmarks are useless. Install Linux + KVM on a separate partition, create a similar VM, and try it. Then report your UX. If it is better, then it should be a Qubes problem; if it is similar, it is your hardware.

My suggestion is to try Win11 light.
Not a single performance or sound-stuttering issue with only 2.2GB of dedicated RAM.
Dedicated monitor (miniDP) and NVIDIA GPU to it, though.

Not sure I agree that benchmarks are “useless”. They are apps and they run in the same space as apps. Anything in the entire guest/host/bare metal stack that affects the performance of regular apps is going to affect them as well. Maybe you can expand a bit on what you mean by “useless”?

I have the Windows Performance Toolkit (WPT) installed and have run some WPA traces. Unfortunately, I’m not very good at reading them yet.

I appreciate the suggestion about trying a different VM stack, but I am not sure how it will help solve the performance issues. Also, I am not experienced with setting up KVM with hardware passthrough for a Windows guest, and don’t have an available bootable drive partition to load it at the moment.

I don’t think it is hardware-related, since Windows Updates and driver upgrades/re-installs seem to make it better/worse. Seems more like it’s in the “configuration problems” space.

Thank you for the reply!

Hmm… I don’t think this system will run Win11, as it doesn’t have a TPM module. Maybe I can try and run Win10 LTSC or a fully-ameliorated Win10 Qube and see. That might be an interesting comparison.

I’d still like to drill down into the current install and see what is causing the stuttering. Likely, it and a combination of other config issues are affecting performance.

Thanks for the suggestion!

Here’s my LatencyMon report:

_________________________________________________________________________________________________________
CONCLUSION
_________________________________________________________________________________________________________
Your system appears to be having trouble handling real-time audio and other tasks. You are likely to experience buffer underruns appearing as drop outs, clicks or pops. One or more DPC routines that belong to a driver running in your system appear to be executing for too long. Also one or more ISR routines that belong to a driver running in your system appear to be executing for too long. At least one detected problem appears to be network related. In case you are using a WLAN adapter, try disabling it to get better results. One problem may be related to power management, disable CPU throttling settings in Control Panel and BIOS setup. Check for BIOS updates. 
LatencyMon has been analyzing your system for  0:37:46  (h:mm:ss) on all processors.


_________________________________________________________________________________________________________
SYSTEM INFORMATION
_________________________________________________________________________________________________________
Computer name:                                        ZHIAL
OS version:                                           Windows 10, 10.0, version 2009, build: 19045 (x64)
Hardware:                                             HVM domU, Xen
BIOS:                                                 Default System BIOS
CPU:                                                  GenuineIntel Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz
Logical processors:                                   16
Processor groups:                                     1
Processor group size:                                 16
RAM:                                                  65535 MB total


_________________________________________________________________________________________________________
CPU SPEED
_________________________________________________________________________________________________________
Reported CPU speed (WMI):                             2600 MHz
Reported CPU speed (registry):                        2600 MHz

Note: reported execution times may be calculated based on a fixed reported CPU speed. Disable variable speed settings like Intel Speed Step and AMD Cool N Quiet in the BIOS setup for more accurate results.


_________________________________________________________________________________________________________
MEASURED INTERRUPT TO USER PROCESS LATENCIES
_________________________________________________________________________________________________________
The interrupt to process latency reflects the measured interval that a usermode process needed to respond to a hardware request from the moment the interrupt service routine started execution. This includes the scheduling and execution of a DPC routine, the signaling of an event and the waking up of a usermode thread from an idle wait state in response to that event.

Highest measured interrupt to process latency (µs):   74891.2480
Average measured interrupt to process latency (µs):   402.575711

Highest measured interrupt to DPC latency (µs):       65912.8640
Average measured interrupt to DPC latency (µs):       268.449764


_________________________________________________________________________________________________________
 REPORTED ISRs
_________________________________________________________________________________________________________
Interrupt service routines are routines installed by the OS and device drivers that execute in response to a hardware interrupt signal.

Highest ISR routine execution time (µs):              35643.410385
Driver with highest ISR routine execution time:       ndis.sys - Network Driver Interface Specification (NDIS), Microsoft Corporation

Highest reported total ISR routine time (%):          0.031894
Driver with highest ISR total time:                   ndis.sys - Network Driver Interface Specification (NDIS), Microsoft Corporation

Total time spent in ISRs (%)                          0.035914

ISR count (execution time <250 µs):                   252854
ISR count (execution time 250-500 µs):                0
ISR count (execution time 500-1000 µs):               5921
ISR count (execution time 1000-2000 µs):              732
ISR count (execution time 2000-4000 µs):              611
ISR count (execution time >=4000 µs):                 850


_________________________________________________________________________________________________________
REPORTED DPCs
_________________________________________________________________________________________________________
DPC routines are part of the interrupt servicing dispatch mechanism and disable the possibility for a process to utilize the CPU while it is interrupted until the DPC has finished execution.

Highest DPC routine execution time (µs):              107177.540385
Driver with highest DPC routine execution time:       ndis.sys - Network Driver Interface Specification (NDIS), Microsoft Corporation

Highest reported total DPC routine time (%):          0.608576
Driver with highest DPC total execution time:         dxgkrnl.sys - DirectX Graphics Kernel, Microsoft Corporation

Total time spent in DPCs (%)                          0.736307

DPC count (execution time <250 µs):                   1267137
DPC count (execution time 250-500 µs):                0
DPC count (execution time 500-1000 µs):               71659
DPC count (execution time 1000-2000 µs):              11501
DPC count (execution time 2000-4000 µs):              10132
DPC count (execution time >=4000 µs):                 19918


_________________________________________________________________________________________________________
 REPORTED HARD PAGEFAULTS
_________________________________________________________________________________________________________
Hard pagefaults are events that get triggered by making use of virtual memory that is not resident in RAM but backed by a memory mapped file on disk. The process of resolving the hard pagefault requires reading in the memory from disk while the process is interrupted and blocked from execution.

NOTE: some processes were hit by hard pagefaults. If these were programs producing audio, they are likely to interrupt the audio stream resulting in dropouts, clicks and pops. Check the Processes tab to see which programs were hit.

Process with highest pagefault count:                 discord.exe

Total number of hard pagefaults                       631
Hard pagefault count of hardest hit process:          154
Number of processes hit:                              21


_________________________________________________________________________________________________________
 PER CPU DATA
_________________________________________________________________________________________________________
CPU 0 Interrupt cycle time (s):                       1239.983886
CPU 0 ISR highest execution time (µs):                7105.618077
CPU 0 ISR total execution time (s):                   0.098880
CPU 0 ISR count:                                      20811
CPU 0 DPC highest execution time (µs):                28260.906538
CPU 0 DPC total execution time (s):                   10.537725
CPU 0 DPC count:                                      434593
_________________________________________________________________________________________________________
CPU 1 Interrupt cycle time (s):                       864.082864
CPU 1 ISR highest execution time (µs):                58.526154
CPU 1 ISR total execution time (s):                   0.001413
CPU 1 ISR count:                                      282
CPU 1 DPC highest execution time (µs):                18095.153077
CPU 1 DPC total execution time (s):                   2.188201
CPU 1 DPC count:                                      8968
_________________________________________________________________________________________________________
CPU 2 Interrupt cycle time (s):                       890.788230
CPU 2 ISR highest execution time (µs):                65.188846
CPU 2 ISR total execution time (s):                   0.001176
CPU 2 ISR count:                                      225
CPU 2 DPC highest execution time (µs):                18214.259615
CPU 2 DPC total execution time (s):                   1.947822
CPU 2 DPC count:                                      9301
_________________________________________________________________________________________________________
CPU 3 Interrupt cycle time (s):                       918.026269
CPU 3 ISR highest execution time (µs):                336.969231
CPU 3 ISR total execution time (s):                   0.000890
CPU 3 ISR count:                                      119
CPU 3 DPC highest execution time (µs):                30147.3750
CPU 3 DPC total execution time (s):                   0.904180
CPU 3 DPC count:                                      5873
_________________________________________________________________________________________________________
CPU 4 Interrupt cycle time (s):                       1127.023782
CPU 4 ISR highest execution time (µs):                1882.058077
CPU 4 ISR total execution time (s):                   0.005959
CPU 4 ISR count:                                      588
CPU 4 DPC highest execution time (µs):                84063.063462
CPU 4 DPC total execution time (s):                   111.566579
CPU 4 DPC count:                                      353673
_________________________________________________________________________________________________________
CPU 5 Interrupt cycle time (s):                       1026.077917
CPU 5 ISR highest execution time (µs):                217.866538
CPU 5 ISR total execution time (s):                   0.001608
CPU 5 ISR count:                                      281
CPU 5 DPC highest execution time (µs):                69129.101923
CPU 5 DPC total execution time (s):                   44.772431
CPU 5 DPC count:                                      113462
_________________________________________________________________________________________________________
CPU 6 Interrupt cycle time (s):                       977.091266
CPU 6 ISR highest execution time (µs):                12.172692
CPU 6 ISR total execution time (s):                   0.001188
CPU 6 ISR count:                                      257
CPU 6 DPC highest execution time (µs):                80758.288077
CPU 6 DPC total execution time (s):                   43.610968
CPU 6 DPC count:                                      115878
_________________________________________________________________________________________________________
CPU 7 Interrupt cycle time (s):                       973.327790
CPU 7 ISR highest execution time (µs):                11.534615
CPU 7 ISR total execution time (s):                   0.000642
CPU 7 ISR count:                                      140
CPU 7 DPC highest execution time (µs):                58264.848462
CPU 7 DPC total execution time (s):                   22.777051
CPU 7 DPC count:                                      59789
_________________________________________________________________________________________________________
CPU 8 Interrupt cycle time (s):                       970.922706
CPU 8 ISR highest execution time (µs):                35643.410385
CPU 8 ISR total execution time (s):                   6.634288
CPU 8 ISR count:                                      120397
CPU 8 DPC highest execution time (µs):                107177.540385
CPU 8 DPC total execution time (s):                   15.457570
CPU 8 DPC count:                                      128099
_________________________________________________________________________________________________________
CPU 9 Interrupt cycle time (s):                       957.613330
CPU 9 ISR highest execution time (µs):                18271.138462
CPU 9 ISR total execution time (s):                   2.901551
CPU 9 ISR count:                                      34083
CPU 9 DPC highest execution time (µs):                82065.124615
CPU 9 DPC total execution time (s):                   6.178766
CPU 9 DPC count:                                      38540
_________________________________________________________________________________________________________
CPU 10 Interrupt cycle time (s):                       937.513974
CPU 10 ISR highest execution time (µs):                21941.421538
CPU 10 ISR total execution time (s):                   1.264898
CPU 10 ISR count:                                      15953
CPU 10 DPC highest execution time (µs):                103351.703077
CPU 10 DPC total execution time (s):                   3.424875
CPU 10 DPC count:                                      19955
_________________________________________________________________________________________________________
CPU 11 Interrupt cycle time (s):                       940.971507
CPU 11 ISR highest execution time (µs):                24302.166923
CPU 11 ISR total execution time (s):                   1.095959
CPU 11 ISR count:                                      12797
CPU 11 DPC highest execution time (µs):                67853.907692
CPU 11 DPC total execution time (s):                   2.292104
CPU 11 DPC count:                                      16678
_________________________________________________________________________________________________________
CPU 12 Interrupt cycle time (s):                       931.822414
CPU 12 ISR highest execution time (µs):                8344.321154
CPU 12 ISR total execution time (s):                   0.763895
CPU 12 ISR count:                                      39910
CPU 12 DPC highest execution time (µs):                15761.891538
CPU 12 DPC total execution time (s):                   0.685626
CPU 12 DPC count:                                      45142
_________________________________________________________________________________________________________
CPU 13 Interrupt cycle time (s):                       922.556372
CPU 13 ISR highest execution time (µs):                14156.787692
CPU 13 ISR total execution time (s):                   0.137276
CPU 13 ISR count:                                      6968
CPU 13 DPC highest execution time (µs):                10891.920385
CPU 13 DPC total execution time (s):                   0.235697
CPU 13 DPC count:                                      13001
_________________________________________________________________________________________________________
CPU 14 Interrupt cycle time (s):                       927.678242
CPU 14 ISR highest execution time (µs):                10694.590769
CPU 14 ISR total execution time (s):                   0.071538
CPU 14 ISR count:                                      4762
CPU 14 DPC highest execution time (µs):                15702.773462
CPU 14 DPC total execution time (s):                   0.285731
CPU 14 DPC count:                                      9691
_________________________________________________________________________________________________________
CPU 15 Interrupt cycle time (s):                       928.221661
CPU 15 ISR highest execution time (µs):                2084.097308
CPU 15 ISR total execution time (s):                   0.044346
CPU 15 ISR count:                                      3395
CPU 15 DPC highest execution time (µs):                6601.928077
CPU 15 DPC total execution time (s):                   0.182063
CPU 15 DPC count:                                      7704
_________________________________________________________________________________________________________
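
One way to see where the interrupt load lands is to pull the per-CPU totals out of the report text and rank the CPUs. The following is a minimal Python sketch, not a LatencyMon feature: the regular expression assumes the exact line layout shown in the report above, and `SAMPLE` contains only a few lines copied from it to keep the example short.

```python
import re

# A few lines copied from the per-CPU section of the report above.
SAMPLE = """\
CPU 0 ISR total execution time (s):                   0.098880
CPU 0 DPC total execution time (s):                   10.537725
CPU 4 ISR total execution time (s):                   0.005959
CPU 4 DPC total execution time (s):                   111.566579
CPU 8 ISR total execution time (s):                   6.634288
CPU 8 DPC total execution time (s):                   15.457570
"""

# Assumed line layout: "CPU <n> <ISR|DPC> total execution time (s): <seconds>"
LINE_RE = re.compile(
    r"^CPU (\d+) (ISR|DPC) total execution time \(s\):\s+([\d.]+)\s*$"
)


def per_cpu_totals(report_text):
    """Return {cpu: {"ISR": seconds, "DPC": seconds}} for matching lines."""
    totals = {}
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            cpu = int(match.group(1))
            totals.setdefault(cpu, {})[match.group(2)] = float(match.group(3))
    return totals


totals = per_cpu_totals(SAMPLE)
by_dpc = sorted(totals, key=lambda c: totals[c].get("DPC", 0.0), reverse=True)
by_isr = sorted(totals, key=lambda c: totals[c].get("ISR", 0.0), reverse=True)

print("CPUs by DPC time:", by_dpc)
print("CPUs by ISR time:", by_isr)
```

Run against the full report, the same ranking shows ISR time concentrated on CPUs 8-12 and DPC time concentrated on CPUs 4-7, which may be a useful starting point when looking at interrupt affinity.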

One last thing from me (I’ve been using Qubes as a daily driver for more than 4 years, and for ten years in total): I am not sure that being interested and positively driven, and consequently expecting solutions, is enough if the expected response doesn’t come. That usually means there are alternatives that are easier to set up, or that self-contribution is expected.

Apologies for the late response. I was in the hospital.

I am not sure what you mean, exactly. I have been working hard on getting this Qubes/Windows setup to work for months now. You can see from my other threads the effort and progress I have made.

There’s no “one way” to diagnose and fix problems, and I have to negotiate certain limitations that my current setup has. I don’t doubt that every suggestion may yield some insight into the performance issues, and I am willing to try the ones I can. This is a production system, so I don’t have the option to do certain kinds of reinstalls, and I also have specific requirements for the OS and config I picked. For example, even if I could get a Windows 11 Light install working, and could see that the performance is arguably better, I currently could not upgrade to that as my guest OS.

One additional thing I discovered this week while beginning my convalescence was that one of the Qubes upgrades put “smt=off” back in my GRUB boot config. Once I re-enabled SMT, I got back to the same performance level I had before.
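
Since an update silently put the option back, a small check after each upgrade could catch it early. Here is a minimal Python sketch that looks for an `smt=` token in a command-line string. The sample strings are hypothetical placeholders, not my actual boot configuration, and the set of values treated as "disabled" is an assumption made for this sketch.

```python
# Values treated as "disabled" here are an assumption for this sketch.
DISABLED_VALUES = {"off", "no", "false", "0", "disable"}


def smt_setting(cmdline):
    """Return the value of the last smt=... token, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("smt="):
            value = token[len("smt="):]
    return value


def smt_is_disabled(cmdline):
    """True if the command line explicitly turns SMT off."""
    value = smt_setting(cmdline)
    return value is not None and value.lower() in DISABLED_VALUES


# Hypothetical command lines, for illustration only.
after_update = "placeholder_option=1 smt=off"
after_fix = "placeholder_option=1 smt=on"

print(smt_is_disabled(after_update), smt_is_disabled(after_fix))
```

The string to check would have to come from wherever the hypervisor command line is stored on the system in question.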

What that tells me is that one of the primary issues I am having is that the Qubes/Xen scheduler seems to be a major bottleneck in performance. If the Windows guest shares pCPUs with dom0 and the other Qubes, the performance REALLY drops, even if it is only some processors being shared.

I did turn off Core Parking in Windows 10 to help with it, but it didn’t seem to change much overall.

At this point, what I think I am looking at is related to ISR/DPC blocking in the PV Drivers, and perhaps even some more going on within the video drivers. Mouse movement should NOT cause this level of delay in interrupt handling. 125 ISR calls per sec into the driver layer should not cause such a huge spike in system CPU usage, or so I would think.
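
For a sense of scale, the mouse report rate can be compared with the worst-case routine times from the LatencyMon report. The Python sketch below is plain arithmetic on numbers already quoted in this thread. Note that the worst ISR and DPC in the report are attributed to ndis.sys, not to the mouse driver, so this shows only how long those routines run relative to one mouse report interval, not what causes them.

```python
REPORT_RATE_HZ = 125          # mouse report rate set in the post
WORST_ISR_US = 35643.410385   # highest ISR execution time in the report
WORST_DPC_US = 107177.540385  # highest DPC execution time in the report


def report_interval_us(rate_hz):
    """Time between two mouse reports, in microseconds."""
    return 1_000_000 / rate_hz


def intervals_spanned(duration_us, rate_hz):
    """How many report intervals a routine of this duration covers."""
    return duration_us / report_interval_us(rate_hz)


interval = report_interval_us(REPORT_RATE_HZ)
isr_span = intervals_spanned(WORST_ISR_US, REPORT_RATE_HZ)
dpc_span = intervals_spanned(WORST_DPC_US, REPORT_RATE_HZ)

print(f"interval between reports: {interval:.0f} µs")
print(f"worst ISR covers {isr_span:.1f} intervals")
print(f"worst DPC covers {dpc_span:.1f} intervals")
```

At 125 reports per second there are 8 ms between reports, so a single worst-case ISR or DPC lasts longer than several of them, which fits the stutters being far out of proportion to the input rate.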

Right now, my plan is to do some more WPR/WPA traces and see if I can get better at interpreting at what layer these latencies are occurring, and then decide what to do about it from there.

Anyway, I hope I understood your message and concerns. If not, I apologize. Feel free to correct my misconceptions. Thank you again for taking the time to respond!