So, I got curious and did a few time measurements with performance/ondemand as well as various dom0 cpu pinning configurations (I was for instance wondering what was the rationale/impact of pinning dom0 to E-cores).
tl;dr: ‘strict’ 1:1 dom0 CPU pinning to E-cores with the performance governor on P-cores gives the lowest and most consistent start times: a median of 7.6 s versus 9.5 s, i.e. about 20% faster than the ondemand governor with no dom0 pinning.
Time in seconds for `qvm-start` to complete; 20 iterations, 6 vCPUs assigned to dom0, on an i5-13600K (6 P-cores, 8 E-cores):
governor / dom0 pinning | median | min | max | mean | pstdev |
---|---|---|---|---|---|
ondemand / no dom0 cpu pinning (0-13) | 9.5 | 7.9 | 10.0 | 9.2 | 0.7 |
ondemand / dom0 pinned on all E cores (6-13) | 9.5 | 7.6 | 10.0 | 9.0 | 0.9 |
ondemand / dom0 1:1 E core pinning (6->11) * | 7.9 | 7.6 | 9.9 | 8.1 | 0.7 |
ondemand / dom0 1:1 P core pinning (0->5) | 11.6 | 11.2 | 11.7 | 11.6 | 0.2 |
performance / no dom0 cpu pinning (0-13) | 9.2 | 7.6 | 9.9 | 8.9 | 0.9 |
performance / dom0 1:1 E core pinning (6->11) * | 7.6 | 7.3 | 7.9 | 7.6 | 0.2 |
performance / dom0 1:1 P core pinning (0->5) | 9.0 | 8.7 | 9.2 | 9.0 | 0.2 |
* ‘1:1 E core pinning’ means pinning dom0 vcpu 0 to physical core 6 (E core #0), vcpu 1 to core 7 (E core #1), and so on to avoid dynamic reshuffling
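For anyone wanting to reproduce this kind of measurement, here is a minimal Python sketch of a timing loop that produces the same columns as the table. It is not necessarily how the numbers above were collected. The `reset_cmd` parameter and the `runner`/`clock` hooks are my own additions for illustration, and it assumes the qube has to be shut down (untimed) between runs.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=20, reset_cmd=None,
                 runner=subprocess.run, clock=time.perf_counter):
    """Run `cmd` `runs` times and return each wall-clock duration in seconds.

    `reset_cmd` (optional) is run after every iteration, outside the timed
    section, e.g. to shut the qube down again before the next start.
    """
    durations = []
    for _ in range(runs):
        start = clock()
        runner(cmd, check=True)          # timed: the command under test
        durations.append(clock() - start)
        if reset_cmd is not None:
            runner(reset_cmd, check=True)  # not timed: restore initial state
    return durations


def summarize(durations):
    """Return median/min/max/mean/population stdev, rounded to 0.1 s."""
    return {
        "median": round(statistics.median(durations), 1),
        "min": round(min(durations), 1),
        "max": round(max(durations), 1),
        "mean": round(statistics.mean(durations), 1),
        "pstdev": round(statistics.pstdev(durations), 1),
    }
```

In dom0 this could be called as `summarize(time_command(["qvm-start", "testvm"], reset_cmd=["qvm-shutdown", "--wait", "testvm"]))`, where `testvm` is a placeholder qube name.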
Findings:
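- Better times with dom0 pinned to E cores than with no pinning or with P-core pinning.
- As expected, ‘performance’ fares better than ‘ondemand’ (for that specific load), but the difference is small (0.3 to 0.5 s on the mean), except with 1:1 P-core pinning (11.6 s vs 9.0 s).
- ‘1:1’ dom0 E-core pinning was always faster than dynamic pinning, and under ‘performance’ also much more consistent (pstdev 0.2 vs 0.9), likely because of L1/L2 cache hits for all the VM management stuff (qubesd, libvirt, …). 1:1 P-core pinning was just as consistent (pstdev 0.2) but not faster.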
- ‘performance’ together with dom0 ‘1:1’ CPU E-core pinning exhibited the lowest and most consistent startup times.
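As a rough illustration of that best-performing configuration, the sketch below builds the `xl vcpu-pin` and `xenpm set-scaling-governor` command lines for a 1:1 mapping of 6 dom0 vCPUs onto physical CPUs 6 to 11. The helper names are hypothetical, the post does not say how the pinning was actually applied, and the governor command here targets all CPUs, which is an assumption. Settings made with `xl` and `xenpm` do not persist across reboots.

```python
def one_to_one_pinning(first_pcpu, n_vcpus, domain="Domain-0"):
    """Build one `xl vcpu-pin` command per vCPU: vCPU i -> pCPU first_pcpu + i."""
    return [
        ["xl", "vcpu-pin", domain, str(vcpu), str(first_pcpu + vcpu)]
        for vcpu in range(n_vcpus)
    ]


def governor_command(governor="performance"):
    """Build the xenpm command that sets the cpufreq governor.

    No CPU id is given, so it applies to all CPUs (assumption, see above).
    """
    return ["xenpm", "set-scaling-governor", governor]


# Print the commands instead of running them, so they can be reviewed first.
# E cores are assumed to be physical CPUs 6..13, as in the table above.
for cmd in [governor_command()] + one_to_one_pinning(first_pcpu=6, n_vcpus=6):
    print(" ".join(cmd))
```

The printed lines can be reviewed and then run as root in dom0; `xl vcpu-list Domain-0` shows the resulting affinity.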
Obviously the above might not hold when running heavy concurrent workloads (in that case it would be interesting to see how to tweak Xen not to reshuffle CPUs too aggressively). In my case the PC is idling most of the time, and starting VMs as fast as possible is what matters.
[edit - added dom0 1:1 P core results]