Hybrid CPU: benchmarking performance when pinning to specific cores

solene · November 24, 2023, 11:41am

Related to hybrid CPUs with Performance/Efficient cores and the topic "CPU Pinning Alder Lake", which explains how to assign CPUs to a Qubes OS VM, I made a benchmark showing the performance difference between 4 performance cores, 4 efficient cores, and 4 cores with no special pinning in Qubes OS.

I added a Ryzen 5 5600X desktop CPU and a laptop with an i5-7300U (Lenovo ThinkPad T470) to the benchmarks. The T470 CPU has only 2 cores with hyperthreading, so it has 4 threads at best, shared between the single benchmark VM and dom0. While this laptop CPU competes with efficient cores, it has NO ROOM to run anything else. Both the other systems were running Qubes OS 4.2-RC4.

solene · November 24, 2023, 11:56am

I need to measure power usage / battery life, but it’s not automated. My wattmeter will be of use here, but I need to think about how to measure that properly.

balko · November 24, 2023, 2:38pm

Thank you. This is a bit too complicated for me, can you please present some sort of conclusion? Is manual pinning worth it or not? Did you manage to mitigate any issues when Qubes OS uses the wrong, weak CPU and it changes the performance of the qube drastically? Something about all that, if possible.

solene · November 24, 2023, 3:43pm

I will add two other CPUs as a comparison and write a conclusion.

solene · November 24, 2023, 5:00pm

The first conclusion (until I write something better, which may take a while) is that the efficient cores are actually way faster than I expected; they are not slow at all, just slower than the performance cores.

The performance cores draw more energy and heat up the system more, in exchange for better performance.

Not doing any pinning gives good results, better than only using efficient cores but not as good as only performance cores.

I’d say doing nothing is good enough, Xen is doing well at balancing the workload across the cores.

If you have tasks such as heavy compilation, an external GPU for gaming, or anything that is really CPU-dependent, pinning performance cores could be worth it.

If you want to improve battery life, using efficient cores may give better results. I still need to measure the real benefit, but there must be one.

Restricting dom0 to efficient cores has a few pros and cons:

Pros

this leaves 100% of the performance cores available for qubes
it draws less battery / generates less heat

Cons

VM startup can be 0.5 to 1s slower (on an i7-1260P, where VM startup takes 4.0s with performance cores assigned to dom0)
Backup will be slower

solene · November 25, 2023, 2:38pm

I updated the link to add a comparison with a Ryzen 5 5600X (desktop CPU) and an i5-7300U (laptop CPU) to give an idea of the performance of efficient / performance cores.

The performance cores of the i7-1260P compete with the Ryzen 5 5600X (which is a beefy CPU of last year), which was surprising. I didn’t enable SMT on either CPU.

The i5-7300U is just bad for daily use. It competes with efficient cores, but the i7-1260P has 8 efficient cores + 4 performance cores (that support SMT), while the i5-7300U has only 2 cores (I enabled SMT to have 4 threads), and they quickly struggle under load…

tanky0u · February 11, 2024, 11:53am

Does Xen’s balancing here take the P/E cores into account, or is it agnostic to the hardware’s technical capabilities?

solene · February 11, 2024, 12:34pm

I didn’t investigate much, but given that the results without pinning land between the E-only and P-only results (with a high standard deviation), I don’t think Xen is aware of E/P cores.
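
To illustrate that reasoning, here is a rough Python sketch of what a scheduler with no knowledge of core types might produce. Everything in it is an assumption made for the example: the scores `P_SCORE` and `E_SCORE` are invented numbers (not taken from the benchmark), the function `simulate_unpinned` is hypothetical, and the real Xen scheduler does not simply pick cores at random.

```python
import random
import statistics

# Hypothetical per-core scores, NOT measured values from the benchmark.
P_SCORE = 100.0
E_SCORE = 60.0


def simulate_unpinned(runs=1000, vcpus=4, p_count=4, e_count=8, seed=0):
    """Score each run as the mean speed of the cores its vCPUs landed on.

    Placement is random and ignores core type, which is a crude stand-in
    for a scheduler that is unaware of the P/E split.
    """
    rng = random.Random(seed)
    cores = [P_SCORE] * p_count + [E_SCORE] * e_count
    scores = []
    for _ in range(runs):
        placed = rng.sample(cores, vcpus)  # pick cores without looking at their type
        scores.append(statistics.mean(placed))
    return scores


scores = simulate_unpinned()
print(f"mean={statistics.mean(scores):.1f} stdev={statistics.stdev(scores):.1f}")
```

In this toy model, a VM pinned to only P or only E cores would get the same score on every run, while the unpinned case averages somewhere in between and varies from run to run, which is the pattern described above.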

renehoj · February 12, 2024, 7:18pm

As far as I know, there isn’t any hypervisor that can do asymmetric scheduling.

The hybrid scheduler used by Linux analyzes the instructions being executed and uses that to assign a weight to each process, and the weight determines which processes get to run on a P core. The same method can’t easily be applied to a VM. Some hypervisors just seem to favor using the fastest cores, and they are aware of the speed difference between the different cores.

Proxmox seems to load the cores in order of fast core first, P cores > E cores > logical half of the P core.

So a 13900K with 32 vcpus would load in the order 0,2,4,6,8,10,12,14 / 16-31 / 1,3,5,7,9,11,13,15
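
That ordering can be sketched in a few lines of Python. This is only an illustration under the numbering assumed above (the two threads of each P core are adjacent on logical CPUs 0-15, and the E cores follow on 16-31); the function name `fill_order` and its parameters are made up for the example and are not something Proxmox exposes.

```python
def fill_order(p_cores=8, e_cores=16):
    """Return logical CPU ids in the load order described above.

    Assumes P-core hyperthread pairs are numbered (0,1), (2,3), ...
    and that the E cores come right after the last P-core thread.
    """
    p_threads = 2 * p_cores
    first_threads = list(range(0, p_threads, 2))              # physical half of each P core
    e_threads = list(range(p_threads, p_threads + e_cores))   # E cores, one thread each
    sibling_threads = list(range(1, p_threads, 2))            # logical half of each P core
    return first_threads + e_threads + sibling_threads


order = fill_order()
print(order[:8])    # P cores, first thread
print(order[8:24])  # E cores
print(order[24:])   # P cores, hyperthread siblings
```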

I couldn’t find any documentation that explains how Xen prioritizes the cores, apart from it being HT-aware. Judging from which cores are being used, it doesn’t seem like Xen is trying to max out the P cores before using the E cores.

solene · February 12, 2024, 7:27pm

Xen is supposed to use this scheduling strategy by default: Credit2 Scheduler - Xen

Maybe there is something to tweak to either add a weight to P cores for performance, or reduce their weight for more battery life

that would be pretty cool, much cooler than manually assigning the cores

renehoj · February 12, 2024, 7:40pm

You can set the soft affinity to the physical half of the P cores. It tells Xen it can use the full CPU, but that it should use the P cores if possible.

solene · February 12, 2024, 7:48pm

how do you do that? Is it a complicated setup?

renehoj · February 12, 2024, 7:53pm

It’s the same as pinning, you just use soft affinity instead of hard affinity.

https://wiki.xenproject.org/wiki/Tuning_Xen_for_Performance#vCPU_Soft_Affinity_for_guests
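
As a rough sketch of what that could look like from dom0, the Python snippet below builds an `xl vcpu-pin` command that changes only the soft affinity. It relies on the argument order domain, vcpu, hard-cpus, soft-cpus, with `-` meaning "leave the hard affinity unchanged". The qube name `work`, the helper `soft_affinity_cmd`, and the CPU list `0,2,4,6` are assumptions for illustration; check which logical CPUs are P cores on your own machine before using it.

```python
import shlex


def soft_affinity_cmd(domain, soft_cpus, vcpu="all"):
    """Build an `xl vcpu-pin` command that only sets the soft affinity.

    Argument order: domain, vcpu, hard-cpus, soft-cpus.
    Passing "-" as hard-cpus leaves the hard affinity untouched.
    """
    cpu_list = ",".join(str(c) for c in soft_cpus)
    return ["xl", "vcpu-pin", str(domain), str(vcpu), "-", cpu_list]


# Assumption: logical CPUs 0,2,4,6 are the physical half of the P cores.
cmd = soft_affinity_cmd("work", [0, 2, 4, 6])
print(shlex.join(cmd))  # prints: xl vcpu-pin work all - 0,2,4,6

# To actually apply it in dom0 (affects the running domain only):
# import subprocess; subprocess.run(cmd, check=True)
```

Running `xl vcpu-list` afterwards should show the hard and soft affinity of each vCPU, which is a quick way to confirm the change took effect.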