CPU Pinning Alder Lake

How did you switch to credit? Just by passing sched=credit on the Xen cmdline?

Also, to anyone interested, I’ve created a script to dynamically switch CPU pins based on the currently focused window :).

Forum thread: Dynamic, window focus-based CPU pinning

I think I’ve just uncovered a rather nasty bug. If the vcpu count in dom0 is set to anything other than the exact number of physical cores, the xen_acpi_processor kernel driver will fail to upload the C-state information for those cores to Xen, resulting in Xen never knowing about the C-states, which significantly impacts battery life.

You can check if you’re impacted by this by running xenpm start 1. If there are any CPUs that only display C0 and C1, then Xen doesn’t know about the additional C-states. The fix is to set the Xen boot parameter dom0_max_vcpus to your exact number of physical cores, including hyperthreading cores even if they’re disabled in software.
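To automate that check, here is a minimal Python sketch that scans saved xenpm output and lists the CPUs that only expose C0 and C1. The output layout is an assumption: it expects each CPU to be introduced by a line such as "cpu id : 0" or "CPU0:", followed by lines starting with C0, C1, and so on. The real format may differ between Xen versions, and the function names and sample text are made up for illustration.

```python
import re

# Assumed header forms: "cpu id : 0" or "CPU0:" (may differ per Xen version).
CPU_HEADER = re.compile(r"^\s*cpu\s*(?:id\s*:\s*)?(\d+)", re.IGNORECASE)
# Assumed C-state lines: any line that starts with C<number>.
C_STATE = re.compile(r"^\s*(C\d+)\b")


def cstates_per_cpu(text):
    """Map each CPU number to the set of C-state names listed under it."""
    states = {}
    current = None
    for line in text.splitlines():
        header = CPU_HEADER.match(line)
        if header:
            current = int(header.group(1))
            states.setdefault(current, set())
            continue
        state = C_STATE.match(line)
        if state and current is not None:
            states[current].add(state.group(1))
    return states


def affected_cpus(states):
    """Return CPUs for which nothing deeper than C1 is listed."""
    return sorted(cpu for cpu, seen in states.items() if seen <= {"C0", "C1"})


# Hypothetical sample, not real xenpm output.
SAMPLE = """\
cpu id               : 0
C0                   : transition [ 10]
C1                   : transition [ 5]
C2                   : transition [ 3]
cpu id               : 1
C0                   : transition [ 10]
C1                   : transition [ 5]
"""

print(affected_cpus(cstates_per_cpu(SAMPLE)))
```

In practice you would feed it the captured output of the xenpm command instead of SAMPLE and adjust the two regular expressions if your output looks different.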

Also: this prevents s0ix sleep from working correctly.

Have you reported this upstream to Xen?

Also, do you know whether this impacts performance at all/in dom0 only/of all AppVMs?

Yes, I have reported the issue to the xen-devel mailing list.

The issue does not affect performance, but it impacts efficiency as it prevents Xen from throttling down the CPU. It shortens battery life in general and in s0ix sleep by an order of magnitude.

It looks like not every CPU is affected.

I have an Alder Lake i9-12900K, and there, even though I followed the first post's instructions and limited dom0 to a maximum of 4 vCPUs, xenpm start 1 still shows C0 … C4 on all cores.

With my mobile i7-13700H, however, when I limit dom0 to a maximum of 4 vCPUs, only the first 4 CPUs show C0 … C3, and the rest show only C0 and C1, just as you say. (I am not sure what to make of the missing C4; was it removed in newer revisions?)

So I guess allowing a maximum of 20 vCPUs for dom0 and then running

xl vcpu-set Domain-0 4
xl vcpu-pin Domain-0 all 16-19

would result in pretty much the desired behavior (and I confirmed that all CPUs then show the C0 … C3 states).
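If you want to script this, the two xl commands can be generated and run from Python. This is only a sketch: the helper names are made up, and the defaults (4 vCPUs pinned to CPUs 16-19) are simply the values from this post, so adjust them for your own topology. It defaults to a dry run that just prints the commands.

```python
import subprocess


def dom0_pin_commands(vcpus, cpu_list, domain="Domain-0"):
    """Build the xl vcpu-set and xl vcpu-pin command lines as argument lists."""
    return [
        ["xl", "vcpu-set", domain, str(vcpus)],
        ["xl", "vcpu-pin", domain, "all", cpu_list],
    ]


def apply_commands(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root in dom0)."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Dry run with the values used above.
apply_commands(dom0_pin_commands(4, "16-19"))
```

Call apply_commands(..., dry_run=False) in dom0 to actually apply the settings. They do not persist across reboots, so the script would have to run on every boot.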

Is it a good (or even possible) idea to reduce Domain-0 (dom0) from 4 vCPUs to 2 vCPUs and dedicate 2 low-power efficient cores to it?

I don’t think that’s a good idea. qubesd in dom0 uses quite a bit of CPU for managing (especially starting) domains and you’d see significant slowdowns.

The GUI could also degrade.

I have two reasons behind such a move:

  1. Separating the physical cores used by dom0 from those used by other qubes, as hardening against Meltdown- and Spectre-class vulnerabilities, in case new vulnerabilities of that class appear in the future.

  2. On a modern CPU (AMD gen7-9, Intel gen12-15), 4 vCPUs (4 cores) leave a lot of wasted CPU time in dom0, and those 4 vCPUs are also shared with other qubes.
    If 2 dedicated LPE cores are not enough for dom0, maybe 1 E-core and 2 LPE cores are.

On a modern CPU (AMD gen7-9, Intel gen12-15), 4 vCPUs (4 cores) leave a lot of wasted CPU time in dom0

While 4 vCPUs don’t really get used most of the time in dom0, I find that in my case when launching VMs, they get used almost 100% for a couple of seconds. Naturally, reducing that number would then make launching VMs slower.

Your qube may reach 100% CPU during launch if you allocate too many vCPUs across the entire system, based on “CPU ready”:

The same idea applies to Xen: the higher the vCPU:core ratio in your system, the higher the CPU ready time.

The bottom line for a possible case, in a process measured in milliseconds: your qube starts and waits for magic from the hypervisor.
At the moment the qube starts, your system is using all cores and can’t respond to tasks from the just-allocated 4 vCPUs, so you see a peak of 100% CPU use. Once the hypervisor is done with its magic, your qube gets CPU time and CPU use drops below 100%.

The right way to optimize Qubes OS, as long as you have a laptop/PC with ~12 cores, is to reduce the vCPU:core ratio as much as you can:

  • Each qube that can run with 1 vCPU should stay at 1 vCPU.
  • A qube with more than 3 vCPUs should run as a standalone.
  • dom0 is a special case.

This way, each qube gets more effective CPU time relative to wasted CPU ready time.
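To put a number on that, here is a rough Python sketch that computes the vCPU:core ratio from a table of per-qube vCPU counts. The qube names, the vCPU counts, and the 12-core figure are made-up examples, not measurements from a real system.

```python
def vcpu_core_ratio(vcpus_per_qube, physical_cores):
    """Total allocated vCPUs divided by the number of physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_qube.values()) / physical_cores


def trim_candidates(vcpus_per_qube):
    """Qubes other than dom0 with more than 1 vCPU, worth reviewing."""
    return [name for name, n in vcpus_per_qube.items() if n > 1 and name != "dom0"]


# Hypothetical allocation on a 12-core machine.
qubes = {"dom0": 4, "sys-net": 1, "sys-firewall": 1, "work": 2, "personal": 2, "dev": 2}

ratio = vcpu_core_ratio(qubes, 12)
print(f"vCPU:core ratio = {ratio:.2f}")
print("review:", trim_candidates(qubes))
```

A ratio at or below 1.0 means running qubes rarely have to wait for a free core; the higher it climbs, the more CPU ready time you should expect.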

Can’t edit this comment, so I’m posting an update here. This config should be equivalent but also work if someone has SMT enabled (although last I checked, SMT did not work well on asymmetric CPUs):

P_CORES = '0-7,16-19'
E_CORES = '8-15,20-21'
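For anyone adapting these values, a quick sanity check is to expand both strings into explicit CPU numbers and make sure the two sets don't overlap. The helper below is only a sketch of that idea; expand_cpu_list is a made-up name and not part of the pinning script.

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as '0-7,16-19' into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if end < start:
                raise ValueError(f"invalid range: {part}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


P_CORES = '0-7,16-19'
E_CORES = '8-15,20-21'

p_cpus = expand_cpu_list(P_CORES)
e_cpus = expand_cpu_list(E_CORES)

# The two groups should not share any CPU number.
overlap = set(p_cpus) & set(e_cpus)
print(len(p_cpus), len(e_cpus), sorted(overlap))
```

With the values above, the two lists are disjoint and together cover CPUs 0 through 21.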