CPU Pinning Alder Lake

Do you see 1,3,5,7 as not running/doing nothing, or are they completely removed?

I don’t know if the topology is the same for all systems; it might be different for systems that can disable HT in the firmware. I can’t disable HT in the firmware, I can only use the Xen smt=off option, which might be why I always see all the cores. For me, smt=off just seems to set the affinity to 0,2,4,6…

Xen will balance the load across the affinity set for the domain; you can use xl vcpu-list to see which cores are currently being used by which vCPU.
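
For example (the exact columns can vary a little between Xen versions, but you get one line per vCPU showing the physical CPU it is currently running on, plus its hard/soft affinity):

# show the vCPU-to-CPU mapping for all domains
xl vcpu-list
# or limit the output to a single domain
xl vcpu-list sys-net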

When using smt=off instead of disabling HT in the firmware, all cores/threads are enumerated, but the sibling threads are then disabled. That leaves a somewhat odd numbering, with holes where the sibling threads were.

4 Likes

I shifted my main Qubes system from an Intel E5-2699A v4 (22 cores, 2.4/3.6 GHz) to an i9-14900K (24 cores, with two P-cores that boost to a maximum of 6 GHz), and I’d say the performance difference is on the order of tens of percent without P/E pinning. I was hoping for quite a bit more, and P/E pinning doesn’t look like it will provide much more.

Do make certain to turn off hyper-threading in the BIOS, as that does make Qubes OS respond better (I expected no difference, since the OS turns off hyper-threading anyway, but that turned out to be wrong).

Possible improvements:
/usr/share/qubes/templates/libvirt/xen-user.xml

{% extends 'libvirt/xen.xml' %}
{% block basic %}
        {# qubes with the vcpu_pin_pcores feature set: pin each vCPU 1:1 to the low-numbered CPUs (the P-cores on this layout) #}
        {% if vm.features.get('vcpu_pin_pcores', '0') == '1' -%}
        <vcpu>{{ vm.vcpus }}</vcpu>
            <cputune>
                {% for i in range(vm.vcpus) %}
                    <vcpupin vcpu='{{ i }}' cpuset='{{ i }}'/>
                {% endfor %}
            </cputune>
        {% else -%}
        {# all other qubes: restrict every vCPU to CPUs 16-23 (E-cores on this layout) #}
        <vcpu>{{ vm.vcpus }}</vcpu>
            <cputune>
                {% for i in range(vm.vcpus) %}
                    <vcpupin vcpu='{{ i }}' cpuset='16-23'/>
                {% endfor %}
            </cputune>
        {% endif -%}
        {{ super() }}
{% endblock %}

Set the vcpu_pin_pcores feature to 1 for the qubes that should use P-cores:

qvm-features GamingQube vcpu_pin_pcores 1

Other qubes will use E-cores.

Automatically set the HVM stubdomains to use specific cores using a libxl hook:
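
The hook itself isn’t shown here; a minimal sketch of the idea could look like this, assuming libvirt’s libxl hook script at /etc/libvirt/hooks/libxl (libvirt calls it with the domain name and the operation as its first two arguments) and that the stubdomain shows up in Xen as “<name>-dm”. The 16-23 range simply mirrors the E-core range used in the template above; adjust it for your CPU.

#!/bin/sh
# sketch of /etc/libvirt/hooks/libxl -- not the original hook
# libvirt invokes it as: libxl <domain> <operation> <sub-op> <extra>
guest="$1"
operation="$2"

if [ "$operation" = "started" ]; then
    # the stubdomain "<domain>-dm" is created alongside the HVM,
    # so give it a moment and then pin all of its vCPUs to the E-cores
    (
        sleep 5
        /usr/sbin/xl vcpu-pin "${guest}-dm" all 16-23 2>/dev/null
    ) &
fi

exit 0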

@renehoj how do I adapt this to an Intel Core Ultra 7 155U? How do I detect P-cores, E-cores and LP E-cores in Xen 4.17?
If I do something like this:
#!/usr/bin/sh
# move CPUs 12,13 (presumably the LP E-cores) out of Pool-0 into their own pool
/usr/sbin/xl cpupool-cpu-remove Pool-0 12,13
/usr/sbin/xl cpupool-create name="lpe" sched="credit2"
/usr/sbin/xl cpupool-cpu-add lpe 12,13

# move the service qubes (and their stubdomains) into the new pool
/usr/sbin/xl cpupool-migrate sys-net lpe
/usr/sbin/xl cpupool-migrate sys-net-dm lpe
/usr/sbin/xl cpupool-migrate sys-firewall lpe
/usr/sbin/xl cpupool-migrate sys-usb lpe
/usr/sbin/xl cpupool-migrate sys-usb-dm lpe
/usr/sbin/xl cpupool-migrate sys-whonix lpe

xenpm get-cpu-idle-states 0 shows C0 to C3

xenpm get-cpu-idle-states for 12 and 13 shows C0 to C1

If you check the cores with xenpm, the P-cores should report a higher clock speed; at least that is how it was with the 12th and 13th gen CPUs.

I don’t know if they changed the layout in the latest gen, but this is how it used to be:
0-3 are your P-cores and their logical siblings, with 0 and 2 being the physical cores and 1 and 3 the sibling threads.
4-11 are your E-cores.

The way you set up the pool is how I was setting up my pools on the 13th gen, so I don’t see why it shouldn’t work with current gen CPUs.
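
If you want to double-check where the CPUs ended up, xl can show both the pools and which pool each domain is in:

# list the pools and the CPUs assigned to each of them
xl cpupool-list -c
# list the domains together with the cpupool they belong to
xl list -c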

You will “lose” a C-state when you remove the CPU from Pool-0, but I think that is only visual. I think it happens because the core is no longer in the same pool as dom0, and dom0 no longer has access to the same C-state information.

I could be wrong, but I don’t think the CPU C-states are changed on the Xen side; it’s only the dom0 VM that loses access to the information.

1 Like

Here’s the core layout on Intel Ultra 7 155H (with SMT/hyperthreading off)

P_CORES = '0,2,4,6,16,18'
E_CORES = '8,9,10,11,12,13,14,15,20,21'

It’s quite weird but I’m reasonably sure that this layout is correct.
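
If anyone wants to sanity-check a layout like this on their own machine, this is roughly what I’d look at in dom0 (a rough check, not an exact recipe):

# with smt=off, the disabled sibling threads leave holes in the CPU
# numbering, which makes the P-cores stand out in the topology
xl info -n
# comparing the reported frequency limits of two cores also separates
# P-cores from E-cores
xenpm get-cpufreq-para 0
xenpm get-cpufreq-para 8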

1 Like

I’ll check it out. It’s so strange that it might work.

So, with the i9 185H I have 12 “normal” threads, 8 low-performance ones and 2 ultra-low-performance ones. Does it make sense to reserve the last 2 for network and firewall, the remaining 8 for things like tor, usb, sys-whonix (and maybe share them with dom0), and use the rest for “normal” qubes?

$ xl cpupool-create name=“lpe” sched=“credit2”
command line:2: config parsing error near ‘lpe’: syntax error, unexpected IDENT, expecting STRING or NUMBER or ‘[’
command line:3: config parsing error near ‘credit2’: syntax error, unexpected IDENT, expecting STRING or NUMBER or ‘[’

try xl cpupool-create name=\"lpe\" sched=\"credit2\"

2 Likes

Thanks! I hope playing with pinning will help, because out of the box the GUI performance is AWFUL. Maybe it’s related to spontaneous migration to the ultra-low-power cores… or whatever.

What is the core layout for the i9 185H, btw? The last two (20 and 21) are obviously the slow ones, but the rest? 0-11 P, 12-19 E? Or something else? What could be the cause of the jerky mouse movement, dom0 or the usb qube? Once I excluded 20 and 21 and assigned them to net and fw, it seems better…

UPD: noticed the “weird” i7 layout and I assume the i9 should be similar… But I never see frequencies going above 4GHz :confused:
UPD2: switched to the performance governor and now I do see them, but it’s too much and too noisy :slight_smile:

[ark@dom0 ~]$ xl vcpu-pin Domain-0 0 16-21
libxl: error: libxl_sched.c:62:libxl__set_vcpuaffinity: Setting vcpu affinity: Invalid argument
Could not set affinity for vcpu `0’.

waaat!

If you are trying to change the soft affinity, then the syntax is like this:
xl vcpu-pin qube-name all - 16-21
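
If I remember the syntax correctly, the third field is the hard affinity and the fourth the soft affinity, and “-” leaves that field untouched. So for example:

# set both at once: hard affinity 8-21, soft affinity 16-21
xl vcpu-pin qube-name all 8-21 16-21
# reset the hard affinity to all CPUs, keep preferring 16-21
xl vcpu-pin qube-name all all 16-21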

Hm hm. So dom0 can only use CPUs from Pool-0? I see all its vCPUs are on CPU 0 or 7 now, no matter what the soft affinity is. I can successfully pin anything anywhere except Domain-0.

Yes, VMs can only run on the CPUs in their pool, and you can’t move dom0 out of Pool-0.

So what is the recommended method to improve performance? Use strict cpupool assignment, or just control everything with affinity settings? dom0 has all the available cores by default for a reason, I suppose? What’s the rationale for that?

I don’t think there are any official recommendations.

The Xen documentation says dedicating cores to each VM gives the best performance, but that is not really practical in Qubes OS. It works for dom0, if you have enough cores.

Using the Qubes OS admin API to detect when VMs have started, and then using xl to set their affinity, worked best for me. I tried both affinity and cpupools, but I found cpupools too restrictive; the main issue is that you can’t split the cores into an E pool and a P pool and still have a VM that can use all cores.

With affinity, you can set the hard affinity to all cores not used by dom0, and the soft affinity to the cores you ideally want the VM to run on. It lets you mix P and E cores, and Xen has more freedom to manage the scheduling and deal with the overprovisioning of the CPU.
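
As a rough sketch of what that can look like (the core numbers are only an example, assuming a 13th gen layout with HT off where dom0 is pinned separately to the last two E-cores):

# example layout: P-cores 0-7, E-cores 8-23, dom0 kept on 22-23
QUBE="work"                  # placeholder qube name
NON_DOM0_CORES="0-21"        # hard affinity: everything except dom0's cores
PREFERRED_CORES="0-7"        # soft affinity: prefer the P-cores

xl vcpu-pin "$QUBE" all "$NON_DOM0_CORES" "$PREFERRED_CORES"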

I added a p-cores feature using qvm-features, and used it in the script handling the admin events to decide how many P-cores a VM should be using.
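
That script isn’t shown here; a minimal sketch of the idea, assuming it gets called with the qube’s name when the qube starts, could look like this (p-cores is the feature name mentioned above, and the core ranges are placeholders):

#!/bin/sh
QUBE="$1"
P_CORES="0-7"      # placeholder P-core range
E_CORES="8-23"     # placeholder E-core range

# read the per-qube feature (prints nothing if it is not set)
want_pcores=$(qvm-features "$QUBE" p-cores 2>/dev/null)

if [ -n "$want_pcores" ] && [ "$want_pcores" != "0" ]; then
    # hard affinity on both P- and E-cores, soft affinity on the P-cores
    xl vcpu-pin "$QUBE" all "$P_CORES,$E_CORES" "$P_CORES"
else
    # everything else stays on the E-cores
    xl vcpu-pin "$QUBE" all "$E_CORES" "$E_CORES"
fi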

4 Likes

I’ve personally noticed that using affinity instead of 1:1 pinning gives better results in terms of responsiveness, noise and heat. Strict pinning may give better performance, but it would definitely require more micro-management.

On the 13th gen CPU, I found there is a pretty big difference in heat (and noise) when using the credit scheduler compared to credit2.

https://forum.qubes-os.org/t/which-xen-scheduler-is-best-for-asymetric-intel-cpus-12-13-14th-gen/27335/2

Not sure exactly how credit2 picks which cores to use, but it seemed to heavily favor the high end. Using mostly E-cores and only a few P-cores seemed to result in some cores getting a lot hotter, which then made the CPU fans run at a higher RPM.

For someone who is focused on PC noise, the difference between credit and credit2 was very noticeable.

2 Likes