I have an HP Z840 workstation with dual Xeon E5-2699A v4 CPUs. Each socket has 22 cores / 44 threads, for a total of 44 cores / 88 threads.
However, under Qubes, dmesg prints:
[user@dom0 ~]$ sudo dmesg | grep -E -i '(numa|smp)'
[ 0.000000] Linux version 6.6.48-1.qubes.fc37.x86_64 (mockbuild@065a31b3c1ba4c34ab3938416488814f) (gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1), GNU ld version 2.38-27.fc37) #1 SMP PREEMPT_DYNAMIC Wed Sep 4 01:09:59 GMT 2024
[ 0.211041] NUMA turned off
[ 0.957372] ACPI: Using ACPI (MADT) for SMP configuration information
[ 0.963574] smpboot: Allowing 44 CPUs, 0 hotplug CPUs
[ 1.834257] Freeing SMP alternatives memory: 48K
[ 1.837944] smp: Bringing up secondary CPUs ...
[ 1.897703] smp: Brought up 1 node, 44 CPUs
[ 1.897708] smpboot: Max logical packages: 1
Later on it prints:
[ 2.210349] APIC: NR_CPUS/possible_cpus limit of 44 reached. Processor 44/0x1 ignored.
[ 2.210352] ACPI: Unable to map lapic to logical cpu number
[ 2.210583] APIC: NR_CPUS/possible_cpus limit of 44 reached. Processor 45/0x3 ignored.
[ 2.210585] ACPI: Unable to map lapic to logical cpu number
and so on, with the series continuing through Processor 87/0x79.
NUMA is definitely enabled in the BIOS, as is hyperthreading.
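For reference, here is roughly how I've been sanity-checking what the running kernel sees. (A caveat I'm assuming applies here: in dom0 under Xen, these numbers describe dom0's vCPUs, not necessarily the physical topology.)

```shell
# What the running kernel sees; under Xen in dom0 these are vCPUs,
# not necessarily the physical socket/core layout.
lscpu | grep -E '^(CPU\(s\)|Thread\(s\)|Core\(s\)|Socket\(s\)|NUMA)'

# NUMA nodes the kernel has actually brought up:
ls /sys/devices/system/node/ | grep '^node' || echo "no NUMA nodes exposed"
```

On this machine the second command only ever shows node0, which matches the "Brought up 1 node" line above.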
Is this normal with Qubes? I only really look at dmesg when I have a problem, and I have been having big problems with this workstation recently. I'd been running this rig for almost a year with no issues, but in the last couple of weeks it started to completely freeze up periodically, with each freeze lasting anywhere from a few seconds to what feels like an eternity but is probably still under a minute. I suspected my GPU at first, because during the freezes I saw a lot of messages like "Fence fallback timer expired in ring gfx" and "Fence fallback timer expired on ring sdma0". I replaced the video card with no improvement. What eventually solved the issue was backing up my Qubes, secure-erasing the NVMe drive, installing Qubes from scratch, and restoring my Qubes. I completed that process this morning and so far so good (knock wood).
I bring up the freezes because they are why I was looking at dmesg in the first place. That said, I could not testify in court that I was seeing all of my CPUs even before these problems started.
I opened a text console while the Qubes installer was running and looked at dmesg there. Interestingly, it also printed "NUMA turned off", although at that point it still listed all 88 logical CPUs (44 cores / 88 threads). After booting into Qubes, it was back to limiting me to 44 "cores" and one NUMA node.
Is there something intrinsic to Qubes / Xen that is doing this? I don't think so, because my previous machine, a Z820 workstation, also supported NUMA. I ran that machine for years, and if I hadn't been getting all of my cores, I'm pretty sure I would have noticed.
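In case anyone wants to compare notes, here's a sketch of what I plan to run next to compare the dom0 kernel's view with the hypervisor's own view. I'm assuming `xl` is available in dom0 and that the usual `nr_cpus` / `nr_nodes` / `cores_per_socket` / `threads_per_core` fields appear in `xl info` output:

```shell
# Compare the dom0 kernel's view with the hypervisor's view.
# 'xl' only exists in dom0, so guard it to make this safe to paste anywhere.

echo "dom0 kernel sees $(grep -c ^processor /proc/cpuinfo) CPUs"

if command -v xl >/dev/null 2>&1; then
    # These are standard fields in 'xl info' output.
    sudo xl info | grep -E '^(nr_cpus|nr_nodes|cores_per_socket|threads_per_core)'
else
    echo "xl not found -- run this in dom0"
fi
```

My understanding is that if `xl info` reports all 88 logical CPUs while dom0's dmesg only shows 44, the hypervisor is seeing the full machine and dom0 is simply capped at 44 vCPUs, which would make the dmesg output above expected rather than a problem.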