Simultaneous Multithreading (SMT) BIOS setting

While troubleshooting a problem with waking from sleep on my Lenovo ThinkPad T470s, I realized that SMT was enabled in the BIOS. Even though SMT is disabled at runtime by Xen/Qubes, for some reason the BIOS setting broke sleep/wakeup: the computer would go to sleep but needed a hard reboot to wake up. Disabling SMT in the BIOS/UEFI fixed the issue, as mentioned in the documentation.

Now, the interesting part is that disabling SMT in the BIOS seems to have improved general performance and, possibly, boot times (call me crazy; this could just be my perception and some snake oil). Is this expected? If so, disabling SMT in the BIOS should be mentioned more prominently in the documentation (perhaps at the beginning of the installation section?).

If I’m crazy and my perception is incorrect, please forgive my false report. Perhaps I’m so happy to see sleep working properly that I’m starting to see things. I don’t feel like re-enabling SMT to do some serious benchmarking, though :)

Flavio

Hi Flavio,
you can measure it, so you get a reference value instead of a perception:

[user@dom0 ~]$ systemd-analyze 
Startup finished in 7.054s (firmware) + 7.612s (loader) + 4.204s (kernel) + 27.582s (initrd) + 18.883s (userspace) = 1min 5.338s 
graphical.target reached after 18.869s in userspace

[user@dom0 ~]$ systemd-analyze blame
26.529s dracut-initqueue.service
25.586s systemd-cryptsetup@luks...
13.969s qubes-vm@sys-firewall.service
 7.801s qubes-vm@sys-net.service
...

Then switch back to SMT and compare.
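
To make the before/after comparison less error-prone, the `systemd-analyze` summary line can be split into per-phase seconds. Below is a minimal sketch of a POSIX shell function; `boot_phases` is a hypothetical helper (not part of systemd or Qubes), and it assumes the `Startup finished in ... = ...` format shown above with only the h/min/s/ms units systemd uses there.

```shell
# boot_phases: hypothetical helper, not part of systemd or Qubes.
# Reads `systemd-analyze` output on stdin and prints "phase seconds" lines.
boot_phases() {
  awk '
  # Convert a systemd duration such as "1min 5.338s" or "747ms" to seconds.
  function to_sec(d,   n, parts, i, v, t) {
    t = 0
    n = split(d, parts, " ")
    for (i = 1; i <= n; i++) {
      v = parts[i]
      if (v ~ /ms$/)       { sub(/ms$/, "", v);  t += v / 1000 }
      else if (v ~ /min$/) { sub(/min$/, "", v); t += v * 60 }
      else if (v ~ /h$/)   { sub(/h$/, "", v);   t += v * 3600 }
      else if (v ~ /s$/)   { sub(/s$/, "", v);   t += v }
    }
    return t
  }
  /^Startup finished in/ {
    dur = ""
    for (i = 4; i <= NF; i++) {
      if ($i ~ /^\(.*\)$/) {
        # "(firmware)", "(loader)", ...: the tokens collected so far are its duration
        printf "%s %.3f\n", substr($i, 2, length($i) - 2), to_sec(dur)
        dur = ""
      } else if ($i == "+" || $i == "=") {
        dur = ""
      } else {
        dur = dur " " $i
      }
    }
    # Whatever follows "=" is the overall total
    printf "total %.3f\n", to_sec(dur)
  }'
}
```

Usage: `systemd-analyze | boot_phases`. Saving that output once per configuration (SMT on, SMT off) gives numbers that can be compared phase by phase instead of eyeballing the summary line.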

@ludovic Thanks for the quick response and for pushing me towards more scientific validation. It seems I was right; see for yourself:

SMT disabled in the BIOS

Startup finished in 15.718s (firmware) + 3.991s (loader) + 10.076s (kernel) + 9.945s (initrd) + 50.683s (userspace) = 1min 30.415s
graphical.target reached after 50.636s in userspace

SMT enabled in the BIOS

Startup finished in 16.048s (firmware) + 4.161s (loader) + 10.179s (kernel) + 10.404s (initrd) + 1min 43.542s (userspace) = 2min 24.337s
graphical.target reached after 1min 43.488s in userspace

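With SMT enabled, every phase takes longer, but nearly all of the difference is in userspace (1min 43.542s vs 50.683s). According to blame, most services take longer to start, especially the sys-whonix, sys-firewall and sys-net VMs.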

Should “disable SMT in your BIOS/UEFI” be displayed more prominently in the documentation, then? Not only did it fix my wakeup-from-sleep problem, it also made the system’s performance significantly better.

Does anyone know why disabling SMT in the BIOS/UEFI would make such a dramatic difference in overall performance?

I actually have the opposite experience:

@tzwcfq That is interesting! I see a roughly 40% speedup when I turn SMT off in the BIOS/UEFI, as shown by the boot times (and VM start times too). Perhaps the difference between our experiences has to do with the processor generation and/or the overall hardware. As I said above, this is on a Lenovo ThinkPad T470s with an Intel Core i5-6300U and 12GB of DDR4 RAM in flex configuration (4GB soldered and 8GB in a second slot), so it’s partially dual channel. You, on the other hand, are on a high-end, latest-generation CPU, so perhaps it’s an issue with how well your CPU is supported by Xen and Linux in general.
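
A percentage like this can mean either "takes less time" or "runs faster", and the two give different numbers. A minimal sketch; `compare_boot` is a hypothetical helper that takes two durations in seconds (before, after) and prints both forms.

```shell
# compare_boot: hypothetical helper. $1 = seconds before, $2 = seconds after.
# Prints how much less time "after" takes and the corresponding speed ratio.
compare_boot() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    printf "%.1f%% less time, %.2fx faster\n", (before - after) / before * 100, before / after
  }'
}
```

With the totals posted above (2min 24.337s with SMT enabled, 1min 30.415s with it disabled), `compare_boot 144.337 90.415` prints `37.4% less time, 1.60x faster`, and the userspace figures (`compare_boot 103.542 50.683`) give `51.1% less time, 2.04x faster`. So "roughly 40%" reads best as the reduction in total boot time.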

Yes, this is the most likely reason.
For reference, I’m using kernel 5.17.5, and my startup times are:

BIOS smt=on

$ systemd-analyze
Startup finished in 14.485s (firmware) + 2.211s (loader) + 1.927s (kernel) + 5.711s (initrd) + 37.912s (userspace) = 1min 2.248s
graphical.target reached after 37.891s in userspace

BIOS smt=off

Startup finished in 13.934s (firmware) + 1.585s (loader) + 1.926s (kernel) + 8.859s (initrd) + 51.023s (userspace) = 1min 17.329s
graphical.target reached after 50.987s in userspace

@tzwcfq Kernel 5.10.112-1.fc32.qubes.x86_64 SMP here.

My kernel is older; it’s the one officially provided with Qubes 4.1. Interestingly enough, even my Windows 10 qube runs significantly faster with SMT turned off in the BIOS, so I don’t think the Linux kernel version changes much. You may want to look into the Xen version instead.

Of course, the Linux kernel in dom0 could somehow affect overall I/O performance, and the kernel in your Linux VMs may play a part for those VMs too, but I’d look at Xen itself first.
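
To compare Xen versions between machines, dom0 can report the running hypervisor version through `xl info` (which usually needs root, e.g. `sudo xl info`). A minimal sketch, assuming the `xen_major`, `xen_minor` and `xen_extra` fields that `xl info` prints as `key : value` lines; `xen_version` is a hypothetical helper name.

```shell
# xen_version: hypothetical helper; reads `xl info` output on stdin.
# Assumes "key : value" lines with xen_major, xen_minor and xen_extra keys.
xen_version() {
  awk -F ':' '
    {
      key = $1; gsub(/[ \t]/, "", key)
      val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "xen_major" { major = val }
    key == "xen_minor" { minor = val }
    key == "xen_extra" { extra = val }
    END { if (major != "") printf "Xen %s.%s%s\n", major, minor, extra }'
}
```

Usage: `sudo xl info | xen_version`, alongside `uname -r` for the dom0 kernel, so both versions can be posted together with the boot times.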

This may be related to the lack of Intel Hardware P-states (HWP) support in Xen:
CPU Frequency Scaling Broken · Issue #4604 · QubesOS/qubes-issues · GitHub
If your CPU doesn’t have HWP support, maybe it’s not affecting you (although HWP, marketed as Intel Speed Shift, was introduced with Skylake).
I tried using dom0-based cpufreq instead of Xen-based cpufreq:
Xen power management - Xen
But I got this warning:
intel_pstate: CPU model not supported
Then I checked the CPU flags, and there were no HWP flags:
lscpu | grep Flags | tr ' ' '\n' | grep hwp
It seems that Xen hides CPUID leaf 0x06 from dom0. Related patch:
xenbits.xen.org Git - xen.git/commitdiff
This leaf provides the information on HWP support:
is intel_pstate working with or without HWP? - Intel Communities
Since the dom0 kernel can’t see that the CPU supports HWP, it doesn’t load the intel_pstate driver.
I tried adding the intel_pstate=hwp_broken_firmware kernel command-line option, but it didn’t help.
Maybe someday I’ll try the patch from the linked Qubes issue.
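
The flag check above can also be run against /proc/cpuinfo directly, without lscpu, listing each HWP-related flag once even though cpuinfo repeats the flags for every CPU. A minimal sketch; `hwp_check` is a hypothetical helper name. It can only report what the running kernel is shown: if Xen hides CPUID leaf 0x06 from dom0, as described above, an empty result does not prove that the CPU lacks HWP.

```shell
# hwp_check: hypothetical helper. Reads cpuinfo-style text (e.g. /proc/cpuinfo)
# on stdin and reports the HWP-related flags the running kernel can see.
hwp_check() {
  # First "flags" line only (cpuinfo repeats it per CPU), one flag per line
  flags=$(grep -m 1 '^flags' | tr ' \t' '\n\n' | grep '^hwp' | sort -u | tr '\n' ' ')
  if [ -n "$flags" ]; then
    echo "HWP visible: ${flags% }"
  else
    echo "no HWP flags visible to this kernel"
  fi
}
```

Usage: `hwp_check < /proc/cpuinfo` in dom0. Running the same command inside an HVM qube or on a bare-metal Linux boot of the same machine is one way to see whether the flags are hidden only from dom0.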

What results do you get with the smt=on kernel option?

I can’t change the BIOS setting, but enabling the kernel option also makes my system boot slower.

54.6s with kernel smt=off
1m15s with kernel smt=on

I tried testing boot time in more detail, and it doesn’t seem that reliable.
First, boot time is unstable. I think this may be related to the use of the CPU’s E-cores.
Second, boot time somehow depends on how many CPU cores are assigned to dom0 with the dom0_max_vcpus Xen command-line option.
The best boot time seems to be when dom0 has 2 vCPUs, and at that setting boot time is about the same for all 3 BIOS/Xen SMT combinations.
But if all vCPUs are given to dom0, boot time is noticeably faster with BIOS smt=on and Xen smt=off.
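
For anyone repeating these runs, the Xen options (smt, dom0_max_vcpus) are set on the Xen command line from dom0. A minimal sketch, assuming the Xen options live in a double-quoted `GRUB_CMDLINE_XEN_DEFAULT="..."` line in /etc/default/grub and that GNU sed (`-i`) is available; `set_xen_opt` is a hypothetical helper, and it does not handle other quoting styles.

```shell
# set_xen_opt: hypothetical helper. Sets NAME=VALUE on the
# GRUB_CMDLINE_XEN_DEFAULT="..." line of FILE, replacing an existing value.
# Usage: set_xen_opt NAME VALUE FILE   (requires GNU sed for -i)
set_xen_opt() {
  name="$1"; value="$2"; file="$3"
  if grep -q "^GRUB_CMDLINE_XEN_DEFAULT=.*[\" ]$name=" "$file"; then
    # Option already present: replace its value (up to the next space or quote)
    sed -i "s/^\(GRUB_CMDLINE_XEN_DEFAULT=.*[\" ]$name=\)[^\" ]*/\1$value/" "$file"
  else
    # Option absent: append it before the closing quote
    sed -i "s/^\(GRUB_CMDLINE_XEN_DEFAULT=\".*\)\"$/\1 $name=$value\"/" "$file"
  fi
}
```

Back up /etc/default/grub first, run the function as root, then regenerate the GRUB configuration. On a UEFI install of Qubes 4.1 that is typically `sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg` (`/boot/grub2/grub.cfg` on legacy BIOS), but check which file your system actually boots from.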

Here are my results for multiple runs:

BIOS smt=on xen smt=on

dom0 has all 24 vcpu (no dom0_max_vcpus specified)

Startup finished in 14.478s (firmware) + 1.148s (loader) + 1.976s (kernel) + 6.949s (initrd) + 1min 1.668s (userspace) = 1min 26.221s
graphical.target reached after 1min 1.655s in userspace
Startup finished in 14.471s (firmware) + 1.280s (loader) + 1.912s (kernel) + 8.297s (initrd) + 57.730s (userspace) = 1min 23.693s
graphical.target reached after 57.703s in userspace
Startup finished in 14.477s (firmware) + 1.561s (loader) + 2.015s (kernel) + 7.611s (initrd) + 53.100s (userspace) = 1min 18.765s
graphical.target reached after 53.076s in userspace
Startup finished in 14.473s (firmware) + 1.180s (loader) + 1.986s (kernel) + 6.757s (initrd) + 57.199s (userspace) = 1min 21.598s
graphical.target reached after 57.167s in userspace

dom0 has 16 vcpu (dom0_max_vcpus=16)

Startup finished in 14.479s (firmware) + 1.166s (loader) + 1.850s (kernel) + 6.442s (initrd) + 54.048s (userspace) = 1min 17.987s
graphical.target reached after 54.007s in userspace
Startup finished in 14.473s (firmware) + 1.114s (loader) + 1.843s (kernel) + 6.923s (initrd) + 59.794s (userspace) = 1min 24.148s
graphical.target reached after 59.755s in userspace

dom0 has 8 vcpu (dom0_max_vcpus=8)

Startup finished in 14.468s (firmware) + 1.179s (loader) + 1.936s (kernel) + 6.423s (initrd) + 45.021s (userspace) = 1min 9.029s
graphical.target reached after 45.005s in userspace

dom0 has 4 vcpu (dom0_max_vcpus=4)

Startup finished in 14.477s (firmware) + 2.358s (loader) + 1.863s (kernel) + 5.855s (initrd) + 45.670s (userspace) = 1min 10.225s
graphical.target reached after 45.649s in userspace

dom0 has 2 vcpu (dom0_max_vcpus=2)

Startup finished in 14.480s (firmware) + 1.066s (loader) + 1.927s (kernel) + 5.962s (initrd) + 35.097s (userspace) = 58.535s
graphical.target reached after 35.075s in userspace
Startup finished in 14.475s (firmware) + 1.133s (loader) + 1.820s (kernel) + 5.950s (initrd) + 34.694s (userspace) = 58.074s
graphical.target reached after 34.684s in userspace

dom0 has 1 vcpu (dom0_max_vcpus=1)

Startup finished in 14.467s (firmware) + 1.182s (loader) + 1.900s (kernel) + 7.829s (initrd) + 44.180s (userspace) = 1min 9.560s
graphical.target reached after 44.154s in userspace
Startup finished in 14.471s (firmware) + 1.251s (loader) + 1.952s (kernel) + 7.360s (initrd) + 44.826s (userspace) = 1min 9.862s
graphical.target reached after 44.799s in userspace

BIOS smt=on xen smt=off

dom0 has all 16 vcpu (no dom0_max_vcpus specified)

Startup finished in 14.465s (firmware) + 1.229s (loader) + 1.970s (kernel) + 5.721s (initrd) + 42.034s (userspace) = 1min 5.421s
graphical.target reached after 42.022s in userspace
Startup finished in 14.473s (firmware) + 1.182s (loader) + 1.836s (kernel) + 6.538s (initrd) + 46.821s (userspace) = 1min 10.853s
graphical.target reached after 46.810s in userspace
Startup finished in 14.476s (firmware) + 2.441s (loader) + 2.053s (kernel) + 9.937s (initrd) + 38.799s (userspace) = 1min 7.709s
graphical.target reached after 38.789s in userspace
Startup finished in 14.481s (firmware) + 1.196s (loader) + 1.953s (kernel) + 6.067s (initrd) + 43.598s (userspace) = 1min 7.297s
graphical.target reached after 43.590s in userspace

dom0 has 8 vcpu (dom0_max_vcpus=8)

Startup finished in 14.476s (firmware) + 1.197s (loader) + 1.820s (kernel) + 5.173s (initrd) + 35.903s (userspace) = 58.571s
graphical.target reached after 35.895s in userspace

dom0 has 4 vcpu (dom0_max_vcpus=4)

Startup finished in 14.469s (firmware) + 1.311s (loader) + 1.819s (kernel) + 5.198s (initrd) + 34.018s (userspace) = 56.818s
graphical.target reached after 34.010s in userspace

dom0 has 2 vcpu (dom0_max_vcpus=2)

Startup finished in 14.464s (firmware) + 1.247s (loader) + 1.925s (kernel) + 5.808s (initrd) + 32.177s (userspace) = 55.623s
graphical.target reached after 32.160s in userspace
Startup finished in 14.481s (firmware) + 1.180s (loader) + 1.910s (kernel) + 5.587s (initrd) + 34.100s (userspace) = 57.260s
graphical.target reached after 34.087s in userspace

dom0 has 1 vcpu (dom0_max_vcpus=1)

Startup finished in 14.476s (firmware) + 1.743s (loader) + 1.943s (kernel) + 7.411s (initrd) + 43.706s (userspace) = 1min 9.281s
graphical.target reached after 43.678s in userspace

BIOS smt=off xen smt=off

dom0 has all 16 vcpu (no dom0_max_vcpus specified)

Startup finished in 13.935s (firmware) + 1.834s (loader) + 1.854s (kernel) + 6.726s (initrd) + 1min 747ms (userspace) = 1min 25.096s
graphical.target reached after 1min 713ms in userspace
Startup finished in 14.459s (firmware) + 1.150s (loader) + 1.855s (kernel) + 6.844s (initrd) + 51.702s (userspace) = 1min 16.013s
graphical.target reached after 51.679s in userspace
Startup finished in 14.463s (firmware) + 1.315s (loader) + 1.852s (kernel) + 7.009s (initrd) + 50.996s (userspace) = 1min 15.637s
graphical.target reached after 50.981s in userspace
Startup finished in 14.446s (firmware) + 2.775s (loader) + 1.953s (kernel) + 11.078s (initrd) + 51.386s (userspace) = 1min 21.641s
graphical.target reached after 51.364s in userspace
Startup finished in 14.455s (firmware) + 1.613s (loader) + 1.853s (kernel) + 6.743s (initrd) + 52.357s (userspace) = 1min 17.024s
graphical.target reached after 52.326s in userspace
Startup finished in 14.459s (firmware) + 1.348s (loader) + 1.844s (kernel) + 6.985s (initrd) + 49.663s (userspace) = 1min 14.301s
graphical.target reached after 49.625s in userspace
Startup finished in 13.929s (firmware) + 1.253s (loader) + 1.921s (kernel) + 7.170s (initrd) + 58.594s (userspace) = 1min 22.869s
graphical.target reached after 58.560s in userspace
Startup finished in 14.465s (firmware) + 1.017s (loader) + 1.959s (kernel) + 7.183s (initrd) + 58.360s (userspace) = 1min 22.987s
graphical.target reached after 58.323s in userspace

dom0 has 8 vcpu (dom0_max_vcpus=8)

Startup finished in 14.446s (firmware) + 1.279s (loader) + 1.821s (kernel) + 6.579s (initrd) + 46.048s (userspace) = 1min 10.174s
graphical.target reached after 46.032s in userspace

dom0 has 2 vcpu (dom0_max_vcpus=2)

Startup finished in 14.427s (firmware) + 1.279s (loader) + 1.917s (kernel) + 6.493s (initrd) + 38.787s (userspace) = 1min 2.907s
graphical.target reached after 38.774s in userspace
Startup finished in 14.446s (firmware) + 1.152s (loader) + 2.057s (kernel) + 6.038s (initrd) + 34.954s (userspace) = 58.649s
graphical.target reached after 34.936s in userspace

dom0 has 1 vcpu (dom0_max_vcpus=1)

Startup finished in 14.467s (firmware) + 1.150s (loader) + 1.848s (kernel) + 7.492s (initrd) + 45.275s (userspace) = 1min 10.234s
graphical.target reached after 45.252s in userspace
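
With run-to-run noise this large, single boots can mislead, so it helps to summarize several runs per configuration. A minimal sketch; `summarize_runs` is a hypothetical helper that reads pasted `systemd-analyze` summaries like the ones above, ignores the graphical.target lines, and prints the count, mean, sample standard deviation, minimum and maximum (in seconds) for one phase.

```shell
# summarize_runs: hypothetical helper. $1 = phase name as printed in
# parentheses (firmware, loader, kernel, initrd, userspace) or "total".
# Reads saved `systemd-analyze` output (several boots) on stdin.
summarize_runs() {
  awk -v want="$1" '
  function to_sec(d,   n, parts, i, v, t) {
    t = 0
    n = split(d, parts, " ")
    for (i = 1; i <= n; i++) {
      v = parts[i]
      if (v ~ /ms$/)       { sub(/ms$/, "", v);  t += v / 1000 }
      else if (v ~ /min$/) { sub(/min$/, "", v); t += v * 60 }
      else if (v ~ /h$/)   { sub(/h$/, "", v);   t += v * 3600 }
      else if (v ~ /s$/)   { sub(/s$/, "", v);   t += v }
    }
    return t
  }
  /^Startup finished in/ {
    dur = ""
    for (i = 4; i <= NF; i++) {
      if ($i ~ /^\(.*\)$/) {
        if (substr($i, 2, length($i) - 2) == want) vals[++cnt] = to_sec(dur)
        dur = ""
      } else if ($i == "+" || $i == "=") {
        dur = ""
      } else {
        dur = dur " " $i
      }
    }
    if (want == "total") vals[++cnt] = to_sec(dur)
  }
  END {
    if (cnt == 0) { print "no matching runs"; exit 1 }
    for (i = 1; i <= cnt; i++) {
      sum += vals[i]
      if (i == 1 || vals[i] < lo) lo = vals[i]
      if (i == 1 || vals[i] > hi) hi = vals[i]
    }
    mean = sum / cnt
    for (i = 1; i <= cnt; i++) ss += (vals[i] - mean) ^ 2
    sd = (cnt > 1) ? sqrt(ss / (cnt - 1)) : 0
    printf "n=%d mean=%.3f sd=%.2f min=%.3f max=%.3f\n", cnt, mean, sd, lo, hi
  }'
}
```

For the four "BIOS smt=on xen smt=off, all 16 vcpu" runs above, `summarize_runs userspace` gives `n=4 mean=42.813 sd=3.34 min=38.799 max=46.821`. A spread of a few seconds within one configuration is a useful yardstick before reading much into differences between configurations.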

Interesting that giving dom0 two vCPUs is faster.

That post also seems to suggest that more isn’t always better.

Hyper-threading can improve performance by keeping the CPU’s execution units busy, especially with long pipelines, but in my experience the gain was never higher than 15 or 20% (despite CPU manufacturers’ claims). However, it does bring some potentially serious drawbacks, such as data leaking between threads through shared caches. Looking at the big picture, you could say the same about superscalar CISC CPUs with long pipelines (for example, instructions and data leaking through speculative execution and branch prediction). In sum, there is no free lunch, and maybe we should just return to simpler RISC CPU designs, but that doesn’t scale either, because you can’t keep increasing frequency, shrinking size and increasing power forever, can you?

I wanted to check whether this affected my boot time, but there seems to be no reliable way to test it, as the boot time differs every time. I did 3 tests without changing any settings, and the boot times varied greatly.

Maybe you spent a different amount of time entering the password each time; that affects the total boot time.

Ah yes, that could be it. Does that include the disk encryption password, the user login password, or both?

I repeated the boot benchmark a few times, and I get that 40% speedup fairly consistently with SMT disabled in the BIOS. Most of the time savings seem to come from starting VMs, as you can see from the numbers below. Again, this is on a ThinkPad T470s with an i5-6300U (Skylake) processor, with all of the latest patches and Qubes OS 4.1, in case that matters.

With SMT enabled in the BIOS:

[flavio@dom0 ~]$ systemd-analyze
Startup finished in 16.048s (firmware) + 4.161s (loader) + 10.179s (kernel) + 10.404s (initrd) + 1min 43.542s (userspace) = 2min 24.337s
graphical.target reached after 1min 43.488s in userspace

[flavio@dom0 ~]$ systemd-analyze blame
1min 36.019s qubes-vm@sys-whonix.service >
52.232s qubes-vm@sys-firewall.service >
29.600s qubes-vm@sys-net.service >
26.122s qubes-vm@sys-usb.service >
7.599s dracut-initqueue.service >
6.099s systemd-cryptsetup@luks\x2d90d3ac5e\x2d2114\x2d43b7\x2d9d35\x2d0f5>
3.836s systemd-udev-settle.service >
2.802s lvm2-pvscan@253:0.service >
2.513s lvm2-monitor.service >
1.348s qubes-qmemman.service >
1.227s plymouth-quit-wait.service >
677ms upower.service >
658ms qubesd.service >
611ms dracut-cmdline.service >
542ms initrd-switch-root.service >
406ms systemd-vconsole-setup.service >
377ms systemd-logind.service >
298ms systemd-udev-trigger.service >
290ms initrd-parse-etc.service >
265ms user@1000.service >
261ms qubes-core.service >
246ms xenstored.service >
237ms libvirtd.service >
235ms systemd-journal-flush.service >
193ms systemd-homed.service >
189ms systemd-udevd.service >
147ms accounts-daemon.service >
145ms systemd-journald.service >
126ms xen-init-dom0.service >
112ms systemd-fsck@dev-disk-by\x2duuid-1A97\x2dECC5.service >
111ms dev-mapper-qubes_dom0\x2dswap.swap >
109ms dev-mqueue.mount >
107ms proc-xen.mount >
104ms sys-kernel-debug.mount >
104ms polkit.service >
97ms sys-kernel-tracing.mount >
94ms kmod-static-nodes.service >
90ms sys-kernel-config.mount >
89ms lightdm.service >
89ms modprobe@fuse.service >
84ms plymouth-start.service >
77ms systemd-remount-fs.service >
73ms dracut-pre-udev.service >
73ms systemd-modules-load.service >
72ms systemd-repart.service >
67ms systemd-tmpfiles-setup.service >
64ms systemd-tmpfiles-setup-dev.service >
64ms systemd-fsck@dev-disk-by\x2duuid-b4cecd7b\x2db58a\x2d4778\x2d8abe>
57ms systemd-sysctl.service >
55ms dmraid-activation.service >
53ms systemd-random-seed.service >
50ms qubes-db-dom0.service >
46ms systemd-userdbd.service >
46ms xenconsoled.service >
37ms systemd-update-utmp-runlevel.service >
37ms dbus-broker.service >
36ms initrd-cleanup.service >
36ms plymouth-read-write.service >
35ms systemd-fsck-root.service >
34ms plymouth-switch-root.service >
33ms user-runtime-dir@1000.service >
32ms systemd-update-utmp.service >
30ms boot-efi.mount >
27ms systemd-backlight@backlight:intel_backlight.service >
26ms rtkit-daemon.service >
25ms dracut-shutdown.service >
25ms systemd-user-sessions.service >
23ms systemd-rfkill.service >
22ms boot.mount >
22ms sys-fs-fuse-connections.mount >
20ms tmp.mount >
19ms initrd-udevadm-cleanup-db.service >
19ms systemd-backlight@leds:tpacpi::kbd_backlight.service >
18ms sysroot.mount >
15ms var-lib-xenstored.mount >

With SMT disabled in the BIOS:

[flavio@dom0 ~]$ systemd-analyze
Startup finished in 15.718s (firmware) + 3.991s (loader) + 10.076s (kernel) + 9.945s (initrd) + 50.683s (userspace) = 1min 30.415s
graphical.target reached after 50.636s in userspace

[flavio@dom0 ~]$ systemd-analyze blame
43.176s qubes-vm@sys-whonix.service >
32.495s qubes-vm@sys-firewall.service >
27.216s qubes-vm@sys-usb.service >
18.018s qubes-vm@sys-net.service >
7.179s dracut-initqueue.service >
5.793s systemd-cryptsetup@luks\x2d90d3ac5e\x2d2114\x2d43b7\x2d9d35\x2d0f540bdf>
3.780s systemd-udev-settle.service >
2.752s lvm2-pvscan@253:0.service >
2.377s lvm2-monitor.service >
1.381s plymouth-quit-wait.service >
1.314s qubes-qmemman.service >
678ms upower.service >
649ms qubesd.service >
637ms dracut-cmdline.service >
538ms initrd-switch-root.service >
431ms systemd-vconsole-setup.service >
388ms systemd-logind.service >
305ms accounts-daemon.service >
296ms systemd-udev-trigger.service >
294ms initrd-parse-etc.service >
262ms user@1000.service >
262ms xenstored.service >
260ms qubes-core.service >
229ms systemd-homed.service >
216ms polkit.service >
191ms libvirtd.service >
185ms systemd-journal-flush.service >
175ms systemd-udevd.service >
156ms systemd-journald.service >
114ms lightdm.service >
105ms dev-mqueue.mount >
103ms proc-xen.mount >
103ms dev-mapper-qubes_dom0\x2dswap.swap >
101ms sys-kernel-debug.mount >
99ms sys-kernel-tracing.mount >
97ms systemd-tmpfiles-setup-dev.service >
91ms kmod-static-nodes.service >
89ms xen-init-dom0.service >
84ms plymouth-start.service >
83ms systemd-fsck@dev-disk-by\x2duuid-1A97\x2dECC5.service >
79ms sys-kernel-config.mount >
77ms modprobe@fuse.service >
75ms systemd-random-seed.service >
72ms dracut-pre-udev.service >
72ms systemd-tmpfiles-setup.service >
63ms dmraid-activation.service >
62ms systemd-fsck@dev-disk-by\x2duuid-b4cecd7b\x2db58a\x2d4778\x2d8abe\x2df0>
61ms systemd-modules-load.service >
60ms systemd-tmpfiles-clean.service >
58ms systemd-repart.service >
58ms tmp.mount >
57ms systemd-sysctl.service >
55ms qubes-db-dom0.service >
54ms xenconsoled.service >
49ms systemd-userdbd.service >
43ms boot-efi.mount >
42ms systemd-remount-fs.service >
40ms systemd-rfkill.service >
37ms systemd-fsck-root.service >
37ms systemd-update-utmp-runlevel.service >
36ms systemd-backlight@backlight:intel_backlight.service >
34ms initrd-cleanup.service >
32ms plymouth-switch-root.service >
30ms dbus-broker.service >
28ms systemd-update-utmp.service >
27ms systemd-backlight@leds:tpacpi::kbd_backlight.service >
25ms var-lib-xenstored.mount >
25ms systemd-user-sessions.service >
22ms sys-fs-fuse-connections.mount >
21ms initrd-udevadm-cleanup-db.service >
21ms dracut-shutdown.service >
21ms user-runtime-dir@1000.service >
20ms plymouth-read-write.service >
12ms rtkit-daemon.service >
12ms boot.mount >
11ms sysroot.mount >
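
To see which units account for the difference, two saved blame listings can be compared unit by unit. A minimal sketch; `blame_diff` is a hypothetical helper. Save each listing to a file, e.g. `systemd-analyze blame > smt-on.txt`; redirected output does not go through the pager, so long unit names should not be cut off the way some are in the listings above. Units are matched by exact name, and units present in only one file are skipped.

```shell
# blame_diff: hypothetical helper. $1, $2 = saved `systemd-analyze blame` outputs.
# Prints "unit before after after-minus-before" (seconds) for units in both files.
blame_diff() {
  awk '
  function to_sec(d,   n, parts, i, v, t) {
    t = 0
    n = split(d, parts, " ")
    for (i = 1; i <= n; i++) {
      v = parts[i]
      if (v ~ /ms$/)       { sub(/ms$/, "", v);  t += v / 1000 }
      else if (v ~ /min$/) { sub(/min$/, "", v); t += v * 60 }
      else if (v ~ /h$/)   { sub(/h$/, "", v);   t += v * 3600 }
      else if (v ~ /s$/)   { sub(/s$/, "", v);   t += v }
    }
    return t
  }
  # Leading duration tokens ("1min", "36.019s", "677ms") form the time; the next token is the unit
  function parse(   i, dur) {
    dur = ""
    unit = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^[0-9.]+(h|min|s|ms)$/) dur = dur " " $i
      else { unit = $i; break }
    }
    return to_sec(dur)
  }
  NR == FNR { t = parse(); if (unit != "") before[unit] = t; next }
  {
    t = parse()
    if (unit != "" && (unit in before))
      printf "%s %.3f %.3f %.3f\n", unit, before[unit], t, t - before[unit]
  }' "$1" "$2"
}
```

Usage: `blame_diff smt-on.txt smt-off.txt | grep qubes-vm@` shows only the VM start times. With the listings above, sys-whonix would come out as `96.019 43.176 -52.843`.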

Only the time spent entering the disk encryption password. It increases the initrd time reported by systemd-analyze.