Is it safe to use hyper-threading (SMT) with Qubes OS if done the 'correct way'?

OvalZero · February 10, 2025, 6:41pm

SMT implementations typically share the TLB and L1 caches between the sibling threads of a core. This can make cache timing attacks much easier, and one has to assume that it makes several “spectre-like” bugs exploitable. It’s generally a bad idea to run different security domains on different hardware threads of the same core, but it’s not trivial to modify a scheduler to take this into account (gang scheduling → schedule different security domains on different physical cores).
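To make the scheduling constraint above concrete, here is a minimal toy sketch of one time slice of core-granular placement. It is not how Xen's scheduler is implemented; the function name `schedule_core_granular` and the domain names are made up for illustration. The only rule it enforces is that the threads of one physical core never run vCPUs from different domains at the same time.

```python
from collections import defaultdict


def schedule_core_granular(vcpus, n_cores, threads_per_core=2):
    """Toy placement for one time slice (illustrative, not Xen's algorithm).

    vcpus: list of (domain, vcpu_id) tuples that want to run.
    Returns (placement, waiting): placement maps a core index to the vCPUs
    running on that core's threads; waiting lists vCPUs that got no core.
    """
    # Group runnable vCPUs by the security domain (qube) they belong to.
    by_domain = defaultdict(list)
    for domain, vcpu_id in vcpus:
        by_domain[domain].append((domain, vcpu_id))

    # Cut each domain into chunks of at most one core's worth of threads,
    # so a chunk never mixes domains.
    groups = []
    for domain in sorted(by_domain):
        members = by_domain[domain]
        for i in range(0, len(members), threads_per_core):
            groups.append(members[i:i + threads_per_core])

    # Hand out whole cores; a core with a half-full chunk leaves a thread idle.
    placement = {core: group for core, group in enumerate(groups[:n_cores])}
    waiting = [v for group in groups[n_cores:] for v in group]
    return placement, waiting


if __name__ == "__main__":
    runnable = [("work", 0), ("work", 1), ("personal", 0), ("untrusted", 0)]
    placement, waiting = schedule_core_granular(runnable, n_cores=2)
    print(placement)
    print(waiting)
```

With two cores and the four vCPUs in the example, `personal` and `untrusted` each get a core to themselves, one hardware thread per core sits idle, and both `work` vCPUs have to wait. That idle capacity is the price of keeping domains off sibling threads.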

Xen implements this partially (Xen Project Schedulers - Xen) … but strict core granularity doesn’t work with newer hybrid chips (e.g. Alder Lake and newer) … yet.
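If you want to see what your own system is configured for, a small sketch like the following can help. It only parses a Xen command line string that you pass in; the example string is hypothetical. It assumes the Xen boot options `smt=` and `sched-gran=` (with `cpu` as the default granularity) and that Xen treats `off`, `no`, `false`, `disable` and `0` as false; please check the Xen command line documentation for your version before relying on it.

```python
def parse_xen_cmdline(cmdline):
    """Split a Xen command line into a dict of option -> value (None if bare)."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def smt_posture(cmdline):
    """Summarise the SMT-related settings found on the command line.

    Assumption: 'smt' and 'sched-gran' are the relevant Xen options and
    'cpu' is the default scheduling granularity when none is given.
    """
    opts = parse_xen_cmdline(cmdline)
    false_values = {"off", "no", "false", "disable", "0"}
    return {
        # True only if SMT is switched off explicitly on the command line.
        "smt_explicitly_off": opts.get("smt") in false_values,
        "sched_gran": opts.get("sched-gran") or "cpu",
    }


if __name__ == "__main__":
    # Hypothetical command line, not copied from a real installation.
    print(smt_posture("console=none dom0_mem=min:1024M smt=off"))
    print(smt_posture("console=none smt=on sched-gran=core"))
```

On a running system the actual command line should be visible in dom0 in the `xen_commandline` field of `xl info` (again an assumption worth verifying locally).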

So the question, oversimplified and assuming your main goal is compartmentalisation, from a Qubes OS user’s perspective is this: What’s the point of compartmentalisation if the compartments, AKA qubes, “share” TLBs/L1 caches, thus enabling cross-boundary attacks between VMs/qubes, even in unpredictable ways? Well … there is none. You’ll have to choose: “best performance” [1] or “security” [2]?

[1] While SMT/HT often helps a bit with power savings, it doesn’t necessarily improve performance; that depends on your workload. Highly parallel workloads tend to benefit, while purely compute-bound tasks often suffer. It also needs to be mentioned that the main performance killer for everyday users under Qubes OS seems to be the software rendering of graphics inside qubes.
[2] People often try to argue that a little protection is better than no protection at all. A possible reply, formulated casually and with a wink (adapted from here), could be: Where do you live? On the ground floor? Do you have windows there (pun intended)? Some that you can’t open because they’ll fall on your feet? Hey, at least they keep the insects out! How about on the 3rd floor? Do you have an elevator that sometimes crashes or gets stuck? But at least it takes you up sometimes? That’s better than nothing, isn’t it? And you can always take the stairs. You can’t lean against the handrail, because it could break off and cause serious injury, but it’s better than nothing, isn’t it? Silliness aside … The safety requirements discussed here are in many ways the same as for a handrail or banister: It has to provide support. Security is about reliability, whether in a computer or in life. It has to be deterministic about which attacks it will help against and which it won’t. If the railing is rated to withstand a load of 250 kilograms, then you can calculate with that. There is no such guarantee with these mitigations alone, because they are (again, oversimplifying) special treatments against specific attacks (“RIDL”, “Fallout”, “Zombieload”, “Store-to-Leak forwarding”, “Meltdown”, etc.), but they don’t solve the more general underlying problem. All these attacks are more or less variants of the same exploit of the speculative execution model of Intel CPUs. Therefore, a reliable general fix must protect VMs/qubes from cross-boundary attacks in general. So, you’ll need them all: specific mitigations, SMT disabled, firmware/BIOS updates.