Reading these discussions, the threat looks major. But if everyone running Windows/OSX/Fedora/Debian/Ubuntu/Archlinux/etc. is vulnerable (servers included?), then I’m quite confused, because that would mean everyone not running QubesOS should be sweating. I’m not saying it isn’t serious or that security isn’t important; I’m just trying to find my personal balance between security and usability, and a way to avoid going back to Fedora just because I can’t play a video smoothly.
I don’t know how you feel about me bumping a year-old topic, but it was either that or starting a new one, and I think it makes more sense to continue this one.
It seems the reason SMT is disabled on Qubes OS isn’t that it is a vulnerability in itself, but that it is unnecessary attack surface where a new 0-day exploit could be found some day.
That raises the question: where do we draw the line on what counts as unnecessary attack surface? Having an internet connection is the greatest attack surface you can have, yet we accept that threat.
I am no expert at this, but my research suggests that all the Spectre and Meltdown vulnerabilities are either mitigated or don’t exist at all if you have a modern laptop and keep your system up to date. For example, Debian systems with an Intel CPU have the intel-microcode package to keep the CPU microcode updated.
You can test whether your system is vulnerable with inxi and speed47/spectre-meltdown-checker on GitHub (a vulnerability/mitigation checker for Linux and BSD covering Reptar, Downfall, Zenbleed, ZombieLoad, RIDL, Fallout, Foreshadow, Spectre and Meltdown). It shows whether you are affected by, vulnerable to, or mitigated against each of the Spectre and Meltdown variants. “Affected” means the hardware (the CPU as it left the factory) is known to be subject to a given vulnerability. “Vulnerable” means your system has no mitigation in place against it.
If you run this script on a system that is up to date, it should show that you’re not vulnerable even with hyper-threading enabled.
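For a quick look without the full checker script, a Linux kernel also publishes its own view of each issue under /sys/devices/system/cpu/vulnerabilities/. The sketch below only reads those files and sorts the status strings into rough buckets; the function names are made up for illustration, and my assumption (not tested here) is that inside a Qubes VM or dom0 this reflects that kernel’s view rather than what Xen itself does.

```python
from pathlib import Path


def read_cpu_vulnerabilities(base="/sys/devices/system/cpu/vulnerabilities"):
    """Return {name: status string} as reported by the running Linux kernel.

    Returns an empty dict if the directory does not exist (non-Linux, old kernel).
    """
    base = Path(base)
    if not base.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(base.iterdir())
        if entry.is_file()
    }


def classify(status):
    """Map a kernel status string to a coarse label (illustrative buckets only)."""
    if status.startswith("Not affected"):
        return "not affected"
    if status.startswith("Mitigation"):
        return "mitigated"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def smt_caveats(report):
    """Names of entries whose status text still flags SMT as a problem."""
    return [name for name, status in report.items() if "SMT vulnerable" in status]


if __name__ == "__main__":
    report = read_cpu_vulnerabilities()
    for name, status in report.items():
        print(f"{name:28} {classify(status):13} {status}")
    print("still flagged with SMT on:", smt_caveats(report))
```

One thing worth watching for: on some CPUs the kernel reports an entry as “Mitigation: …” and still appends “SMT vulnerable”, so “mitigated” and “fine with hyper-threading on” are not always the same thing.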
But like I said in the beginning, that doesn’t mean there won’t be new 0-day vulnerabilities in the future that could perhaps be avoided with SMT disabled.
SMT implementations typically share TLB and L1 caches between threads. This can make cache timing attacks much easier, and one has to assume that this will make several “spectre-like” bugs exploitable. While it’s generally a bad idea to run different security domains on different processor threads on the same core, it’s not trivial to modify a scheduler to take this into account (gang scheduling → schedule different security domains on different physical processors).
Xen implements this partially (Xen Project Schedulers - Xen) … but strict core granularity doesn’t work with newer hybrid chips (e.g. Alder Lake and newer) … yet.
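To make the idea of core-granular (gang-style) scheduling a bit more concrete, here is a toy placement for a single time slice. It is not Xen’s algorithm, just a sketch of the constraint: each physical core is handed to one security domain only, so a domain with an odd number of runnable vCPUs leaves a sibling thread idle. All names here are hypothetical.

```python
from math import ceil


def place_core_granular(domains, n_cores, threads_per_core=2):
    """Greedy placement for one time slice.

    domains: dict mapping domain name -> number of runnable vCPUs.
    Returns (cores, waiting): `cores` is a list of physical cores, each a list
    of slots holding (domain, vcpu_index) or None for an idle sibling thread;
    `waiting` lists domains that did not fit and must wait for a later slice.
    """
    cores = []
    waiting = []
    for name, vcpus in domains.items():
        needed = ceil(vcpus / threads_per_core)
        if len(cores) + needed > n_cores:
            waiting.append(name)
            continue
        for c in range(needed):
            slots = []
            for t in range(threads_per_core):
                idx = c * threads_per_core + t
                # Never fill a leftover sibling with another domain's vCPU.
                slots.append((name, idx) if idx < vcpus else None)
            cores.append(slots)
    return cores, waiting


def idle_threads(cores):
    """Count sibling threads left empty to keep domains apart."""
    return sum(slot is None for core in cores for slot in core)


if __name__ == "__main__":
    cores, waiting = place_core_granular({"work": 2, "untrusted": 1, "vault": 1}, n_cores=4)
    for i, core in enumerate(cores):
        print(f"core {i}: {core}")
    print("idle sibling threads:", idle_threads(cores), "waiting:", waiting)
```

The idle siblings are the price of the isolation: no two domains ever share a core’s L1/TLB at the same time, but some hardware threads sit unused.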
So the question – oversimplified and assuming your main goal is compartmentalisation – from a QubesOS user’s perspective is this: What’s the point of compartmentalisation if the compartments, AKA qubes, “share” TLBs/L1 caches, thus enabling cross-boundary attacks between VMs/qubes, even in unpredictable ways? Well … there is none. You’ll have to choose… “best performance”[1] or “security”[2]?
[1] While SMT/HT often helps a bit with power savings, SMT doesn’t necessarily have a positive effect on performance; it depends on your workload: intensive parallel tasks would benefit, while purely computational tasks often suffer. However, this also needs to be mentioned: The main performance killer for everyday users under QubesOS seems to be the graphics software rendering in qubes.
[2] People often try to argue that a little protection is better than no protection at all. A possible reply – formulated casually and with a wink … adapted from here – could be:

Where do you live? On the ground floor? Do you have windows there (pun intended)? Some that you can’t open because they’ll fall on your feet? Hey, at least they keep the insects out! How about on the 3rd floor? Do you have an elevator that sometimes crashes or gets stuck? But at least it takes you up sometimes? That’s better than nothing, isn’t it? And you can always take the stairs. You can’t lean against the handrail, because it could break off and cause serious injury, but it’s better than nothing, isn’t it?

Silliness aside … The safety requirements discussed here are in many ways the same as for a handrail or banister: It has to provide support. Security is about reliability, whether in a computer or in life. It has to be deterministic about which attacks it will help against and which it won’t. If the railing can withstand a maximum pressure of 250 kilograms, then you can calculate with that.

There is no such guarantee with these mitigations alone, because they are – again, oversimplifying – special treatments against specific attacks (“RIDL”, “Fallout”, “Zombieload”, “Store-to-Leak forwarding”, “Meltdown”, etc.), but they don’t solve a more general underlying problem. All these attacks are more or less variants of the same exploit of the speculative execution model of Intel CPUs. Therefore, a reliable general fix must protect VMs/Qubes from cross-boundary attacks in general. So, you’ll need them all: specific mitigations, SMT disabled, firmware/BIOS updates.
Consider the following while reading: I do not claim to be an expert in CPUs or CPU vulnerabilities. I write the following merely to add a meaningful contribution, and I welcome any critique or clarification from someone who knows better.
I would like to note a few things:
- This thread and the conversation it refers to talk only about Intel, but AMD is also vulnerable to side-channel attacks (albeit fewer are known).
- With that said, both AMD and Intel have released a series of microcode firmware updates (not to mention OS patches) that mitigate these vulnerabilities: flushing buffers, software hardening, etc.
- These vulnerabilities are almost always discovered by researchers in both academic and security fields–not state actors or hackers–and it sometimes took years of digging to find them. And even though they’re now published, exploiting them isn’t trivial. On top of that, many of the attacks would take multiple successful attempts to garner anything worthwhile–something that isn’t likely with Qubes’ transient nature.
(Important to note, these attacks could be pulled off–in ideal academic circumstances–within minutes; but realistically an attack could take hours or even days or weeks or even months to pull off. And the only way to guarantee these longer attacks succeeded would be to install persistent malware on the VM(s) in question: something all but impossible on Qubes.)
- A large percentage of Qubes users (I won’t quantify this for obvious reasons, but I feel it’s well over 75%; then again, what do my feelings matter) will never have to worry about this. I’d even go as far as to say that the only people that actually do have to worry about this are people whose lives are in real jeopardy: e.g., people for whom Tails or Whonix isn’t enough.
The rest are either people who deeply distrust their country, state, or province; paranoid people; people who simply like privacy and security; and/or mere tech enthusiasts. And for these groups, can it be agreed allowing SMT is at least very likely safe?
Thank you for letting me drone on. @marmarek, do you have anything to correct me on here? Again, I’m no CPU or vulnerability expert, and I’m merely going off of what I’ve read.
I ask because I use an AMD CPU with lots of unused threads that I’d love to have access to, and I don’t think the CIA is after me.
Thanks for anyone’s help!
It’s all about choice, as long as you know what you are doing. Why do you use Qubes OS?
I belong to both the tech enthusiast and privacy-and-security crowds. Also, I just really love that I can jump between distros rapidly; really beats dual booting.
And sure, it’s all about choice. I agree. I merely wonder whether this concern is warranted for the vast majority of Qubes users. I mean, just because one uses Qubes doesn’t mean one must use everything that comes with Qubes. There are a few features in my car that I’ve never used (and never plan to use), but you better believe that if there were a feature to unlock more horsepower, I’d use it. And to continue the analogy, for the vast majority of drivers, using a bit more horsepower is 100% safe. (Granted, I don’t mean one can safely fly through a school zone; but alas, all analogies fall apart when applied too strictly.)
And also, I suppose my question really goes deeper than I’d initially intended:
Could enabling SMT lead to unexpected bugs with Xen, Qubes, or VMs? (I suppose the answer to that question will always be yes, but I figured I’d ask anyway)
No. But Qubes is about (reasonable) security. And while shared caches and fancy hard-coded branch prediction algorithms give/gave some (reasonable) performance gains, they hurt security in unpredictable ways.
On the analogy side of things: There is no such thing as “partial security” (if you take security considerations as a concept of fact-based risk calculation). See above post → paragraph 2.
And also, I suppose my question really goes deeper than I’d initially intended:
Could enabling SMT lead to unexpected bugs with Xen, Qubes, or VMs? (I suppose the answer to that question will always be yes, but I figured I’d ask anyway)
Yes.
exploiting them isn’t trivial.
You are making a generalization that is flat-out wrong: some transient-execution vulnerabilities are trivial to exploit.
I belong to both the tech enthusiast and privacy-and-security crowds. Also, I just really love that I can jump between distros rapidly; really beats dual booting.
Could enabling SMT lead to unexpected bugs with Xen, Qubes, or VMs?
I’m in the same (rough) category and use SMT with little to no fear! “smt=on sched-gran=core” is my optimal balance of security and performance.
You also made a good point about the ephemeral nature of Qubes (depending on the individual use cases, of course), which just makes my use of SMT even more comfortable.
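For anyone wanting to double-check which of these options their hypervisor actually booted with: as far as I know, `xl info` in dom0 prints a `xen_commandline` line. The sketch below just parses such text; the helper names are hypothetical, and the “cpu” fallback for `sched-gran` is my reading of Xen’s documented default, so treat it as an assumption.

```python
def extract_cmdline(xl_info_text):
    """Return the value of the xen_commandline line from `xl info` output, or None."""
    for line in xl_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "xen_commandline":
            return value.strip()
    return None


def parse_xen_cmdline(cmdline):
    """Split a Xen command line into {option: value}; bare flags map to True.

    If an option appears more than once, the last occurrence wins in this sketch.
    """
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        opts[key] = value if sep else True
    return opts


def smt_settings(cmdline):
    """Report the two options discussed in this thread."""
    opts = parse_xen_cmdline(cmdline)
    return {
        "smt": opts.get("smt", "<unset>"),
        # Assumption: Xen schedules per hardware thread ("cpu") when unset.
        "sched-gran": opts.get("sched-gran", "cpu"),
    }


if __name__ == "__main__":
    sample = "xen_commandline        : placeholder dom0_mem=min:1024M smt=on sched-gran=core"
    print(smt_settings(extract_cmdline(sample)))
```

In dom0 you could feed it the real thing, e.g. the text returned by `subprocess.run(["xl", "info"], capture_output=True, text=True).stdout`.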
Have you tested the performance difference between running SMT with and without sched-gran=core?
On my system, using sched-gran=core seems to eat most of the gain from SMT. To me, the multithreaded uplift just seems so small it’s not worth it, when using sched-gran=core.
Here are the average multithreaded scores from my 9950X running Geekbench in 8 VMs at the same time.
| SMT status | cores | score |
|---|---|---|
| smt=off (baseline) | 2 | 5505 |
| smt=on sched-gran=core | 2 | 5505 → 3994 |
| smt=on sched-gran=core | 4 | 6912 |
| smt=on | 2 | 5498 |
| smt=on | 4 | 8952 |
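To put the table in relative terms, here is the arithmetic against the smt=off baseline. The scores are copied from the table above; what exactly the “cores” column counts (presumably what each VM was given) is my assumption.

```python
BASELINE_SCORE = 5505  # smt=off, 2 cores

# (configuration, cores, score) for the core-granular rows and the plain SMT rows
RESULTS = [
    ("smt=on sched-gran=core", 2, 3994),
    ("smt=on sched-gran=core", 4, 6912),
    ("smt=on", 2, 5498),
    ("smt=on", 4, 8952),
]


def percent_change(score, baseline=BASELINE_SCORE):
    """Relative change versus the baseline score, in percent, one decimal."""
    return round((score / baseline - 1) * 100, 1)


def summarize(results=RESULTS):
    """Return (configuration, cores, percent change vs. baseline) per row."""
    return [(label, cores, percent_change(score)) for label, cores, score in results]


if __name__ == "__main__":
    for label, cores, change in summarize():
        print(f"{label:24} cores={cores}  {change:+.1f}% vs smt=off")
```

By this calculation the 4-core rows come out at roughly +25.6% with core-granular scheduling versus +62.6% with plain SMT, so core granularity gives up more than half of the uplift in this particular benchmark.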
On my system, using sched-gran=core seems to eat most of the gain from SMT.
Interesting! I did not test. But I can tell immediately from how the system starts up if smt is off or on… (at least before your /etc/default/grub hack described in “Quick Quality of Life Improvements”, when any new kernel would have clobbered my kernel parameters).
Let me get back to you on this topic.
But like I said in the beginning, it doesn’t mean there won’t be in the future new 0-day vulnerabilities which could maybe be avoided with SMT disabled.
Just wanted to mention that this happened around two weeks ago.
https://comsec.ethz.ch/research/microarch/branch-privilege-injection/