How does, and how should, Qubes OS handle firmware bugs which cause a cpuid and APIC ID mismatch?

A mismatch means, just as a quick example, that CPU 2 has cpuid 0x0002 but APIC 0x0004.
The example is made up, but this firmware bug happened to me after upgrading to 4.3 and then updating dom0 after the upgrade. My guess is that a BIOS update is the reason for this. And I don’t know why the BIOS update wasn’t in 4.2.
I’m a security researcher and have looked into this, and here is my research. I could only go so far with it because the rest requires more specialized engineers. I used Meta’s disgusting AI model to help with my research.
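
To make the example concrete, here is a minimal Python sketch that pulls the CPU number and the two IDs out of warning lines. The message format in the regular expression is an assumption modelled on the example above (the exact kernel wording may differ between versions), and `find_apic_mismatches` is a hypothetical helper name, not part of any Qubes OS or Xen tool.

```python
import re

# Assumed message shape, modelled on the example in this post; adjust the
# pattern if the wording on your system differs.
MISMATCH_RE = re.compile(
    r"CPU\s*(?P<cpu>\d+):\s*APIC ID mismatch\.\s*"
    r"CPUID:\s*0x(?P<cpuid>[0-9a-fA-F]+)\s+APIC:\s*0x(?P<apic>[0-9a-fA-F]+)",
    re.IGNORECASE,
)


def find_apic_mismatches(log_text):
    """Return (cpu, cpuid_value, apic_value) tuples for every mismatch line."""
    results = []
    for match in MISMATCH_RE.finditer(log_text):
        results.append(
            (
                int(match.group("cpu")),
                int(match.group("cpuid"), 16),  # hex string -> int
                int(match.group("apic"), 16),
            )
        )
    return results


# Made-up sample line using the values from the example above.
sample = "[Firmware Bug]: CPU   2: APIC ID mismatch. CPUID: 0x0002 APIC: 0x0004"
print(find_apic_mismatches(sample))  # [(2, 2, 4)]
```

The input could be saved kernel log text, for example the output of `dmesg` or `journalctl -k` copied into a file, so nothing identifying has to leave the machine.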

This error doesn’t seem to cause any problem which breaks the system. It seems to just cause additional delays for the most part, probably because the system needs to do additional work to assign a core to a VM because of the mismatch.

But as a security researcher I wondered if this is a security vulnerability which could be exploited with a side-channel attack. How, you ask? OK, I’ll answer your question.
If a qube is assigned CPU 2, then because of the APIC mismatch it might get interrupted and the core rescheduled to another qube instead.
This would cause both qubes to use the same CPU core and potentially leak data between them.

But after doing some searching, not googling, I found that this firmware bug is not new. And security researchers haven’t published CVEs or theoretical attack papers about it. That makes me think this firmware bug is not an exploitable attack vector.

Now here is an interesting question that could help a lot of people:
Is it possible to create a generic solution which detects and corrects cpuid and APIC mismatches?
Or do these solutions have to be hardware-specific?

A generic solution seems reasonable because it’s just necessary to change the APIC value to be equal to the cpuid value. Of course, it’s not simple, because you would still need to know which APIC should be used by which CPU. It seems possible, but maybe it’s not possible without knowing the motherboard.

But if it’s possible then it’s better, because we only need to solve this problem one time.
Then all users who end up with this firmware bug at some point would be covered. There are many different motherboards out there, and everything changes all the time: 2 years from now, 5 years, and so on. Otherwise we have to build this solution again for each motherboard. And it’s a difficult troubleshooting process for most users to go through. And they probably don’t want to broadcast this kind of identifying information on the internet to solve it.

Another possibility is adding a troubleshooting page about this firmware bug to the Qubes OS documentation. That page should have step-by-step instructions on how to correct it. That way, a generic solution is possible in a targeted and optional way.

1 Like

Your query will likely be better handled as a GitHub issue.

1 Like

That’s a good point.
It would be helpful if someone, probably one of the contributors, could do that.
Maybe just copy paste the whole text.
Or, because I know Qubes developers are very detailed and laser focused, you may prefer to adjust the text before making a GitHub issue of it.
And then link to this post from the github issue for additional discussion or context or perspectives.

1 Like

I think this would be more appropriate for qubes-devel rather than qubes-issues.
(qubes-issues is not intended as a venue for asking questions.)

2 Likes

I don’t like the suggestion about using the mailing list instead, because it’s less open. Only those who have opted in to the mailing list will get the information.
The forum is like an extension of the Qubes wiki. People in the future who have this problem, or who just want to research it, can search the forum and find this discussion.
That includes users who don’t have an account or email address.

1 Like

Anyone can view the mailing list on the web without signing up or logging in.

The mailing list is also searchable and doesn’t require an email address to read (only to post).

But the forum is also perfectly fine.

4 Likes

I think the cause of this, going deeper than just “a BIOS update”, is that BIOS updates are known to reset BIOS settings.
That means the update resets the SMT setting, which can renumber APIC IDs.
Qubes OS automatically re-disables SMT, but doesn’t know how to automatically correct the APIC mismatch.
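
One rough way to test this hypothesis against real warning lines is sketched below in Python. It assumes that with SMT enabled in firmware and two threads per core, the APIC IDs of the cores are spaced two apart, which is common on x86 but not guaranteed. `follows_smt_stride` is a hypothetical helper, and a `True` result would only be consistent with the SMT theory, not proof of it.

```python
def follows_smt_stride(pairs, threads_per_core=2):
    """Check whether every mismatch fits apic == cpuid * threads_per_core.

    `pairs` is a list of (cpuid_value, apic_value) integers taken from the
    warning lines. An empty list returns False because there is nothing to
    support the hypothesis.
    """
    if not pairs:
        return False
    return all(apic == cpuid * threads_per_core for cpuid, apic in pairs)


# Made-up values: the first list matches the pattern, the second does not.
print(follows_smt_stride([(1, 2), (2, 4), (3, 6)]))
print(follows_smt_stride([(1, 3), (2, 4)]))
```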

1 Like

Try to see this from a user or administrator’s perspective.
They get a warning about a firmware bug after an update, saying that it causes an APIC mismatch.
That sounds serious and alarming.
But the user/admin has no idea what that actually means.

The user tries to find information about it in the Qubes OS docs, but there is none.
The user then tries to find information about it in the Xen docs, but there is none.
The user then tries to search the forums for information about it, but there is barely any info that really explains it.

The error is made for specialized system engineers, not for administrators or end-users.
Administrators and users need documentation to understand the implications.
But all we get is “Error”, nothing else.

Personally after doing a lot of research, I feel quite at ease knowing this is very unlikely a security risk.
But it has not been a good experience trying to understand this as an end-user when there’s no documentation for the error message.

1 Like

I’m getting the impression that there are very few people who know about APIC mismatch, since it has been about a month without any explanation.
That shows how important it is for this firmware bug (APIC mismatch) to be properly documented in the Qubes OS documentation.
If the OS gives an error message, there should be information about it available in the documentation.

I’m also giving an update after doing more debugging.
At first I suspected a BIOS update, as I said in my first post.
But I realized later, when going through the documentation, that this can’t be true, because firmware updates are not included in the Qubes Update Tool. Only microcode updates are included.
And I also don’t think a microcode update caused this, because there apparently wasn’t any microcode update in the update I made before this firmware bug happened.

But isn’t that very strange if a FIRMWARE bug has appeared even though I haven’t had any firmware or microcode update?
The previous update had kernel-* and qubes-core-* and python3-* updates/installs.

However the update before the previous update did have a microcode update. But I didn’t experience this firmware bug between that update and the following update.

How did I debug?
I used:
dnf history list / dnf history info
rpm -qa | grep -i microcode
rpm -qi
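
The package check above can also be scripted. The following is a minimal Python sketch, not an official Qubes tool: `filter_packages` and `installed_packages` are hypothetical helper names, and the package names in the sample list are made up. Only `rpm -qa` itself is taken from the commands listed above.

```python
import subprocess


def filter_packages(package_lines, keyword="microcode"):
    """Case-insensitive filter, equivalent to piping through `grep -i keyword`."""
    needle = keyword.lower()
    return [line for line in package_lines if needle in line.lower()]


def installed_packages():
    """Return the output lines of `rpm -qa` (requires rpm, for example in dom0)."""
    result = subprocess.run(
        ["rpm", "-qa"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Made-up package names, only to show the filtering step.
example = ["microcode_ctl-1", "kernel-2", "AMD-Microcode-3"]
print(filter_packages(example))
```

On a real system, `filter_packages(installed_packages())` would list the installed microcode packages. Which transaction installed them still has to be read from `dnf history info`.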

What does this mean?
That the APIC mismatch firmware bug is probably caused by a Qubes OS dom0 update, not by a microcode or firmware update.
And there is no information in the documentation about apic mismatch.

1 Like

Wasn’t this a Xen issue that was patched already in 4.20?

1 Like

I don’t know. Maybe it was and now the same bug appears for a new reason?
One of the main points I’ve been making is that this is likely going to be a recurring bug.
That’s why it’s important to have documentation for the error.

I’m starting to wonder if Qubes OS even has any developers who understand the nuances of APIC mismatch.
It would be a big red flag for how secure Qubes OS is if the OS ships updates which cause an APIC mismatch that has no documentation.
I’m quite comfortable that APIC mismatch isn’t a security vulnerability, but that is mostly based on there being no CVEs or theoretical attack papers about it.
It’s still a security red flag if Qubes OS developers have bugs in their OS which they don’t understand.

1 Like

I’m not following how you are drawing this inference. The forum is a great place to work through issues with other Qubes users, but it’s not the best place to get attention from the devs. There are guidelines for escalating issues, which I’ve followed several times and always received the proper level of attention in time. If this is an issue that merits such attention, then please escalate the issue for the benefit of all. The last issue I raised in this way took a full month to resolve, despite being elevated to the highest priority for the devs. If you’re not on github, like me, you can post the issue here and ask others to create a post on github.

4 Likes

Qubes OS developers have already seen this topic. ADW even posted twice.
They have had about one and a half months already.

I thought I already made it clear that this is very important. But let me try again to explain.

Qubes OS dom0 updates are causing a FIRMWARE BUG.
There is no solution for it.
There is no documentation for it.
Just an error message.

It is very bad development practice to ignore a serious bug and not have documentation for it.

And it really questions the security of Qubes OS if they are failing with fundamental software development practices.
Or is it because they actually don’t know?
Do they not have enough experience with how APIC mismatch affects Qubes OS, and is that why there is no documentation?
And do they also not know how such a firmware bug could be caused by dom0 updates?
That would also be a red flag for whether the developers are up to the task of writing a secure OS.

1 Like

Are you not getting the warning because you don’t disable SMT in the firmware, which you are told to do?

If you don’t force Xen to enumerate the logical cores, there is no mismatch.

You also don’t understand the issue, so how could you tell if this is a serious bug?

5 Likes

Yes. I do not think the boundaries between BIOS, loadable device firmware, and kernel are always clearly expressed or comprehended.

In this case, it seems possible that a kernel update has revealed an old BIOS issue, but OP has assumed that Qubes has updated the BIOS and introduced a new bug:

I stand to be corrected.

1 Like

…but not by any kind of AI output :slight_smile:

1 Like

I appreciate you are trying to figure this out but where are users told to disable SMT in the BIOS? I find no such instruction in the docs.

The only information about SMT/hyperthreading is in Suspend/resume troubleshooting — Qubes OS Documentation
It says there that QubesOS disables hyperthreading by default.

I can’t, but the point I’m trying to make is that it could be a serious bug, and the developers might not know either.
And there should be documentation if there is an error message.

So either the problem here is that the devs don’t know how APIC mismatch affects Qubes OS, and also don’t know why it would happen.
Or the devs do know, but for some unknown reason think end-users don’t need documentation about errors they receive. That would be strange considering this is a security-focused OS.

My guess in the first post that the Qubes Update Tool did a firmware update was wrong.
It was based on what the Update Tool shows while it is running: it usually mentions firmware packages. I don’t know why it does that when it shouldn’t be doing any firmware updates. It also seemed most reasonable that a firmware bug would be caused by a firmware update.
It can do microcode updates but later in this topic I did some debugging which shows there was no microcode update in the update which introduced this firmware bug.

But you are on to something, it is a possibility/theory that the dom0 update reveals a previous firmware issue.
The strange thing is that there was no firmware issue before. It’s not only that there is now an error message; the system is also noticeably in worse condition.
If it was an old firmware issue like you are guessing, then shouldn’t the issues I’m experiencing now, have been noticeable before as well?

1 Like

It is not clear - do you actually have any issues, apart from the warning?

I am not 100% sure about the following, but…

In most cases, I think, Xen and the kernel work around these types of bugs. Sometimes the (better) manufacturers release UEFI/BIOS updates, but often they do not if Windows runs OK.

The warnings should be harmless. Removing them should be purely cosmetic.

You could check your motherboard manufacturer’s website, try looking again for “hyperthread” or “SMT” in the config, or just be happy that the correct APIC is almost certainly being used for each CPU. Otherwise there would be, I think, all sorts of hardware lockups.

1 Like

I said in the first post that it seems to mostly be causing additional delays.
This includes starting the computer and getting to the disk decryption, which takes much more time than before.
Then there are longer delays when starting a new qube.
Another thing I’ve noticed is that when selecting a network in the dom0 system tray drop-down menu, the password window used to show almost immediately. Now it takes a while before it appears.
None of these problems existed before the APIC mismatch firmware bug was introduced.
I am also not experienced enough to say with certainty that this is all the bug is causing.
That’s why there should be official documentation on this APIC mismatch firmware bug.
If the OS produces an error message, there really should be documentation for it.
This isn’t a general Linux problem for which I can go to, for example, the Ubuntu documentation.
This is specific to Qubes OS.

I have also read the GRUB config (/etc/default/grub) and verified that smt=off is set.
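
For anyone who wants to repeat this check, here is a small Python sketch that looks for an `smt=` option in the `GRUB_CMDLINE_*` lines of a GRUB defaults file. `smt_settings` is a hypothetical helper and the sample text is made up. A real /etc/default/grub will contain more options, and the variable that carries `smt=off` on your system may differ.

```python
import re


def smt_settings(grub_text):
    """Map each GRUB_CMDLINE_* variable to its smt=... value, if it sets one."""
    found = {}
    for line in grub_text.splitlines():
        # Commented-out lines start with '#', so they never match here.
        match = re.match(r"\s*(GRUB_CMDLINE_\w+)=(.*)", line)
        if not match:
            continue
        name, raw_value = match.groups()
        # Drop the surrounding quotes, then split into individual options.
        options = raw_value.strip().strip("\"'").split()
        for option in options:
            if option.startswith("smt="):
                found[name] = option[len("smt="):]
    return found


# Made-up sample; read the real file with open("/etc/default/grub").read().
sample = '''GRUB_TIMEOUT=5
GRUB_CMDLINE_XEN_DEFAULT="console=none smt=off"
'''
print(smt_settings(sample))
```

This only shows what is configured for the next boot. The command line Xen actually booted with can be compared against it, for example via the `xen_commandline` line of `xl info` in dom0.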

1 Like

Ah, I forgot the slowness. There is much discussion at the “sluggish” thread.

This seems like a reasonable idea.

Did you look in your BIOS to see if you can find how to disable Hyperthreading/HT/HTT/SMT? I think Intel uses the term “Hyperthreading technology”. It would be useful to know if it changes the delays, and the answer could be helpful in such a document.
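
To compare the delays before and after changing the BIOS setting, rough timings are enough. The Python sketch below times an arbitrary command a few times. `time_command` is a hypothetical helper, and the self-test only times the Python interpreter starting up, so it runs anywhere.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


# Harmless self-test: time the Python interpreter doing nothing.
print(time_command([sys.executable, "-c", "pass"], runs=2))
```

In dom0 the same function could time a qube start, for example `time_command(["qvm-start", "test-qube"], runs=1)`, where `test-qube` is a placeholder name for a qube that is currently shut down. Recording the numbers once with the current setting and once after the change would show whether the delays really move.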

1 Like