How does, and how should, Qubes OS handle firmware bugs that cause a CPUID and APIC ID mismatch?

As a quick example, this mismatch means that CPU 2 reports CPUID 0x0002 but APIC 0x0004.
The example values are made up, but this firmware bug happened to me after upgrading to 4.3 and then updating dom0. My guess is that a BIOS update caused it, and I don't know why that BIOS update wasn't already in 4.2.
I'm a security researcher and have looked into this, so here is my research. I could only take it so far, because the rest requires more specialized engineers. I used Meta's disgusting AI model to help with my research.
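The warning itself shows up as a log message. As a rough aid for recognizing it, here is a minimal Python sketch that extracts the CPU number and the two IDs from such a line. It assumes the message looks like `[Firmware Bug]: CPU2: APIC ID mismatch. CPUID: 0x0002 APIC: 0x0004`; that wording is an assumption based on recent Linux kernels, and the exact text and padding may differ on your system. `parse_mismatch` and `ApicMismatch` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed message shape (may vary by kernel version):
#   "[Firmware Bug]: CPU2: APIC ID mismatch. CPUID: 0x0002 APIC: 0x0004"
# \s* also tolerates padded CPU numbers such as "CPU   2".
MISMATCH_RE = re.compile(
    r"CPU\s*(\d+): APIC ID mismatch\. CPUID: (0x[0-9a-f]+) APIC: (0x[0-9a-f]+)",
    re.IGNORECASE,
)


class ApicMismatch(NamedTuple):
    cpu: int        # logical CPU number as printed in the message
    cpuid_id: int   # the value the message labels "CPUID"
    apic_id: int    # the value the message labels "APIC"


def parse_mismatch(line: str) -> Optional[ApicMismatch]:
    """Return the parsed fields, or None if the line is not a mismatch message."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    # int(..., 16) accepts the "0x" prefix.
    return ApicMismatch(int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16))


if __name__ == "__main__":
    sample = "[Firmware Bug]: CPU2: APIC ID mismatch. CPUID: 0x0002 APIC: 0x0004"
    print(parse_mismatch(sample))
```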

This error doesn't seem to cause any problem that breaks the system. For the most part, it just seems to cause additional delays, probably because the system has to do extra work to assign a core to a VM because of the mismatch.

But as a security researcher, I wondered whether this is a security vulnerability that could be exploited with side-channel attacks. How, you ask? OK, I'll answer that.
If a qube is assigned CPU 2, then because of the APIC mismatch, its work might be interrupted and rescheduled onto the core meant for another qube.
That would cause both qubes to use the same CPU core, potentially leaking data between them.

However, after some searching (not Googling), I found that this firmware bug is not new, and security researchers haven't published CVEs or theoretical attack papers about it. That makes me think this firmware bug is not an exploitable attack vector.

Now here is an interesting question that could help a lot of people:
Is it possible to create a generic solution that detects and corrects CPUID and APIC mismatches?
Or do these solutions have to be hardware-specific?

A generic solution seems reasonable, because it's just necessary to change the APIC value to equal the CPUID value. Of course, it's not that simple, because you would still need to know which APIC ID should be used by which CPU. It seems possible, but maybe not without knowing the motherboard.

But if it's possible, it's better, because we only need to solve this problem once.
Then it would work for all users who end up with this firmware bug at some point. There are many different motherboards out there, and everything keeps changing, 2 years from now, 5 years, and so on. Otherwise we have to build this solution again for each motherboard. That is also a difficult troubleshooting process for most users to go through, and they probably don't want to broadcast that kind of identifying information on the internet to solve it.

Another possibility is adding a troubleshooting page about this firmware bug to the Qubes OS documentation, with step-by-step instructions on how to correct it. That way, a generic solution is possible in a targeted and optional way.


Your query will likely be better handled as a GitHub issue.


That’s a good point.
It would be helpful if someone, probably one of the contributors, could do that.
Maybe just copy and paste the whole text.
Or, since I know Qubes developers are very detailed and laser-focused, you may prefer to adjust the text before turning it into a GitHub issue.
Then link to this post from the GitHub issue for additional discussion, context, or perspectives.


I think this would be more appropriate for qubes-devel rather than qubes-issues.
(qubes-issues is not intended as a venue for asking questions.)


I don't like the suggestion to use the mailing list instead, because it's less open. Only those who have opted in to the mailing list will get the information.
The forum is like an extension of the Qubes wiki. People in the future who have this problem, or who just want to research it, can search the forum and find this discussion.
That includes users who don’t have an account or email address.


Anyone can view the mailing list on the web without signing up or logging in.

The mailing list is also searchable and doesn’t require an email address to read (only to post).

But the forum is also perfectly fine.


I think the cause of this, at a deeper level than just a BIOS update, is that BIOS updates are known to reset BIOS settings.
That means the update can reset the SMT setting, which can renumber APIC IDs.
Qubes OS automatically re-disables SMT, but doesn't know how to automatically correct the APIC mismatch.
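A toy model may help show why toggling SMT can change which APIC ID a given CPU number corresponds to. The sketch assumes, purely for illustration, that the lowest bit of an APIC ID selects the hardware thread, the bits above it select the core, and that this thread bit stays reserved even when SMT is off. Real processors also encode packages and other levels, and the exact layout depends on the vendor and firmware. `apic_id` and `online_apic_ids` are hypothetical helpers.

```python
from typing import List


def apic_id(core: int, thread: int, thread_bits: int) -> int:
    """Simplified, assumed layout: thread index in the low bits, core index above."""
    return (core << thread_bits) | thread


def online_apic_ids(cores: int, smt_enabled: bool, thread_bits: int = 1) -> List[int]:
    """APIC IDs of the online logical CPUs, in logical-CPU order (toy model)."""
    threads = (1 << thread_bits) if smt_enabled else 1
    return [apic_id(c, t, thread_bits) for c in range(cores) for t in range(threads)]


if __name__ == "__main__":
    # 4 cores, 1 bit reserved for the thread index in both cases.
    print("SMT on: ", online_apic_ids(4, smt_enabled=True))   # [0, 1, 2, 3, 4, 5, 6, 7]
    print("SMT off:", online_apic_ids(4, smt_enabled=False))  # [0, 2, 4, 6]
```

In this toy model, logical CPU 2 gets APIC ID 2 with SMT on but APIC ID 4 with SMT off, so the two numberings no longer line up. It only illustrates the idea and says nothing about how a specific board or Xen actually numbers its CPUs.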


Try to see this from a user's or administrator's perspective.
They get a warning about a firmware bug after an update, saying it causes an APIC mismatch.
That sounds serious and alarming.
But the user/admin has no idea what that actually means.

The user tries to find information about it in the Qubes OS docs, but there is none.
The user then tries to find information about it in the Xen docs, but there is none.
The user then tries to search the forums for information about it, but there is barely any info that really explains it.

The error is made for specialized system engineers, not for administrators or end-users.
Administrators and users need documentation to understand the implications.
But all we get is “Error”, nothing else.

Personally, after doing a lot of research, I feel quite at ease knowing this is very unlikely to be a security risk.
But it has not been a good experience trying to understand this as an end-user when there’s no documentation for the error message.


I'm getting the impression that very few people know about APIC mismatch, since it has been about a month without any explanation.
That shows how important it is for this firmware bug (APIC mismatch) to be properly documented in the Qubes OS documentation.
If the OS gives an error message, there should be information about it in the documentation.

I also have an update after doing more debugging.
At first I suspected a BIOS update, as I said in my first post.
But when I later went through the documentation, I realized that can't be true, because firmware updates are not included in the Qubes Update Tool. Only microcode updates are.
I also don't think a microcode update caused this, because apparently there wasn't any microcode update in the update I installed just before this firmware bug appeared.

But isn't it very strange that a FIRMWARE bug appeared even though I hadn't had any firmware or microcode update?
That previous update had kernel-*, qubes-core-* and python3-* updates/installs.

However, the update before that one did include a microcode update, but I didn't experience this firmware bug in the period between that update and the next one.

How did I debug? I used:
dnf history list and dnf history info
rpm -qa | grep -i microcode
rpm -qi on the microcode package it found
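To narrow down which update introduced the message, one option is to check which past boots logged it and compare their dates with dnf history list. The sketch below assumes a persistent systemd journal in dom0, that the message appears in the dom0 kernel log (journalctl -k), and that journalctl exits non-zero when a requested boot is not in the journal. If the message only appears in Xen's own log (xl dmesg), this approach won't see it, and you may need sudo to read the kernel log. The matched text "APIC ID mismatch" is assumed, and `kernel_log` and `boots_with_mismatch` are hypothetical helpers.

```python
import subprocess
from typing import Dict, List, Optional

NEEDLE = "apic id mismatch"  # assumed message text, matched case-insensitively


def kernel_log(offset: int) -> Optional[str]:
    """Kernel messages for boot `offset` (0 = current, -1 = previous, ...).

    Returns None when journalctl exits non-zero, which is assumed to mean
    that boot is not in the journal (or the journal is not readable).
    """
    result = subprocess.run(
        ["journalctl", "-k", f"--boot={offset}", "--no-pager", "-q"],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else None


def boots_with_mismatch(logs: Dict[int, str]) -> List[int]:
    """Boot offsets whose kernel log mentions the mismatch, newest first."""
    return [off for off in sorted(logs, reverse=True) if NEEDLE in logs[off].lower()]


if __name__ == "__main__":
    logs: Dict[int, str] = {}
    for offset in range(0, -20, -1):  # look back at most 20 boots
        text = kernel_log(offset)
        if text is None:
            break
        logs[offset] = text
    print("Boots with the message:", boots_with_mismatch(logs))
```

journalctl --list-boots maps these offsets to dates, which can then be lined up against the transaction dates in dnf history list.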

What does this mean?
That the APIC mismatch firmware bug is probably caused by a Qubes OS dom0 update, not by a microcode or firmware update.
And there is no information about APIC mismatch in the documentation.


Wasn’t this a Xen issue that was patched already in 4.20?


I don't know. Maybe it was, and now the same bug appears for a new reason?
One of the main points I've been making is that this is likely to be a recurring bug.
That’s why it’s important to have documentation for the error.

I'm starting to wonder if Qubes OS even has any developers who understand the nuances of APIC mismatch.
That would be a big red flag for how secure Qubes OS is, if the OS ships updates that cause an APIC mismatch that has no documentation.
I'm quite comfortable that APIC mismatch isn't a security vulnerability, but that is mostly based on there being no CVEs or theoretical attack papers about it.
But it's still a security red flag if Qubes OS developers have bugs in their OS that they don't understand.


I'm not following how you are drawing this inference. The forum is a great place to work through issues with other Qubes users, but it's not the best place to get attention from the devs. There are guidelines for escalating issues, which I've followed several times, and I have always received the proper level of attention in time. If this is an issue that merits such attention, then please escalate it for the benefit of all. The last issue I raised this way took a full month to resolve, despite being elevated to the highest priority for the devs. If, like me, you're not on GitHub, you can post the issue here and ask others to open it on GitHub.
