Qubes Entropy Verification

I am in the process of verifying that everything entropy-related is up to par in Qubes, specifically within Debian-based templates.

My understanding is that most entropy within a Qubes VM comes from the “Linus Jitter Dance”, haveged, jitterentropy-rngd and a one-off read from dom0’s getrandom(0) of 512 bytes. This seems to be fine, but I have some questions. I will refer to /dev/[u]random and getrandom as the CSRNG.

  1. Because we are only receiving entropy from dom0 once, CPU jitter will be the only source topping up the entropy pool after first boot. Therefore the CSRNG will be continually reseeded from only a single source of entropy. I think I’ve read that the main reason for reseeding is to recover from compromise, and that the initial seed alone is sufficient for all the randomness the system could need when there is no compromise. Is my thinking true, and as such do I not need to worry that we are only reseeding with CPU jitter?

  2. I am very confident that it is impossible for entropy to “exit” the system during a reseed, but I would like some confirmation of this. By this I mean, is it possible that new entropy will replace old entropy as the seed for the CSRNG, or is it always used in combination? It’s important to confirm this, because it would be bad if dom0’s bytes were replaced after some time by purely CPU jitter entropy.

  3. I think it’s possible that after the system has booted there is a brief period of time (< 1 minute) where the CSRNG could be seeded with CPU jitter only (not dom0’s bytes). This is because systemd-random-seed service runs relatively late so the kernel may have already acquired enough entropy to seed the CSRNG through its own jitter mechanism. If this happens, dom0 bytes will not be included in the CSRNG seed for at most 1 minute. Does anyone else agree this could be an issue? It won’t affect many users, but I know I’ve started a VM and immediately generated a key before.

  4. Where can I find the code responsible for doing LUKS encryption during installation? I don’t like that cryptsetup uses /dev/urandom by default, so I want to verify that Qubes is using /dev/random as the LUKS entropy source, or that /dev/urandom is guaranteed to be initialized at the time cryptsetup is used.

Thanks.

@adrelanos @v6ak @Demi @brendanhoar @tripleh

Sorry if these tags are unwelcome, but I can’t post in the relevant issue (Improve entropy collection in VMs · Issue #673 · QubesOS/qubes-issues · GitHub) as GitHub captchas are not working with Tor Browser.

I am not sure you will be able to do any sort of useful audit of entropy (in Qubes) without making this your life mission, becoming the one who answers questions instead of asking them, and/or paying somebody (not me) to perform an audit. You would have to become an expert, or engage one, such as Stephan Mueller or DJB.

Is topping up of critical importance?

Quote Re: [cryptography] urandom vs random

whoever wrote the /dev/random manual page seems to simultaneously believe that

(1) we can’t figure out how to deterministically expand one 256-bit /dev/random output into an endless stream of unpredictable keys

In other words, in my interpretation, djb thinks that yes, an endless stream of unpredictable keys can be derived from one random seed.

Re-seeding is useful to recover from compromise where the random seed was leaked but it’s hard to imagine a random seed compromise without the whole system (VM in case of Qubes) being compromised.
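To illustrate the idea of expanding one seed into many keys, here is a minimal Python sketch. It is a toy construction (SHA-256 over seed plus counter) chosen only for illustration; the function name `expand_seed` is hypothetical, and this is not how the Linux kernel or any standardized DRBG does it.

```python
import hashlib


def expand_seed(seed: bytes, n_blocks: int) -> list:
    """Derive n_blocks 32-byte outputs from a single seed.

    Toy construction: output_i = SHA-256(seed || i). Anyone who does not
    know the seed cannot predict the outputs, assuming SHA-256 behaves
    like a random function. Not the kernel's actual mechanism.
    """
    return [
        hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        for counter in range(n_blocks)
    ]
```

The same seed always yields the same stream, which is exactly why the seed itself must stay secret and unpredictable.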

https://phabricator.whonix.org/T727

This question alone is difficult enough. You would need to look that up in the kernel source code. Short of that, ask in general Linux discussion places such as security.stackexchange.com and/or the kernel mailing list.

As far as my interpretation of the man pages goes, new entropy is always mixed in, not used as a replacement. Compromised + uncompromised entropy always becomes uncompromised entropy.


I guess it would be better if Xen (or at least Qubes) had something similar to QEMU / KVM’s Features/VirtIORNG - QEMU. I might have posted a feature request on the Xen mailing list at some point. Not sure. I cannot find the reference. But if I did, I am pretty sure I got ignored. It is also mentioned in Improve entropy collection in VMs · Issue #673 · QubesOS/qubes-issues · GitHub. So I guess unless patches are sent to Xen or Qubes, nothing will happen.

To put it into perspective, these all seem to be “detail enhancements”. To my knowledge, nobody with related technical credentials is accusing Qubes (or Linux) of having serious issues with entropy.


created just now to ask Stephan Mueller:

(terminology: entropy source == the code that reads data from a noise source, digitizes it, conditions it and delivers it to a DRNG; noise source == the physical phenomenon delivering entropy and sampled by the entropy source implementation)

The “Linus Jitter Dance” as you call it is seemingly a poor entropy source choice. I have analyzed it in [1] sections 6.3.2 and 6.3.4, in both virtual and native hardware environments. Thus, I would not want to rely on it for good entropy.

On 1: It is correct that reseeding is intended to recover from a compromise of the RNG. You see that in various standards which require a reseed only at outrageously large intervals (e.g. SP800-90A requires a reseed after 2^48 generate requests, where each can be up to 2^19 bits in size). That said, when you use getrandom() without the INSECURE flag or /dev/random (not /dev/urandom), the kernel ensures that the internal state received 256 bits of entropy. That may come from the (poor) Linus Jitter Dance as well as from interrupts (which are considered good even in early boot as outlined in [1] section 6.3). Considering that quite a few interrupts are generated during boot time, you can safely assume that a good deal of entropy is also coming from the interrupts during boot time, but you cannot be sure how much. Considering your statement that you seed/reseed during boot time from 4 entropy sources:

  • yet 2 are based on the same phenomenon (jitterentropy-rngd and haveged),

  • one from interrupts (albeit you do not know how much), and

  • the one remaining seemingly is not so good (see above),

you effectively have:

  • 1 true noise source that is guaranteed to deliver entropy after initial kernel boot, and

  • one noise source that may deliver some entropy.

Now the question is when user space wants to initialize a DRNG and thus requires a seed. Long-running daemons (e.g. SSHd) instantiate their DRNG early in the user space boot and hardly ever reseed (depending on the choice of OpenSSL). Other DRNGs, in contrast, are instantiated time and again (e.g. SSH clients) and benefit from new entropy added to the CSRNG. Usually one wants to look at the worst cases when performing an entropy assessment. The worst case is therefore the daemons that spawn their own DRNG, seeded from the CSRNG, during boot time. For those, you are right, the CPU Jitter noise source is the only entropy source you can count on.

Bottom line: I personally think the CPU Jitter is good as an entropy source (it is even accepted in numerous FIPS 140 validations - see the list [2] where one third or more of the validations rest on the CPU Jitter RNG - either user land or kernel land).

On 2: Contemporary DRNGs such as SP800-90A DRBGs (which are the majority of all implementations these days: almost all user space libs use that standard) “add” new entropy data to the state, i.e. any existing entropy is “supplemented” with the new entropy. See [3] sections 10.1.1.3, 10.1.2.4, or 10.2.1.4 which all show that the reseeding operation also pulls in the existing state along with the new seed data to generate the new state. This also applies to the kernel’s random.c implementation as outlined in [1] section 3.3.2.1. Any DRNG that is found to “overwrite” the existing state must be considered a security flaw.
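The “supplement, not overwrite” property can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class `ToyDrng` and its hash-based update rule are hypothetical and are neither an SP800-90A DRBG nor the kernel's random.c. It only shows that the new state is computed from the old state together with the new seed data.

```python
import hashlib


class ToyDrng:
    """Toy DRNG whose reseed mixes new seed data INTO the existing state."""

    def __init__(self, seed: bytes) -> None:
        self.state = hashlib.sha256(b"init" + seed).digest()

    def reseed(self, new_seed: bytes) -> None:
        # The new state depends on BOTH the old state and the new seed,
        # so entropy already in the state is kept, not replaced.
        self.state = hashlib.sha256(b"reseed" + self.state + new_seed).digest()

    def generate(self) -> bytes:
        out = hashlib.sha256(b"out" + self.state).digest()
        # Advance the state so the next output differs.
        self.state = hashlib.sha256(b"next" + self.state).digest()
        return out
```

Two instances that start from different initial seeds keep producing different outputs even after both are reseeded with identical data, which is the behavior the question about dom0's bytes is concerned with.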

On 3: I cannot answer this one as I did not perform an analysis.

On 4: I cannot answer this one as I did not perform an analysis. Yet, please use /dev/random for this task. Also, ensure that any LUKS creation operation happens after the userspace rngd’s are started.

[1] https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/LinuxRNG_EN_V5_7.pdf

[2] Cryptographic Module Validation Program | CSRC

[3] I am not allowed to add more than 2 links - thus search for “SP800-90A rev 1” and you find a NIST web site where you can download the PDF specification

Thank you so much for taking the time to answer @smuellerDD, it is incredibly helpful.

Some follow-up questions:

The “Linus Jitter Dance” as you call it is seemingly a poor entropy source choice. I have analyzed it in [1] sections 6.3.2 and 6.3.4, in both virtual and native hardware environments. Thus, I would not want to rely on it for good entropy.

I personally think the CPU Jitter is good as entropy source

  1. To confirm, you are saying there is nothing wrong with CPU jitter in general, but the “Linus Jitter Dance” is not good (despite using jitter)?

I want to verify I understand your analysis of the effective noise sources. Sorry if I’ve misinterpreted something. You are saying we effectively have:

1 true noise source that is guaranteed to deliver entropy after initial kernel boot

2a. This is CPU jitter?

one noise source that may deliver some entropy.

2b. This is hardware interrupts?

2c. Might you have missed the entropy source, a “one-off read from dom0’s getrandom(0) of 512 bytes” from the original post? In case you aren’t totally familiar with Qubes, dom0 is separate to the virtual machines whose entropy quality is in question. Unlike the virtual machines, it has access to real hardware and processes keyboard/mouse events. Can you please confirm that this would be a third effective source?

  3. From your answers I’ve gathered that entropy never exits the system, reseeding is only for recovery from compromise and the kernel requires 256 bits of entropy for the CSRNG to be considered seeded. With this knowledge, I believe that once the one-off read from dom0’s getrandom(0) of 512 bytes enters the kernel’s input pool (via write to /dev/urandom) and a reseed occurs, the virtual machine will have sufficient entropy for as long as it’s powered on (assuming no compromise within the virtual machine, assuming dom0 itself is seeded with good entropy and ignoring the issue that this entropy won’t end up in the system until relatively late). Do you agree?

I’ve got some comments for the Qubes developers, but I will save them until this is confirmed so that we can keep the thread clean.

On 1: The noise source (i.e. the phenomenon) CPU Jitter (i.e. minuscule differences in execution times of a fixed set of instructions, as well as differences in memory access times) is a good noise source IMHO considering the assessments performed on it. The CPU Jitter RNG (from chronox.de) is IMHO a good entropy source (full disclosure: it is my implementation, so take that statement under “biased assessment” :slight_smile: ). I think haveged is also good, but I did not perform the final assessment on it.

The “Linus Jitter Dance” seems to be based on scheduling artefacts, but the entropy source implementation seems to exhibit a lopsided distribution. I have also implemented a scheduler-based entropy source (see my LRNG implementation or the user space ESDM [1]) which does not exhibit a lopsided distribution. Thus, it seems that the Linus Jitter Dance implementation has deficiencies, not the noise source of scheduling artefacts per se.

On 2a: Here I refer to the jitterentropy-rngd and haveged you mention (which I assume are started very early in user space boot to insert data into /dev/random before other consumers come online).

On 2b: Yes, the interrupts - the “may” is due to the fact that the kernel wants to collect 256 bits of entropy. If the Linus Jitter Dance is active, it will deliver a part of the 256 heuristically assumed bits of entropy.

On 2c: I am aware of the purpose of dom0, but I am not entirely sure what the 512 bits from getrandom(0) are used for and when this happens (compared to the time the jitterentropy-rngd and haveged are started). Could you clarify the sequence, please? As for the HID devices, they will certainly deliver entropy via their interrupts. But it is likely that they only start to deliver meaningful entropy after the system is fully booted (and perhaps after the relevant daemons are started or after you initially fed entropy to the guests). If this is the case, the HID entropy does not contribute to those use cases. The HID entropy is truly available only for use cases after the HID starts to deliver entropy (and in the worst case 2 minutes later, due to the reseed threshold of 1 minute for the CSRNG primary and another 1 minute for the CSRNG secondary DRNG instance); see the referenced BSI document, section 3.3.2.1.

On 3: In general I agree. But note that guaranteeing access to “fresh” entropy would be wise, as it makes sense to get fresh entropy once in a while. This supports a healthy system if there is some entropy leak which we may not have identified (side channels and the like). Thus, your statement is true, but it would constitute the lowest security level.

[1] GitHub - smuellerDD/esdm: Entropy Source and DRNG Manager

I am aware of the purpose of dom0, but I am not entirely sure what the 512 bits from getrandom(0) are used for and when this happens (compared to the time the jitterentropy-rngd and haveged are started).

The 512 bits are written to /dev/urandom without crediting. As for when, jitterentropy-rngd, haveged and systemd-random-seed (responsible for writing dom0 bytes to /dev/urandom) are all run before sysinit.target. I also looked at random.c and it looks like the Linus Jitter Dance is only performed when there is a read from /dev/urandom prior to the CSRNG being seeded. I have no idea if it is common for things running prior to sysinit.target to use /dev/urandom. If we could somehow confirm that nothing reads from /dev/urandom (or even prevent it) prior to systemd-random-seed running then that would be very good. If we could also ensure that systemd-random-seed runs prior to jitterentropy-rngd and haveged, then that would give us good entropy guarantees. I’m sure the latter is possible, I don’t know about the former.
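For reference, “written to /dev/urandom without crediting” amounts to a plain write: the kernel mixes the bytes into its pool but does not raise its entropy estimate (crediting needs the privileged RNDADDENTROPY ioctl instead). Below is a minimal Python sketch of such a write; the function name `mix_without_credit` is hypothetical and this is not the code systemd-random-seed or Qubes actually uses.

```python
import os


def mix_without_credit(seed: bytes, device: str = "/dev/urandom") -> int:
    """Write seed bytes to the random device and return the byte count.

    A plain write mixes the data into the kernel pool but credits no
    entropy, so it cannot by itself mark the CSRNG as seeded.
    """
    fd = os.open(device, os.O_WRONLY)
    try:
        return os.write(fd, seed)
    finally:
        os.close(fd)
```

Because no entropy is credited, whether the CSRNG counts as seeded at that moment still depends on the other sources, which is why the ordering of the services matters.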

From your responses here and elsewhere, I believe Qubes does not currently have good entropy guarantees by default. Virtual machines launched early by dom0 could potentially get bad entropy from dom0 due to dom0 itself not being seeded well. The virtual machines themselves rely on systemd-random-seed, which is not really that good; its own man page even says “it is recommended to use a boot loader that can pass an initial random seed to the kernel”.

However, because you confirmed entropy never exits the system, I believe it is possible to use Qubes safely by waiting a few minutes before launching virtual machines that require good entropy, and then waiting another few minutes before using /dev/random in those virtual machines. If you are using a service that is started early within a virtual machine, you should probably restart it after a few minutes if it is important that it has access to good randomness.

I hope I’m wrong, but I think this is a case of most people not caring or simply being unaware because it is rather invisible.

Thanks @smuellerDD for going out of your way to help, you’ve made life so much easier.

Where can I find the code responsible for doing LUKS encryption during installation? I don’t like that cryptsetup uses /dev/urandom by default, so I want to verify that Qubes is using /dev/random as the LUKS entropy source, or that /dev/urandom is guaranteed to be initialized at the time cryptsetup is used.

I cannot answer this one as I did not perform an analysis. Yet, please use /dev/random for this task. Also, ensure that any LUKS creation operation happens after the userspace rngd’s are started.

This needs to be addressed. Can somebody ping a Qubes developer?

I notice that, at least on my Intel laptop, rdrand/rdseed are available in the AppVMs (cat /proc/cpuinfo | grep rand).
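The same check can be done programmatically. This is a small Python sketch that scans cpuinfo text for the two flags; the helper name `rng_cpu_flags` is my own, and it only reports what the (virtual) CPU advertises, not whether anything actually uses the instructions.

```python
def rng_cpu_flags(cpuinfo_text: str) -> set:
    """Return which of the rdrand/rdseed flags appear in cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line looks like "flags\t\t: fpu vme ... rdrand ..."
            flags = line.partition(":")[2].split()
            found |= {"rdrand", "rdseed"} & set(flags)
    return found
```

Inside an AppVM it could be called as `rng_cpu_flags(open("/proc/cpuinfo").read())`.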

I don’t see anything that uses it to add entropy into the kernel pool, though. There are some daemons or kernel modules that can do it, but I’m not sure I see them being present.

On the other hand, I’m getting 150MB/s from /dev/random and /dev/urandom, so clearly there’s no blocking going on.

I don’t see anything that uses it to add entropy into the kernel pool, though.

It’s done by default in the kernel, you can see for yourself in random.c.

On the other hand, I’m getting 150MB/s from /dev/random and /dev/urandom, so clearly there’s no blocking going on.

Neither of those files will block after they have been seeded, the old blocking behavior doesn’t exist anymore.
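If you want to check from user space whether the CSRNG has been seeded yet, one option is a non-blocking getrandom() call. Here is a minimal Python sketch (Linux only; the function name `csrng_is_seeded` is hypothetical). It relies on `os.getrandom` with `os.GRND_NONBLOCK`, which raises `BlockingIOError` while the pool is not yet initialized.

```python
import os


def csrng_is_seeded() -> bool:
    """Return True if getrandom() can deliver bytes without blocking.

    With GRND_NONBLOCK the call fails with EAGAIN (BlockingIOError in
    Python) while the kernel CSRNG is still uninitialized.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False
```

An early-boot service could poll this before generating keys, although it only tells you the kernel considers itself seeded, not which sources contributed.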

Entropy sources aren’t the issue, the issue is ensuring that early processes don’t get bad entropy.

Per the previous discussion, Qubes has room for improvement with entropy. I’ve identified it, analyzed it and found an expert to confirm it (thanks Stephan). However, I can’t create a GitHub account because it is blocking Tor. Can somebody please reopen Improve entropy collection in VMs · Issue #673 · QubesOS/qubes-issues · GitHub and link this discussion.

There is also the matter of cryptsetup and using /dev/urandom by default. I was unable to find where cryptsetup is used and I don’t have as much time on my hands as I did previously. Who are the Qubes developers on here that I can tag?

@unman sorry if this tag is unwelcome, I think you may be part of the development team or know who is.