QubesOS Python Library Dependency Integrity

I noticed that Qubes OS has many Python library dependencies. Is there any kind of vetting of these Python library dependencies for vulnerabilities prior to their inclusion in Qubes OS?

Much of the QubesAdmin API is coded in Python, but there is no information on the integrity of the Python library dependencies.

Generally, we try to use minimal external dependencies (the Python standard library is quite extensive), and for the few we do need, we require them to be available as distribution packages. We do not fetch random packages from PyPI or similar.
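To make that distinction concrete, here is a rough, generic sketch (not the project's actual tooling) of how one might check, on an RPM-based system such as a Fedora template or dom0, whether an importable Python module is owned by a distribution package rather than installed ad hoc from PyPI; the module name is just an example.

```python
# Rough heuristic, assuming an RPM-based system: check whether an importable
# Python module's file is owned by an installed distribution package.
import importlib.util
import subprocess

def owning_rpm(module_name: str) -> str | None:
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    # 'rpm -qf' reports which installed package owns a given file, if any.
    result = subprocess.run(
        ["rpm", "-qf", spec.origin],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None

# Example: 'qubesadmin' is the Python module behind the Admin API tooling.
print(owning_rpm("qubesadmin"))
```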

4 Likes

So the Python packages outside of the Python standard library are part of existing Linux distributions?

How well vetted is the Python standard library?

Mostly yes. There are very few exceptions where we package a library ourselves (for example GitHub - QubesOS/qubes-python-qasync, needed by Qubes Manager); in those cases we check the library manually.

The Python standard library (part of the Python project itself) is held to very high standards.

1 Like

Can you explain how these library dependencies are checked for vulnerabilities? Are they scanned? Are they fuzzed after being compiled?

As I said already, by manually reading the code.

1 Like

Who is actually reading the code, you?

One of the core developers; sometimes it’s me.

1 Like

Is there anything like reproducible builds available for confirming this claim?

This is not what reproducible builds are used for. Here we are talking about committing source code, not building it. If you look at the repository I linked before, you will find that there is actually a hash file for the source code committed to ensure its integrity, and it’s covered by a signed tag, like all of the code in the Qubes repositories.
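As a generic illustration of what that integrity check amounts to (the file names and the SHA-512 choice below are assumptions, not the repository's exact layout): recompute the source archive's digest and compare it with the hash committed alongside it.

```python
# Sketch: recompute a source archive's digest and compare it with the value
# committed in the repository. File names here are placeholders.
import hashlib
from pathlib import Path

def verify_archive(archive_path: str, committed_hash_path: str) -> bool:
    # Committed hash files typically look like "<hex digest>  <file name>".
    expected = Path(committed_hash_path).read_text().split()[0].lower()
    actual = hashlib.sha512(Path(archive_path).read_bytes()).hexdigest()
    return actual == expected

# Example invocation with placeholder file names:
print(verify_archive("source-archive.tar.gz", "source-archive.tar.gz.sha512"))
```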

1 Like

So basically there is no mathematical means of proving that the manual scanning of the source code of the Python library dependencies is actually occurring. We are required to assume the trustworthiness of the developers.

I’m sort of fishing for something that can be verified mathematically, something showing that the code in these library dependencies is verifiably secure.

Does each line of code return a hash value?

The signature proves who made the change and committed it. Whether you trust that person/organization is up to you. If you choose not to trust particular developer(s), you are free to review/verify/scan/whatever that code yourself, or simply not use it. More on this here: Verifying signatures | Qubes OS

An independent reviewer can provide their own set of tags, but that needs to happen in their forked repository (or using some other method); we do not allow just anybody to push tags to the official repositories.
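For completeness, here is a minimal sketch of what checking such a tag looks like in practice, assuming a local clone and that the relevant signing keys have already been imported into GnuPG (the repository path and tag name are placeholders):

```python
# Minimal sketch: 'git verify-tag' exits non-zero if the tag's signature is
# missing or invalid. Which signing keys you trust is still your decision.
import subprocess

def tag_signature_ok(repo_path: str, tag: str) -> bool:
    result = subprocess.run(
        ["git", "-C", repo_path, "verify-tag", tag],
        capture_output=True, text=True,
    )
    return result.returncode == 0

# Placeholder repository path and tag name:
print(tag_signature_ok("qubes-python-qasync", "some-signed-tag"))
```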

The hash covers the whole source archive, which means any change to the source would result in a different hash. Read more at Cryptographic hash function - Wikipedia
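A tiny demonstration of the property being relied on here: changing even a single byte of the input produces an unrelated digest.

```python
# Hash two inputs that differ by one byte: the digests share no useful
# resemblance, so any tampering with the archive is detectable.
import hashlib

original = b"example source archive contents"
tampered = b"example source archive contentz"  # last byte changed

print(hashlib.sha512(original).hexdigest())
print(hashlib.sha512(tampered).hexdigest())
```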

1 Like

What about fuzzing of these API functions? Is there any kind of fuzz testing?

We do use fuzzing where it makes sense (especially where an API function is exposed to untrusted data), mostly using OSS-Fuzz. It doesn’t make sense for the library I used as an example, as it’s a GUI integration (Qt) that does not touch untrusted data and doesn’t really expose an external API (and GUI fuzzing is not a very effective technique anyway).
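For reference, OSS-Fuzz fuzzes Python code with the Atheris engine; a minimal harness looks roughly like the sketch below. The target function here is a made-up stand-in for whatever code actually consumes untrusted data, not a real Qubes API.

```python
# Minimal Atheris harness sketch (Atheris is the engine OSS-Fuzz uses for
# Python). 'parse_untrusted_input' is a hypothetical target function.
import sys
import atheris

def parse_untrusted_input(text: str) -> None:
    # Stand-in for a real API function that consumes untrusted data.
    if text.startswith("{") and not text.endswith("}"):
        raise ValueError("unterminated input")

def TestOneInput(data: bytes) -> None:
    fdp = atheris.FuzzedDataProvider(data)
    try:
        parse_untrusted_input(fdp.ConsumeUnicodeNoSurrogates(1024))
    except ValueError:
        pass  # a well-defined error is fine; crashes and hangs are the bugs

atheris.instrument_all()
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```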

1 Like

Doesn’t a GUI library interact with graphics APIs, and potentially with screen/keyboard/mouse data?

It interacts with the user (via various APIs, but not crossing trust boundary), not with other (untrusted) VMs.

Anyway, I feel like this discussion is getting more and more abstract and off-topic. If you want to ask a question about any specific part of the process, please make it specific. I believe I have already answered your original question. If you want to propose some improvement to the process, you are also free to do so, but remember that Qubes OS is a free and open source project that is built by a community.

2 Likes

I appreciate your time. Does the OSS-Fuzz tool produce any kind of report or hash?

  1. It sounds like you’re asking for mathematical proof that a human performed some cognitive labor. That doesn’t really make sense, at least not without a lot of further specification about what you mean.

  2. Even if you had mathematical proof that a human performed a manual scan of the source code, it wouldn’t necessarily be of any value to you. I, knowing nothing about the source code, can perform a manual scan and report: “Source code exists.” Suppose I could also somehow give you mathematical proof that this occurred. This is not useful to you, because the existence of the source code was not in question. What you need is assurance not that some manual scan was performed but, presumably, that certain desirable security properties are present or that certain undesirable flaws are absent.

Even if you had what you’re asking for, you’d still have to trust the developers, because they’re the ones who would be performing the manual scans that you want to mathematically prove occurred.

It sounds like you’re looking for formal verification. If that’s what you’re expecting, then I think your expectations are off. It’s sort of like going to the car dealership down the street and asking which models can fly. Is it possible to build a flying car? Sure, but it’s not common or practical for the cars any normal person can actually purchase and drive today. Perhaps someday it will be.

I guess the correct term would be static analysis of the source code, which is mostly about identifying common vulnerability patterns.
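To make "static analysis" concrete, here is a toy pass using Python's own ast module that flags calls to eval/exec; real scanners such as Bandit check far more patterns than this.

```python
# Toy static analysis: walk a file's syntax tree and flag calls to eval/exec.
import ast
import sys

def flag_risky_calls(path: str) -> None:
    tree = ast.parse(open(path, encoding="utf-8").read(), filename=path)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in {"eval", "exec"}):
            print(f"{path}:{node.lineno}: call to {node.func.id}()")

if __name__ == "__main__":
    for source_file in sys.argv[1:]:
        flag_risky_calls(source_file)
```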

The way to really discover them is to throw every different type of data imaginable at the function to see how it behaves (which is what fuzzing does).

Formal verification of source code involves translating the program into a language that a formal proof assistant such as Coq can reason about.
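As a tiny taste of what that looks like (sketched in Lean 4 here rather than Coq, but the idea is the same), a property of a function is stated and the proof checker validates it for every possible input, not just the inputs a test or fuzzer happens to try:

```lean
-- Toy formal-verification sketch (Lean 4; Coq is similar in spirit).
-- Instead of throwing test inputs at safeDiv, we state a property once and
-- the proof checker validates it for all inputs.
def safeDiv (a b : Nat) : Option Nat :=
  if b = 0 then none else some (a / b)

-- Dividing by zero can never produce a value; it always yields `none`.
theorem safeDiv_zero (a : Nat) : safeDiv a 0 = none := by
  simp [safeDiv]
```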

Is it possible to use an LLM (Large Language Model) to convert code from one programming language into a formal language like Coq?