The threat of malware in repositories?

I hesitate to ask this, because I don’t have the technical knowledge to meaningfully participate, but it still seems important.

This article in Ars Technica talks about malware being slipped into open NPM repositories. Apparently it’s becoming more common, with similar attacks on RubyGems and PyPI. I think I’ve seen at least one other instance mentioned somewhere else in the media recently.

I remember, in the foundational documents of Qubes, someone (probably Joanna?) pointing out something like: the root of all trust in Qubes is in the OSes they use. For practical reasons, if you follow the chain all the way down, the buck stops there. So if someone busts open the house of Xen or Fedora, well… whatcha gonna do?

That seems like a fair point! But now it seems to be - possibly, potentially - happening. (Am I wrong?)

I don’t understand the process well enough and I might be mixing things up, but it does seem to me that this is a serious concern for our community. I’ll draw out a quote from the article’s discussion:

Oz7, commenting on the Ars article (Dec 9, 2021):

Quote:

“People downloading open source packages should take extra care in making sure the item they’re downloading is legitimate and not malware masquerading as something legitimate.”

Would love a long-form piece on Ars (or insights from the Ars crowd) as to how to do this if you’re a small-ish data science team. R’s CRAN seems to be policed well enough, but it’s unclear if Conda or PyPI are policed to the same degree. Any Python repositories that are as well policed as CRAN?

So, two specific questions from this:

  1. As a Qubes user (or any user of any OS, really), how can we “take extra care” when adding bits and pieces onto our system?

  2. How does the Qubes OS team - “a small-ish…team” - check the massive number of components that go into the Qubes system, the great majority of which come from open source repositories?

The article talks about paid services (Anaconda is given as an example) that screen their material. Does Qubes have the financial resources for something like this (and what about trust?), and does that need to be discussed? Or is it something that Qubes can do, and maybe already does, itself?

If I have completely misread this situation and it’s really not a problem, can you tell me why?

The old title, “malware in repositories”, is too strong and sounds like there is actual malware in the repositories.

1 Like

Just realized that as I saw your edit. Good point, sorry.

Although, that’s what the article is about - there is (or has recently been) malware in some of the NPM repositories. I don’t actually know what NPM is, but I got hits when I searched, so I guess it’s relevant.

Node Package Manager: it’s basically a collection of JavaScript libraries that developers download and use on Node.js to import necessary dependencies for their servers. (The theoretical attack would be to publish malicious libraries - slipping malicious JavaScript or a backdoor into them - and hope that nobody reads the code in the NPM repo; nobody has the time to audit every single package they download.)

Well, of course there is a threat vector here, but what can we do about it? Malware sitting in “the repository” is not itself the problem; the problem appears once you download and run the infected application. The big repos, like Debian, Fedora, and such, actually do a light check, making sure there is no obvious malware in a package. But a backdoor can be something as simple as one misplaced bracket, which is hard to catch if you’re just having a glance at the code. So sure, there is a certain threat, but it’s mostly for small, unknown projects.

The big players have a LOT of contributors, sometimes even up to thousands of programmers (the Linux kernel), and a bunch of security professionals checking the commits. So the big projects are fairly safe. (Of course, that’s just being free from malware; fixing security bugs is another story, but “we don’t break userspace. Period.”, so that’s something to worry about.)

Thus I’d say you don’t have to panic if you stick to known projects (indicators are a lot of contributors, an active commit log, and a community) and install only software that’s absolutely necessary (which is also something you should do on Linux, Windows, and Mac). That way you’ll be rather safe from at least the low-skill actors. (It’s pretty obvious that such safeguards won’t really work against the NSA, but that’s a whole different story - keyword being the term “0-day” - and you can’t really guard against that.)
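To make “take extra care” a bit more concrete: for anything you grab from outside your distro’s signed repos, at minimum verify the checksum the project publishes (or, better, a GPG-signed checksums file) before installing. Here’s a minimal sketch in Python; the file name and digest below are made-up placeholders, not real values:

```python
import hashlib
import sys

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    # Hypothetical placeholders: substitute the file you downloaded and the
    # checksum the project publishes (ideally fetched over a separate,
    # authenticated channel, e.g. a signed checksums file).
    downloaded_file = "some-package-1.2.3.tar.gz"
    published_sha256 = "replace-with-the-published-sha256-hex-digest"

    actual = sha256_of(downloaded_file)
    if actual == published_sha256:
        print("OK: checksum matches the published value.")
    else:
        print(f"MISMATCH: got {actual}", file=sys.stderr)
        sys.exit(1)
```

This only tells you the file is the one the project published, of course - it says nothing about whether that file is itself trustworthy.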

Audit all the source code yourself, compile all packages from the source code you audited, and manually install only those packages. Needless to say, this is impractical for almost every person on earth.

In short, they don’t. They minimize the TCB as much as they can and check that as well as they can. They don’t audit every line of source code that goes into Qubes OS, because that would be several large operating systems (Fedora, Debian, Whonix, etc.) plus an industrial-strength hypervisor (Xen). It would be impossible even for one of the teams of those comparatively enormous upstream projects to do it, let alone the comparatively tiny Qubes OS team.

Thankfully, it’s not necessary, because the TCB is vastly smaller than this. Remember, the whole point of Qubes is to assume that a lot of software will be compromised and to limit the damage by securely compartmentalizing your digital life. This radically cuts down on how much of the software running on your machine needs to be trustworthy. In the real world, perfect security is impossible, so we aim for reasonable security instead.

1 Like

Thanks for that, everybody. Very helpful.

I think one should also mention reproducible builds, which allow you to verify that an executable was produced from the known source code. You then won’t have to compile it yourself, “just” audit the source code. You can also rely on the community, where everyone audits a tiny part of the source code.

1 Like

Good point. I momentarily forgot about reproducible builds. They go a long way toward solving (or at least alleviating) this problem.

You no longer have to do either one yourself, as long as you’re willing to delegate the tasks to others.

Right, or an independent set of expert auditors, or anyone else you’re willing to trust. Reproducible builds basically allow trust to “flow through” the build process. Before reproducible builds, we had no way of knowing whether a given binary was really compiled from its alleged source code, so we couldn’t reliably infer any properties about the binary from known (or at least believed) properties of the source code. After reproducible builds, that new inference path is available to us (so long as there are enough active verifiers around whom we trust).
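To make that “flow through” idea concrete, here is roughly what a verifier does, sketched in Python with hypothetical file names: rebuild the package from the audited source in the documented build environment, then check that the result is bit-for-bit identical to the binary the project ships. If independent verifiers keep publishing matching digests, whatever you believe about the source code carries over to the binary.

```python
import hashlib

def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical file names: the binary shipped by the project, and the binary
# you (or another independent verifier) rebuilt from the audited source in
# the documented build environment.
official_build = "package_1.2.3_amd64.deb"
local_rebuild = "rebuilt/package_1.2.3_amd64.deb"

if file_digest(official_build) == file_digest(local_rebuild):
    print("Bit-for-bit identical: the shipped binary matches the audited source.")
else:
    print("Mismatch: the shipped binary was not produced by this source/build setup.")
```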

1 Like

In case you’re wondering, I’m pretty sure TCB stands for “trusted code base” (all the pieces of code you need to trust in order to be able to trust your entire system, in other words: the parts of the system that are security-critical).

Please correct me if I’m wrong @adw! : )


Edit: the term is actually “Trusted Computing Base”; see @fsflover’s correction below. (Thanks!)

Trusted Computing Base.

1 Like