How to store lots of data (RAID) in Qubes?

Hello!
I have been using Qubes on a laptop for about a week now and was hoping some of you experts out there could help me.

My main computer does a lot of things and stores more than 20 TB of data (mostly media files). There is so much data on there that I need RAID, and OpenZFS serves me well in that regard. Now, I might be a bit spoiled, but how would you implement (or how have you implemented) the following things in Qubes OS?

  • Hosting server applications with access to a subset of the data (for example, mpd only needs to read my music collection, jellyfin needs to read music and videos and so on).
    • As far as I understood it, I would need a single qube to handle all media files and incoming connections, since exploiting one qube that has access to files on another means both are compromised (correct?).
      • That would pretty much break compartmentalization for me, since I have lots of different small things accessing lots of data like this. Is there a general way to share files without duplicating them? (I found this GitHub repo, but it’s based on rsync, which would probably mean copying my entire collection to different qubes.)
  • Incremental Replication and Backup (I found wyng for this, haven’t played around with it that much, but seems to be pretty much what I am looking for)
  • Detection and mitigation of bit rot (this point is especially troublesome to me, since it’s a problem that ZFS solves so elegantly). mdadm and dm-integrity?
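
For anyone wondering what detecting bit rot means above the filesystem: a minimal sketch, assuming a plain directory of media files that rarely change. It records a SHA-256 digest per file and later reports files that are missing or whose content changed. The function names are made up for illustration. Unlike ZFS, this only detects damage, cannot repair it, and cannot tell a legitimate edit from corruption.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file path (relative to root) to its current digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or no longer match the manifest."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != expected:
            bad.append(rel)
    return bad
```

The manifest would be built once after copying the data and stored somewhere other than the disks being checked. A periodic run of find_mismatches then plays the role of a scrub.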

My totally and utterly unqualified approach would be

  • compiling OpenZFS as rpm packages in a Fedora 32 qube, copying the files to dom0 and installing them there.
    • I read here that there is more to kernels than just the one currently running, so I expect this to create some headaches…
  • creating a zpool in dom0 (direct access to disks)
  • making some gigantic zvols (block devices) with underlying OpenZFS RAID
  • creating a PV on the zvols
  • letting Qubes/wyng do its thing with LVM snapshot magic.
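
The storage part of the list above can be sketched as a dry run. This is an untested sketch that assumes OpenZFS is already installed and working in dom0. The pool, zvol and volume group names, the raidz2 layout, the size and the disk paths are placeholders, not recommendations. It only builds and prints the commands, it does not execute anything, and it stops before registering the volume group with Qubes.

```python
import shlex


def plan_zvol_backed_vg(pool, vdev_type, disks, zvol, size, vg):
    """Return the dom0 commands for the plan, in order. Nothing is executed."""
    # OpenZFS on Linux exposes a zvol as /dev/zvol/<pool>/<name>.
    zvol_dev = f"/dev/zvol/{pool}/{zvol}"
    return [
        ["zpool", "create", pool, vdev_type, *disks],     # redundant pool on the raw disks
        ["zfs", "create", "-V", size, f"{pool}/{zvol}"],  # block device (zvol) of the given size
        ["pvcreate", zvol_dev],                           # LVM physical volume on the zvol
        ["vgcreate", vg, zvol_dev],                       # volume group on top of that PV
    ]


if __name__ == "__main__":
    # All values below are hypothetical examples.
    disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
    for cmd in plan_zvol_backed_vg("tank", "raidz2", disks, "qubes", "10T", "qubes_zfs"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before running anything as root in dom0.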

I read here that Qubes does not support OpenZFS - I would assume, based on GitHub activity, that I shouldn’t hold my breath for support any time soon.

Is there a recommended approach to this, or would this be the “ideal” way until OpenZFS is fully supported?

Note: I don’t need OpenZFS. I would be fine with switching to another FS entirely, since I probably won’t be able to have replication anyway (dom0 has no network, after all).

Thanks for reading this far. I’d be grateful for any comments or suggestions, even just a simple RTFM with a link.

Take my course…
Show me what you got ….

My main issue is actually getting that plan, since I don’t really know “the Qubes way” (If there even is such a thing).
I am currently trying to tackle the storage situation, but since you provided a solution for my web applications, I’ll test that first.

I had never heard of RancherOS, but it seems to be something you use for container orchestration.
Just to be clear: you are using RancherOS in a StandaloneVM and run all web applications from there? Wouldn’t that mean that all hosted data is stored inside that single StandaloneVM, and that exploiting any one application would compromise the entire qube?

I know what a reverse proxy is and have worked with Traefik before. I also use Docker for my web applications right now, though I have never used Kubernetes.
I’m guessing you recommend a reverse proxy to minimize attack surface on the opened port?

Also, I’ll read up on “unikernels”, as you suggested.

Thanks a lot for the encouragement and insight!

My storage is a 1 TB NVMe drive for qubes, a 3 TB SATA HDD attached to dom0 for backup, and an 18 TB RAID 5 array attached to qubes over NFS.

I don’t know what your threat model is, but I don’t mind using a NAS for anything that isn’t personal/sensitive data. I’m using a Synology NAS, which doesn’t have a lot of FS options, but if you use TrueNAS you can use ZFS, since it is built on OpenZFS.

SSBnb3QgeW91IQ==

The “Qubes way” is to have the data in the Qubes/VMs where you need it.
If you find that you regularly need the same data in multiple VMs, you probably segregated too much.

See [1].

If there are technical reasons why qubes start slowly when they contain too much data (I heard something like that, not sure if it’s true), such bugs should be fixed upstream.
In the meantime you can attach large data blocks to VMs via qvm-block. Personally I maintain [2] to automate that.
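
A minimal sketch of what such an attachment could look like, assuming Qubes 4.x, where qvm-block attach takes the target qube followed by BACKEND:DEVICE_ID. The helper only builds the dom0 command line. The qube name media-mpd and the device ID sdb1 are hypothetical. Attaching read-only would fit the mpd case from the original post, where the service only needs to read the music collection.

```python
import shlex


def qvm_block_attach_cmd(target_qube, backend_qube, device_id, read_only=True):
    """Build the dom0 command that attaches a block device to a qube."""
    cmd = ["qvm-block", "attach"]
    if read_only:
        # --ro attaches the device read-only, so the qube cannot modify the data.
        cmd.append("--ro")
    cmd += [target_qube, f"{backend_qube}:{device_id}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical names: a "media-mpd" qube and a device that dom0 lists as sdb1.
    print(shlex.join(qvm_block_attach_cmd("media-mpd", "dom0", "sdb1")))
```

Running qvm-block list in dom0 shows the actual BACKEND:DEVICE_ID values available on a given system.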

The “Qubes way” to do backups currently doesn’t support snapshots, but of course you can use whatever backend you like if you maintain the aforementioned data blocks per VM anyway.

[1] How to organize your qubes | Qubes OS
[2] GitHub - 3hhh/qcrypt: multilayer encryption tool for Qubes OS

Yeah, I should probably start by removing some services or separating my data better to get this stuff under control. I’ll be building a proof of concept on my laptop, so I can check whether this would be feasible.

Okay, I’ll admit that I was a bit overwhelmed by that README. I’ll definitely take a look and play around with it to get more familiar. I also added the qvm-block command to my to-do list, many thanks!

Apologies, I should have probably stated my threat model in my original post.
First of all, I am probably not important enough for anyone to care about me, so this is actually just a little hobby project to get more familiar with both Qubes OS and, by extension, security best practices.

I actually have two different models, one that I would assume to be the worst case scenario, and a more realistic scenario (which I will implement first, since it’s easier to do).

I would like to work under the assumption that my machine will be under attack from the local network, meaning the adversary has physical access to my networking equipment and can spoof any IP address in my LAN.
Ultimately, I would probably trust some USB PGP devices to verify my boot partition using Anti Evil Maid (I haven’t read up on that yet, though). That device would always be carried with me.

That means any network traffic that is not cryptographically verified (using OpenVPN, WireGuard, IPsec or similar) should not be trusted, and I would like to prevent access to my computer as far as reasonably possible.

For the time being, I would like to trust that there are no attacks originating from my LAN, at least until I get the rest of the system up and running in a reasonable state. After that, I could probably look into all of the networking stuff.

Since my NAS is also a backup target, I would put all my eggs into one basket by having both primary and backup drives in there, so I would have to build an entirely new PC just for storage.

NFS is plain text by default (with a lot of effort you can add Kerberos encryption), but that seems a bit overkill to implement in this scenario. You could also encrypt traffic between the NAS and the computer using a VPN, but at gigabit speeds this would probably come with a performance penalty (I don’t expect native performance out of this computer, of course).

If possible, I would keep it “simple” and try to have my drives in my own computer.

Right, will look into it and report back - thanks for the pointer in the right direction!

I have a similar situation to yours.

I have two pools: one is for music, the other holds all my other stuff in encrypted VeraCrypt containers.

I have a qube that mounts the music pool, and other qubes to deal with the other things.

If you want to compartmentalize, create more pools on the other end. Each pool can be mounted multiple times by different VMs. And, if you do NOT connect those VMs to the internet but ONLY to your NAS, you should be safe. (If your NAS is on the internet, you’re likely already compromised anyway.)

You can attach the drives to dom0 and use mdadm. I’m using it for RAID 1 in dom0 and have never had any issues with it.
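
A rough sketch of that setup, assuming two spare disks in dom0. The device names are placeholders and nothing is executed. The second helper returns the sysfs file the md driver uses to start a consistency check (write the word check to it as root), which is the closest thing plain md RAID has to a scrub. Note that on RAID 1 a check can report that the copies differ but cannot tell which copy is correct, which is where dm-integrity or a checksumming filesystem comes in.

```python
import shlex


def mdadm_create_raid1_cmd(md_device, members):
    """Build the mdadm command that creates a RAID 1 array from the given disks."""
    if len(members) < 2:
        raise ValueError("RAID 1 needs at least two member devices")
    return [
        "mdadm", "--create", md_device,
        "--level=1", f"--raid-devices={len(members)}",
        *members,
    ]


def md_sync_action_path(md_device):
    """Return the sysfs file that accepts 'check' to start a scrub of the array."""
    name = md_device.rsplit("/", 1)[-1]  # "/dev/md0" -> "md0"
    return f"/sys/block/{name}/md/sync_action"


if __name__ == "__main__":
    # Hypothetical device names; double-check them before creating an array.
    print(shlex.join(mdadm_create_raid1_cmd("/dev/md0", ["/dev/sdb", "/dev/sdc"])))
    print(md_sync_action_path("/dev/md0"))
```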

Before you make a 20 TB volume, read the thread “Shutting down domain takes a long time”; large volumes can be problematic.

Hello!
First up, sorry that I’m late with replying to everyone (thank you all so much for your suggestions!).
I installed Qubes on my computer this week, and there is just so much stuff to learn and document that I’m afraid I may run a bit late here and there. All of you provided so much detailed and insightful information, and I would like to try out and look at everything you posted.
I will document my insights, findings and other stuff in this thread later on.
Just know that I found a solution that works quite well for my workflow, thanks to the nudges in the right direction you gave me.