Storage and backup of large amounts of data

Let’s assume the following structure:

  • Personal VM (offline)
    • Audio (500GB)
    • Video (500GB)
    • Photos (500GB)
    • Documents (500GB)
  • Vault VM (offline)
    • Keepass DB (1MB)
  • PhoneBackup VM (from phone via Syncthing to VM over local network, restricted network access)
    • Phone Backup (50 GB)
  • Backup VM (offline)
    • [used for Qubes Backup]

Questions:

  1. How would you store the Audio/Video/Photo/Document data?
    a. On an encrypted partition which is mounted to /home of the Personal VM?
    b. Directly inside the Personal VM storage (which is stored on the Qubes partition)?
  2. I tend to trust all Audio/Video/Photo/Document data equally. Would you still separate it into different VMs? My fear is that further separation would make the backup even more complex.
  3. How would you do a weekly full backup of the data to an external drive?

I’m thinking about writing a Dom0 script, triggering the following steps:

  1. [User connects (encrypted) external harddrive]
  2. Attach the external harddrive to the Personal VM → run Rsync to sync only what has changed (and not the complete 2TB all the time)
  3. Attach the external harddrive to the Personal VM (and mount it as e.g. /home)
  4. Copy the Keepass DB from the VaultVM to the Backup VM (because I want it directly on the harddrive and not packaged inside a Qubes Backup)
  5. Trigger Qubes Backup, which includes the data of PhoneBackup VM and all my VMs (but not the large amount of data of the PersonalVM)
  6. Unmount the external harddrive
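Step 5 could be triggered from dom0 roughly as follows. This is only a sketch: the qube name `backup`, the target directory, and the exclude list are hypothetical placeholders, and the script only prints the command unless DRY_RUN=0 is set. Note that qvm-backup still asks for a passphrase and confirmation interactively when run like this.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 in dom0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical names, adjust to your own setup.
BACKUP_QUBE="backup"              # qube that has the external drive mounted
BACKUP_DIR="/mnt/external/qubes"  # directory inside BACKUP_QUBE
EXCLUDE=(personal)                # qubes whose bulk data is synced separately

backup_cmd() {
    local args=(qvm-backup --dest-vm "$BACKUP_QUBE")
    local q
    for q in "${EXCLUDE[@]}"; do
        args+=(--exclude "$q")    # leave this qube out of the Qubes backup
    done
    args+=("$BACKUP_DIR")
    run "${args[@]}"
}

backup_cmd
```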

I currently see the following drawbacks:

  • I have to enter the password of the external harddrive twice (in step 1 and 2)
  • Copying data from the Vault VM to the Backup VM cannot be fully automated because the copy dialog will come up and ask me where to copy the data
  • The 50GB of PhoneBackup are always synced as a whole and not only the differences

Thanks in advance!

It’s very difficult to answer this sort of question without knowing
how you use that data, where it comes from, what you access, when you
access it, and how you access it.
There is never a solution that fits every use case, and I don’t know
what yours is.

Speaking for myself:
I would store the data in storage qubes, separate qubes for each medium.
Those qubes are only used for storage, and have few applications
installed. If you want to access a file you open it in an offline
disposable qube.
You can also sync the files from storage to a disposable qube - look at
unman/qubes-sync on GitHub (simple syncing between qubes over qrexec) for
suggestions as to how you might do this.
The point is that you are not trusting anything in the storage qubes.

I would rsync directly from the storage qubes to the encrypted
hard drive.
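A rough dom0 sketch of that idea, attaching the drive to each storage qube in turn and running rsync inside it. Everything named here is an assumption for illustration: the qube names, the device ID `sys-usb:sda` (check `qvm-block` for yours), the in-qube device node `/dev/xvdi`, the mapper name, and the mount point. It also assumes rsync and cryptsetup are installed in the storage qubes. By default it only prints the commands.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

STORAGE_QUBES=(audio video photos documents)  # hypothetical qube names
DEVICE="sys-usb:sda"                          # hypothetical, as listed by qvm-block

sync_qube() {
    local qube="$1"
    run qvm-block attach "$qube" "$DEVICE"
    # Assumption: the attached device appears as /dev/xvdi inside the qube.
    run qvm-run --pass-io -u root "$qube" \
        "cryptsetup open /dev/xvdi extbackup && mkdir -p /mnt/ext && mount /dev/mapper/extbackup /mnt/ext"
    # Only changed files are transferred; --delete mirrors removals too.
    run qvm-run --pass-io -u root "$qube" \
        "rsync -a --delete /home/user/ /mnt/ext/$qube/"
    run qvm-run --pass-io -u root "$qube" \
        "umount /mnt/ext && cryptsetup close extbackup"
    run qvm-block detach "$qube" "$DEVICE"
}

for q in "${STORAGE_QUBES[@]}"; do
    sync_qube "$q"
done
```

Drop --delete if files removed from a storage qube should stay on the drive.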

I would not use Qubes backup for the PhoneBackup - as you say it is
hugely inefficient to repeatedly back up that data. I would rsync that
also.

Otherwise, your proposed solution looks fine.

If you are scripting this, then there is no need to enter the password
twice - store it in a variable in dom0.
You can configure the qubes-rpc policy to allow transfers to a designated
qube without asking. This is not a default you would want all the time,
but you could make this part of the script - insert the “allow” line,
make the transfer, remove the line.
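As a sketch of the "insert, transfer, remove" idea: the two helpers below add and remove a single allow line in a policy file. The file path and line format shown are the Qubes 4.0 style (`/etc/qubes-rpc/policy/qubes.Filecopy`, "source target action"); 4.1 keeps policy under `/etc/qubes/policy.d/` with a different line format, so treat both as assumptions to check against your release. The qube names `vault` and `backup` and the file path in the usage comments are hypothetical.

```shell
#!/bin/bash
# Assumed Qubes 4.0 policy location; override POLICY_FILE for other setups.
POLICY_FILE="${POLICY_FILE:-/etc/qubes-rpc/policy/qubes.Filecopy}"
ALLOW_LINE="vault backup allow"   # hypothetical source and target qubes

allow_on() {
    # Prepend the line so it is matched before any later "ask" rule.
    local tmp
    tmp=$(mktemp)
    { printf '%s\n' "$ALLOW_LINE"; cat "$POLICY_FILE"; } > "$tmp"
    cat "$tmp" > "$POLICY_FILE"
    rm -f "$tmp"
}

allow_off() {
    # Remove every exact copy of the line, leave the rest untouched.
    local tmp
    tmp=$(mktemp)
    grep -vxF "$ALLOW_LINE" "$POLICY_FILE" > "$tmp" || true
    cat "$tmp" > "$POLICY_FILE"
    rm -f "$tmp"
}

# Possible use in a dom0 script run as root (not executed here):
#   read -rs -p 'Drive passphrase: ' PASSPHRASE   # asked once, kept in a variable
#   printf '%s' "$PASSPHRASE" | qvm-run --pass-io -u root personal \
#       'cryptsetup open /dev/xvdi extbackup --key-file=-'
#   allow_on
#   trap allow_off EXIT                           # remove the line even on failure
#   qvm-run --pass-io vault 'qvm-copy-to-vm backup /home/user/keepass.kdbx'
```

The trap means the temporary allow rule is removed even if the transfer step fails partway.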

N.B. this is not a “full backup”, which you ask about in Question 3. It is a
series of incremental backups.
I would take a full backup at regular intervals to a fresh drive, and
archive the old.

I should say that this isn’t something I would recommend for most users:
the Qubes backup should be fine in most cases.
Any backup is better than none.


Thanks a lot for your very helpful suggestions.
In the meantime I had a look at your qubes-sync configuration and it looks very nice.

But how would you approach this with one client connecting to multiple servers?

  • Audio [storage qube with rsync server]
  • Video [storage qube with rsync server]
  • Photos [storage qube with rsync server]
  • Backup VM [storage qube which connects as client to all rsync servers]

My best guess is that you would use different ports to be able to sync with different rsync servers:

rsync --port=837 localhost::shared #sync with Audio
rsync --port=1837 localhost::shared #sync with Video
rsync --port=2837 localhost::shared #sync with Photos
...

Also: Is my understanding correct that this rpc-based approach would be more secure compared to connecting the qube traffic via a firewall qube and restricting it to the rsync port in one direction? I guess rpc might give a smaller attack surface compared to opening up the network stack - even if it’s restricted by the firewall.

Thanks!

Yes, that looks about right - have each service addressable on
a separate port, and use a different policy for each connection.
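For example, the client side could loop over a list of name/port pairs. This is a sketch only: it assumes each local port has already been connected to the matching storage qube (as in the qubes-sync setup), reuses the ports and the `shared` module from the question, and the destination directory is a hypothetical placeholder. By default it just prints the rsync commands.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# name:port pairs, one rsync service per storage qube (ports from the question).
SOURCES=("audio:837" "video:1837" "photos:2837")
DEST_ROOT="/home/user/backup"   # hypothetical target directory in the Backup VM

sync_all() {
    local entry name port
    for entry in "${SOURCES[@]}"; do
        name="${entry%%:*}"     # text before the colon
        port="${entry##*:}"     # text after the colon
        run rsync -a --port="$port" "localhost::shared/" "$DEST_ROOT/$name/"
    done
}

sync_all
```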

The advantage is that it allows you to keep the client completely
offline, as you say.


I’m going your route for backups, but I won’t do any sort of RPC cross-qube networking. I’ll just rsync to an external drive as needed.

However, I was curious: do you create a separate dispvm template for each medium, or do you install all apps into a single one, fedora-32-dvm for example? I was starting to go through the process of creating separate dispvms per medium following the Qubes OS guide “How to use disposables”.

There are some files that would need online access, and others that I’d like to have no network access at all. Beyond that, I like the idea of separating photoshop stuff from audio editing stuff and CAD, etc.

I tried to make a new dispvm template using the following command, but it says the property doesn’t exist:

qvm-prefs <vmname> template_for_dispvms True

But get error:

qvm-prefs: error: no such property: ‘template_for_dispvms’

I’m hoping this is not another of those things removed from 4.0.4.

I create a separate template (and therefore disposable template) for
each medium. They are based on minimal templates, and salted.

That command has not been removed in 4.1, but the error message is
unhelpful.
Are you perhaps trying to use a template as disposable template?
That won’t work, and will generate the error message you have seen.
You need to set an appVM as disposable template.
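In other words, create an AppVM from the template first and set the property on that. A minimal dom0 sketch, where the template name `fedora-32` and the qube name `audio-dvm` are placeholders for your own; by default it only prints the commands.

```shell
#!/bin/bash
# Dry-run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

make_dvm_template() {
    local template="$1" name="$2"
    # 1. Create an ordinary AppVM based on the template.
    run qvm-create --template "$template" --label red "$name"
    # 2. Mark that AppVM (not the template itself) as a disposable template.
    run qvm-prefs "$name" template_for_dispvms True
    # 3. Show its applications in the disposable section of the menu.
    run qvm-features "$name" appmenus-dispvm 1
}

make_dvm_template fedora-32 audio-dvm   # placeholder names
```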
