Qubes Backup is Slow

It sounds like you may have lost the thread of the discussion. To recap:

  1. I claimed that it’s fine to store encrypted backups in cloud storage as well as locally.
  2. You said: “Depends… lets say there is activist or investigative journalist who did certain work already, hence is already on the cross-hairs, then it might not go well for that person […]”
  3. I asked what you meant by that: What is the additional threat to this person from storing an encrypted backup in cloud storage? Is it just that now the adversary knows she’s a Qubes user, or is there something more?
  4. You responded that simply using encryption and privacy tools is enough to put you on a list.

But remember that we’re assuming that this person is already “on a list” due to her published work, which threatens powerful people. That public work is already enough by itself to put her in their cross-hairs, regardless of her use of technology.

Now you’re gesturing vaguely toward the dangers of metadata in general. I’m well aware of those dangers and certainly don’t deny them, but that doesn’t address the specific question I’m asking. I repeat: What exactly are the additional risks our hypothetical person incurs by storing encrypted backups in the cloud?

I think it’s also worth noting that some of your claims about metadata are too strong. For example, earlier you said:

Again, I’m well aware of the privacy-destroying power that can be wrought with metadata, but the unqualified claim that metadata has a higher value than content is dubious at best (especially when we don’t stop to ask “Valuable for whom?”). If I had to choose, I would rather keep the contents of my phone calls, emails, postal mail, medical records, and tax returns private than the metadata about that content. The supposition that metadata is always more valuable than content leads to absurd results: Encryption would have no reason to exist, since the encryption metadata would be more valuable than what it protects. Envelopes would be sealed inside of letters rather than the other way around.

And I think you assume too much.

I never claimed you do. AWS was simply an example of something that provides such control.

You just proved my point. Those are independent of Qubes.

That only prevents Qubes from deleting those backups. It doesn’t prevent those backups from being deleted. (Refer back to my example of the simple hard drive formatted with FAT32.)

I am kind of weak on data integrity checks, sha256sums, etc. Can someone point me to some good guides on data verification and integrity?
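
As a starting point, the basic workflow is to record a SHA-256 checksum when the backup is made and re-check it before relying on the file. Below is a minimal sketch, assuming GNU coreutils `sha256sum` is available; the function names and the `.sha256` sidecar file naming are illustrative choices, not anything Qubes itself uses.

```shell
#!/bin/sh
# record_checksum FILE: store FILE's SHA-256 digest in FILE.sha256
# (hypothetical sidecar naming, chosen for this example only).
record_checksum() {
    sha256sum "$1" > "$1.sha256"
}

# verify_checksum FILE: succeed only if FILE still matches the recorded digest.
# --status suppresses output, so the result is carried by the exit code alone.
verify_checksum() {
    sha256sum --status -c "$1.sha256"
}
```

The checksum file records the path exactly as it was given, so verify from the same working directory or use absolute paths. Note that a plain checksum only detects accidental corruption; anyone who can modify the backup can also rewrite the checksum file, so authenticity needs a signature on top.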

When restoring backup VMs, you occasionally get template VM name changes, such as fedora-30 to fedora-301. My question is: how would you delete the stock template installed during a fresh install? For example, fedora-30 would be deleted, since I have fedora-301 (which is the updated version). FYI, I verified all qubes that were based on fedora-30 and changed them to fedora-301 in the Qubes Template Manager. Thanks.

In dom0:

sudo dnf remove qubes-template-fedora-30

Hey @tasket. How does one go about verifying your GPG key?

I’ve found none associated with your email.

Hi, I’m kind of late to the party, and must admit I only skimmed the thread.

Did you consider doing offline backups? Like shutting down the Qubes machine and just taking an image of the disk(s)?
With a deduplicating backup tool like restic (https://restic.net), this would not take up too much space.
If you have a 1TB disk, which is fully encrypted, you either have a first backup that takes 1TB (if you have actually written encrypted data to all sectors) or less (if only the used space looks like random data and the not-yet-written sectors contain zeros). I actually don’t know whether Qubes writes all sectors during installation to hide information about data size…
Every subsequent snapshot will only transfer sections that have changed, so the repository will only grow a little bit per snapshot. Of course it would have to read the whole disk again, but a simple sequential read (without decryption) of a disk is still faster than some of the time-constraint requirements that were mentioned, especially if the system disk is an SSD.
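
To get a rough feel for how much of an image would deduplicate, the sketch below counts unique chunks in a file. It is a simplification under stated assumptions: it cuts the file into fixed-size chunks with `split`, whereas restic uses content-defined chunking, so the numbers are only indicative. The function name `count_chunks` is made up for this example.

```shell
#!/bin/sh
# count_chunks FILE CHUNK_BYTES: print "<unique> <total>" for fixed-size chunks of FILE.
# Assumption: fixed-size chunking stands in for a real deduplicating tool's chunker.
count_chunks() {
    work=$(mktemp -d)
    # Cut the file into fixed-size pieces; -a 6 allows for many chunk files.
    split -a 6 -b "$2" "$1" "$work/chunk."
    total=$(find "$work" -type f | wc -l)
    # Identical chunks hash to the same digest, so the number of distinct
    # digests approximates the number of chunks that need to be stored.
    unique=$(find "$work" -type f -exec sha256sum {} + | cut -d ' ' -f 1 | sort -u | wc -l)
    rm -rf "$work"
    echo "$((unique)) $((total))"
}
```

For a quick trial, build a file from 4096 random bytes followed by 12288 zero bytes and run `count_chunks` on it with a chunk size of 4096: the three all-zero chunks collapse into one, which is the same effect that keeps unwritten disk areas from taking up space in the repository.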

Due to the system being offline, there is also no attack vector to intercept information. The backup repository is trivially encrypted (because it’s an image of an encrypted drive).

I did not actually try it out yet for my qubes machine, but I’m using restic for backup of other machines here, and I’m planning on using it for qubes.

It would of course be nicer if I could use restic for single-vm backups, so a complete read of the disk can be skipped.

Interesting approach. Let me know how it goes!

I just did a test run. I attached the 1TB SSD on which qubes is installed to another machine via a SATA-to-USB3 cable, then used restic to store a complete disk image. As I wrote the image to an NVMe SSD, this was really fast: it took about 42 minutes. The resulting restic repository had a size of 121GB, which means that all the not-yet-written-to areas of the qubes disk contained zeros and were deduplicated by restic.

A subsequent second snapshot (without booting up qubes in between, thus identical data) also took 42 minutes, and the repository stayed at 121GB.

The first snapshot would of course take a lot longer if the backup target was a HDD or some network drive.
But thanks to deduplication, every subsequent backup (as long as the amount of changed data is not huge) should roughly take those 42 min, as most of the time will be spent reading areas that have not changed and only writing tiny amounts of metadata. The limiting factor then is the sequential read speed of your disk, which hopefully is an SSD when running qubes :slight_smile:

This way you can incrementally back up your whole system in a fixed amount of time. You could also play around with partitioning schemes or multiple disks to have a “split” backup: one smaller disk/partition that contains qubes with its VMs, which is then faster to back up via image, and another disk which keeps your data and uses file-based backup, so as not to read the whole disk every time.
Judging from my resulting repository, I could easily get away with a 250GB SSD for the system, which then would take about 10-15 minutes to image.
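
As a sanity check on that estimate, the arithmetic can be sketched as follows. It assumes decimal units (1 GB = 1000 MB) and that a smaller disk would be read at the same average rate as the 1TB one, which may not hold for different hardware; the function names are made up for this example.

```shell
#!/bin/sh
# rate_mbps SIZE_GB MINUTES: average throughput in MB/s (1 GB = 1000 MB).
rate_mbps() {
    LC_ALL=C awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / (min * 60) }'
}

# image_minutes SIZE_GB RATE_MBPS: minutes to read SIZE_GB sequentially at RATE_MBPS.
image_minutes() {
    LC_ALL=C awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 60 }'
}

rate_mbps 1000 42        # the 1TB disk imaged in about 42 minutes -> 396.8
image_minutes 250 396.8  # a 250GB disk at that same rate          -> 10.5
```

So the 42-minute run corresponds to roughly 400MB/s, and a 250GB disk at that rate lands at the lower end of the 10-15 minute range.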

EDIT: Ha, now I know what to do with that spare 250GB SSD I have lying around :smiley: brb, installing qubes…

EDIT 2: restoring a snapshot is slower, reading from the repository is at about 120MB/s. But you hopefully don’t have to do that too frequently.

Wow. This is fast! My full 400GB backup takes around 6h-8h, I think!

Optimizing for the most common operations is great, even if that’s at the cost of “slowing down” non-common operations. So this is ideal, I think.

The only “deal-breaker” for some may be the need to take the disk out. But I really like your solution overall. I’ll have to play around with it.

There is no need to take it out. You could instead boot a USB live system (or do a network boot) to back up the installed SSD to an external device. I did it this way for testing because it was easier for me (i.e. I was too lazy to find a USB stick to boot from :smiley:)

My ThinkPads have a BIOS option to automatically boot from the network if they were powered on via Wake-on-LAN. So I could imagine fully unattended nightly backups where a backup server wakes the machine and lets it boot an auto-backup tool via PXE, which shuts down the machine after a successful backup.
Whether such a backup server makes sense under the threat models that qubes addresses is of course up to you.

That is less than 20MB/s! Damn… you could write a full disk image via USB2 or a good internet connection in that time.

Around 200GB of frequently used machines and another 300GB of infrequently used / changing machines.

The qubes backup system is extremely slow for this. I was using Wyng backup to take LVM snapshots but have now moved to btrfs without LVM for other reasons.

I think a combination of physical backups with partclone and logical backups with the Qubes Backup-Restore tool is a potential way forward, but I wonder what others are doing.