Backup strategies for Qubes OS

Because of this, it would be virtually impossible to find the desired file/version inside the backups. One can only imagine restoring (and decompressing, if applicable) one backup after another, just to find “the chapter with the better wording written that day, whichever day it was”.

Which is exactly what I wrote, thanks for pointing that out again.

This is how I back up data. Qubes taught me it is better for my security to use disposables. So, should I trust LibreOffice Writer? No. So, I use it in a disposable VM. I write a new chapter there and save it. Now what? If I shut down my dispVM, I’ll lose it. So, I copy it immediately to backup locations, at least two of them. Do I need a backup utility? No. If that’s not “Now you’re thinking with Qubes”, then I didn’t choose the proper OS for myself.

The Qubes OS backup tool is clearly not designed for making file-level backups. It seems like you struggle to understand how to use backup software, which is probably why you are stuck manually copying files.

I haven’t tried wyng, but it can make incremental backups of qubes as far as I know. There is no reason why the official backup tool shouldn’t be able to do something similar.

I want to believe this is a joke.

@tempmail

Your simple question:

I really don’t get what there is to back up every day, or every week. Please don’t generalize, just give your clear use cases.

got answered. A few of the things I listed are (or have been) actual parts of my own use cases.

I don’t hear anyone screaming, so there is no need to scream back. Everyone is free to organize and optimize their workflow.

But the most insane would be incrementing a backup and losing previous versions.

Yes, but that’s the operator’s fault. It does not refute the concept of incremental backup, just like driving in the wrong lane and crashing does not mean roads are irrelevant.

So, this is actually what I’m trying to understand. You create some of the files listed there today. And then what? A daily backup? What’s included in the backup? How do you find it later? Where do you create those files? Where do you keep them out of backup?

Sorry for this metaphor if it insulted anyone. “Screaming” was my perception of the many people mentioning incremental backup.

Of course. And some of us just pointed out that the current backup is sufficient, while trying to note that there are probably more urgent things to improve for daily use that actually don’t work (at least not well), like switching keyboard layouts and the 3.5 GB RAM patch, which are more important at the moment for a regular daily-driver Qubes deployment.


@tempmail

So, this is actually what I’m trying to understand. You create some of the files listed there today. And then what? A daily backup? What’s included in the backup?

Right now, I am still in the process of transferring my workflow from my Linux and (unfortunately) Windows systems to Qubes OS. I am saying this to clarify that I still don’t have any decent backup strategy for Qubes OS due to its lack of what I am used to. So, what I am used to (and what you ask for) is:

I work on different projects. Suppose today I modify/create some files. At the end of the workday (or before the beginning of the next one) I run an incremental backup, which takes very little time and space as it saves only the changes from today. If I had to run a full backup (including files which I don’t need to back up), that would require storing a few TB of data every day, with all the consequences.

Periodically I run a full backup (e.g. every month) and the next incremental ones are based on that one.
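For anyone unfamiliar with the pattern, the cycle described above (periodic full backup, small daily incrementals based on it) can be sketched with GNU tar’s `--listed-incremental` mode. This is just an illustration of the principle, not my actual setup; the directory and file names are made up:

```shell
set -e
DATA=project; BK=backups            # hypothetical layout
mkdir -p "$DATA" "$BK"
echo "chapter 1" > "$DATA/ch1.txt"

# Full backup (e.g. monthly): state.snar records what was archived and when
tar --listed-incremental="$BK/state.snar" -czf "$BK/full.tar.gz" "$DATA"

echo "chapter 2" > "$DATA/ch2.txt"  # today's work

# Incremental backup (e.g. daily): archives only what changed since state.snar
tar --listed-incremental="$BK/state.snar" -czf "$BK/inc.tar.gz" "$DATA"

tar -tzf "$BK/inc.tar.gz"           # lists the directory plus only the new file
```

Restoring means extracting the full archive first, then each incremental in order. Bacula automates exactly this bookkeeping (plus the catalog) so you don’t have to do it by hand.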

Re. the rest of your questions:

How do you find it later?

Bacula (which I like so much and which I hope to see integrated in Qubes OS some day) has a database (Catalog) which stores information about all backup jobs, schedules, storage media. The data is browseable in a console app, so one can see the structure of the actual data and restore a file from a particular date.

Another incremental backup software which I use sometimes is rsnapshot. It rsyncs new/modified files to an ext4 file system, so the result is a browseable FS tree.
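For reference, a minimal rsnapshot configuration looks roughly like this (paths and retention counts are examples, and rsnapshot requires the fields to be separated by tabs, not spaces):

```
snapshot_root	/mnt/backup/snapshots/
retain	daily	7
retain	monthly	3
backup	/home/user/projects/	localhost/
```

Running `rsnapshot daily` then produces rotating `daily.0`, `daily.1`, … directories, where unchanged files are hardlinks to the previous snapshot and so cost almost no extra space.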

FWIW, recently I learned that another software exists: rdiff-backup. IIUC, it is even more efficient as it copies only the differences between previous and current data, not the whole modified files. Bacula has a delta plugin which does that too but it is only in the enterprise version and I have no access to it.
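The reverse-delta idea behind rdiff-backup (keep the current version as a plain mirror, plus only the diffs needed to step back in time) can be illustrated with plain `diff` and `patch`. This is a toy sketch of the principle, not rdiff-backup’s actual on-disk format, and the file names are made up:

```shell
set -e
echo "chapter 1 draft" > doc.txt
cp doc.txt mirror.txt                     # backup mirror of the current version
echo "chapter 1 final" > doc.txt          # the file changes

# Keep only the reverse diff needed to go from the new version back to the old
diff -u doc.txt mirror.txt > rev.diff || true   # diff exits 1 when files differ
cp doc.txt mirror.txt                     # mirror now matches the current file

# Restoring the previous version = applying the stored reverse diff to the mirror
patch -o old.txt mirror.txt rev.diff
cat old.txt
```

The payoff is that the latest version stays directly browseable as normal files, while older versions cost only the size of their diffs.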

Where do you create those files?

In the software which creates the content (text documents, images, code, etc.). In the case of Qubes OS that would also be in different qubes.

Where do you keep them out of backup?

I don’t understand what you are asking. If you clarify, I will answer.

[…] And some of us just presented that current backup is sufficient, while trying to point out there are probably more urgent things to improve […]

The other thread did not ask “what do you want to prioritize” but “what would you like to see improved”, so I answered that.

I suspected this, honestly, that’s why I was active on this matter, among other reasons.

And I pointed this out above: the very next thing after an incremental feature for Qubes backup would be indexing, if you noticed. So I was right, I guess.
Which is when I said the next wanted feature would be a content/preview feature for the files in a backup, because at some point you could realize that several versions ago the idea was better. But in which version was it, even though you can browse the backup?

I meant: in what kind of VM? A dispVM or some other?

Do you have any version (probably the most recent one) outside the backup? Where is it stored?

My point is that instead of putting files in a backup, I put files in several internal and external locations immediately after creating them and closing the app. That way I can browse all copies of my files efficiently without any additional software, and I have them backed up, since the same copies exist in several locations.
And this was imposed on me because I use dispVMs exclusively. So when creating a file in a dispVM I have to store/copy it somewhere outside the dispVM if I want to preserve it, right? So why not copy it to several locations immediately, instead of to one out-of-dispVM location, thus getting backup copies without a backup utility? And it’s a matter of seconds to do that.
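If anyone wants to script that multi-copy step, something along these lines inside the dispVM should work (the qube names and file path are examples, not a recommendation):

```shell
# Send the finished file to several storage qubes in one go;
# each qvm-copy-to-vm call still has to be approved via the dom0 prompt.
for target in vault backup-internal backup-usb; do
    qvm-copy-to-vm "$target" ~/Documents/chapter-03.odt
done
```

This at least removes the risk of forgetting one of the destinations before shutting the dispVM down.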

I have a single backup created with the backup utility though… It’s the one of the final, fully set-up Qubes configuration. It’s tested on the second internal disk, so it works. And before it, there were several iterations to reach it, while carefully and meticulously setting up Qubes step by step to the point where I don’t have anything else to set/install.

I hope I made my backup strategy clearer now. It saves me immensely more time than if I used the backup utility, with less stress finding a version back in time, and above all, I do not depend on a utility that, not so rarely, messes up people’s lives, as can be read here on the forum and elsewhere (just search for how they tried to restore a backup to no avail).

@tempmail

And I pointed this out above: the very next thing after an incremental feature for Qubes backup would be indexing, if you noticed. So I was right, I guess.

I did notice, but if Bacula is used and modified to integrate with Qubes OS, then there will be no such “next thing” because it already has it.

Which is when I said the next wanted feature would be a content/preview feature for the files in a backup

Someone may want that. Not me.

But in which version was it, even though you can browse the backup?

One could simply restore some version and check. When it comes to my own work, I don’t get easily lost in it.

I meant: in what kind of VM? A dispVM or some other?

Check the first sentence of my previous reply. That said, for extra security creation/editing and storing could be on separate VMs. The editing VM would be a minimal one and have only the software needed for editing the particular document type (e.g. spreadsheets). After editing the file, move it to a storage-only VM. Then backup that storage-VM.

Do you have any version (probably the most recent one) outside the backup? Where is it stored?

Normally, no. If I need to exclude a file/dir from the backup, I put it in a (sub)folder whose name matches a pattern defined for exclusion in the backup fileset (Bacula supports regex matching for include/exclude options).
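For illustration, a Bacula FileSet along these lines excludes any directory matching a pattern (the names and the pattern are examples, not my actual configuration):

```
FileSet {
  Name = "ProjectFiles"
  Include {
    Options {
      Exclude = yes
      RegexDir = ".*/nobackup$"   # skip any directory ending in "nobackup"
    }
    File = /home/user/projects
  }
}
```

So moving a file into a `nobackup` subfolder is enough to keep it out of the next backup job, without touching the configuration again.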

My point is that instead of putting files in a backup, I put files in several internal and external locations immediately after creating them and closing the app. That way I can browse all copies of my files efficiently without any additional software, and I have them backed up, since the same copies exist in several locations.

That wouldn’t be so trivial if you used tape backup.

And this was imposed on me because I use dispVMs exclusively. So when creating a file in a dispVM I have to store/copy it somewhere outside the dispVM if I want to preserve it, right? So why not copy it to several locations immediately, instead of to one out-of-dispVM location, thus getting backup copies without a backup utility? And it’s a matter of seconds to do that.

Being quick in a deliberately over-complicated workflow is a recipe for messing things up. Computers are made to make our work easier. Personally, I would rather spend my N*(a few seconds) on actual work than on manually repeated operations which can be automated. It is much more efficient and less error-prone to back up the whole day’s work than to take care to multi-copy each file after each edit.

In my experience with older large-scale computers, incremental backups can find unique and creative ways to fail, just when you try to do a restore.

Consider what happens if there is a problem with a small bit of data, or a data structure, that occurred several backups ago.

I would hope someone would write about the issue of restoring malware while doing a restore, the value of doing a reinstall with a data reload, and once again, how to prevent any malware from sneaking back in.

Yeah, I know, I also have a lot of settings and specialized qubes this strategy would lose. I would have to keep a list of those on paper.

For several reasons, I would like to have a technical person write about powering down the computer versus closing the lid, or putting the computer into sleep or hibernate. How much, how often?

Linus Torvalds made a strong statement that laptops should be built with ECC memory. Error-correcting memory also requires a motherboard that supports it. Someone said that it would cost twenty percent more.

Memory errors are more likely to occur as the number of hours since power-up grows, affecting file structure and data accuracy. (From what I have been told.)

So, when to power down and power up? Any technical person want to inform me? Us?


This is a common misconception. It’s not true. In fact, one of the explicit design goals of the Qubes backup system is that you can always access your data in an emergency without a Qubes installation or any Qubes-specific tools:


I stand corrected, and furthermore, I say, GOOD!!!


Just a comment on the existing backup system: the current format is a real PITA to use without Qubes. The scripts in the docs don’t work, and writing lots of 100 MB files has not been necessary since FAT! I found that these files are written ignoring errors: I have found empty and truncated 100 MB files in a backup that was written to a flaky drive, with no error reported at write time.

Sorry not to have time to diagnose this in detail and log an issue.

The backup utility offers to verify the backups you create. If you’re using faulty hardware, there isn’t much more that can be done than giving you the tools to verify that the backups you create are effective, is there? What would you have expected exactly, an automatic verification step?


They do work. I have successfully tested them myself.

I would actually like this and have requested it here:


These were issues from R4.0. Another problem with the backup verification facility is that it says it is just checking and not updating, but it applies the metadata without telling you. I had a running qube flipped from a VPN to a firewall without my noticing, just by doing a backup verification.

@gonzalo-bulnes my point was that when a file is written and ends up with a known size different from what was written, the OS knows there was a problem and would have reported an error, which was ignored. I guess the 100 MB files are meant to limit the disk-space requirements of the backup, presuming it cannot all be done in a pipeline.

All I can say is your shell works differently from mine…

That’s been fixed in R4.2; the same fix for R4.1 is in the dom0 current-testing repository as part of the python3-qubesadmin-4.1.31 package.


I have been using ZFS snapshots and zfs send -LR $zpool@$snapshot | zfs receive $external_zpool/backups to back up all the ZFS volumes (the zvols) used by other qubes. In one of the ZFS datasets I have salt for how I want Qubes OS and all the qubes configured. If something happened to my machine I should have almost everything I need to recover very quickly.

This approach to backups covers “datasets” and not individual files, but if I needed to recover an individual file I could clone an old enough snapshot of the dataset the file is still in.

The ZVOLs contain LUKS volumes so the encryption is “above” the zpool that is backed up. Some ZFS experts have pointed out to me with ZFS’ log-structured copy-on-write design that backing up incremental snapshots “under” the encryption layer isn’t the most efficient thing, and there does seem to be more data copied than necessary, but backups have still been pretty quick.
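For anyone following along, the incremental variant of that replication looks roughly like this (the pool and snapshot names are examples, not the poster’s actual setup):

```shell
# Snapshot everything recursively, then send only the blocks that changed
# between the previous snapshot and the new one.
zfs snapshot -r tank@2024-06-01
zfs send -LRI tank@2024-05-01 tank@2024-06-01 | zfs receive -F backup/tank
```

With `-I` the stream also carries any intermediate snapshots between the two, so the backup pool keeps the full snapshot history rather than just the endpoints.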

This is why I use ZFS on two disks mirrored with checksumming enabled (which is the default). One of my colleagues uses three disks mirrored. For more peace of mind I can run zpool scrub $pool and I do sometimes.

I am not sure if this should be mentioned here or in a separate topic (it is related): back when I was starting to use Qubes, I read the docs saying you can migrate a running system by taking a full backup, installing Qubes on a new machine, then restoring. I tried that and got to the screen asking what to install; I considered the minimal install, but it said “for experts only”, so I went ahead and did a normal install. The restore then created a duplicate set of qubes with a 1 appended to the name of each. It took ages to delete all the new qubes and unlink, rename and relink everything. I made a mental note never to do that again…
So while testing R4.2 I thought I would try again, this time selecting expert mode and installing no qubes, and all looked good, with the system coming up with just dom0. I selected backup restore: no problem opening the file on a USB device, and selecting all qubes to restore worked. Then every qube got a message like “-> Restoring fedora-38…” followed by a red error message: “Error restoring VM fedora-38, skipping: Got empty response from qubesd. See journalctl in dom0 for details”. In the journal there are 3500 lines from qubesd: Python double faults with tracebacks, all attribute errors, missing attribute or missing default. Finally a bunch of errors restoring dom0, unable to access properties… Then “Extracting data: 7.0 MiB to restore”, and a couple of minutes later, in nice friendly green letters: “Finished successfully!”. Did I mention elsewhere that Qubes ignores errors?

So what is the recommended procedure to actually do a simple full backup restore on a new system?

Personally, I would just:

  1. Install normally. (Don’t choose the “expert” option.)
  2. Delete the few default qubes I don’t want.
  3. Restore.
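Step 3 can also be done from a dom0 terminal, which gives more verbose output than the GUI (the backup path and qube names here are examples):

```shell
# Check the archive's integrity first, then restore only the qubes you pick.
qvm-backup-restore --verify-only /mnt/usb/qubes-backup
qvm-backup-restore /mnt/usb/qubes-backup work personal vault
```

Running the verify pass before wiping or repurposing the old machine is cheap insurance against exactly the truncated-file problem described earlier in this thread.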