Incremental backup possibilities

Yes indeed, it’s probably not surprising that I prefer reading well-organized code XD

Thanks for pointing out that it’s already available elsewhere. I just found out it isn’t only the fcorbelli repo; this one already seems much more organized: GitHub - zpaq/zpaq: ZPAQ's complete code history mirror

I’m not a security expert, but I would be happy to audit zpaq or zpaqfranz if it has serious advantages over Borg and could be a candidate for an incremental backup implementation in Qubes OS.


That’s a great offer and would be much appreciated.

Very practical solution, IMO.
Restoring qubes is incidental if one has salted everything.
It gives you backup of all data (at whatever timescale you want), restoration of qubes to specific dates/times, and simple export of dated files to another system.

Can you explain what you find impractical?

Indeed, it’s not that bad when you have salted everything.

But how do you restore from such backup?

Recreate the qubes, then restore the files in each qube from the repo.

My script uses the hostname to keep each qube’s data separate in the repo, so if you recreate the qubes with the same names as before, restoring is easy.
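The script itself isn’t shown, so here is a minimal Python sketch of the hostname-based layout described above, assuming the script runs inside each qube and writes into one shared repo directory. `repo_dir_for` is a hypothetical helper, not part of any existing tool.

```python
from __future__ import annotations

import os
import socket
import tempfile


def repo_dir_for(repo_root: str, hostname: str | None = None) -> str:
    """Return (creating it if needed) the per-qube directory inside the repo.

    Each qube's files go under <repo_root>/<hostname>, so a qube recreated
    with the same name finds its old data again at restore time.
    """
    name = hostname if hostname is not None else socket.gethostname()
    # Refuse names that are empty or could escape the repo directory.
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"unsafe hostname: {name!r}")
    path = os.path.join(repo_root, name)
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(repo_dir_for(root, "work"))
        print(repo_dir_for(root, "personal"))
```

Restoring then amounts to recreating a qube under the same name and copying files back from the directory that `repo_dir_for(root)` resolves to inside it.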


TL;DR: zpaqfranz = no go

The strongest argument of all is right at the beginning of the docs. An (honest) statement from the developer on line 23 makes it easy to say it’s a no-go for building any reliable backup system upon:

“EXPERIMENTAL BUILD: the source is a mess, strongly in development”

I started by reading the parts of the code dealing with memory management (aligned malloc and free strategies) to look for possible leaks. Then I looked at the crypto and the extensive use of SHA-1, to check whether data might be encrypted with a weak algorithm. But the 100k lines are hard to navigate, and the code is complex (at least for someone new to it).

Another red flag is a few huge commits that add features, probably WIPs (e.g. Add files via upload · fcorbelli/zpaqfranz@a56f3ed · GitHub and Add files via upload · fcorbelli/zpaqfranz@45ae46c · GitHub).

Finally, there seem to be some dynamic testing/integrity mechanisms, but I couldn’t find a static testing framework to check the program against its specs. I wouldn’t trust untested software of this size to manage my data.

I won’t dig deeper into the zpaqfranz code.


Yes, zpaq is indeed out of the race against both. I used Restic in the past but didn’t know about Borg; the project itself and its integrations look great in many ways :slight_smile:

Borg, being written in Python/C, probably fits better within the Qubes ecosystem and community.

But used as binaries, both are good candidates worth considering!

@leo provided specific instructions in his Salt tutorial on how to install an incremental backup system with Salt, using the backup tool Wyng written by @tasket. (Thank you both, btw.)

Wyng 0.3.15 is not marked as beta (it lacks the native encryption of v0.4 and later, but its README gives encryption strategies for using v0.3 in Qubes).

I don’t know Tasket’s motivation for writing a custom dedup backup tool, given the existence of the Restic and Borg projects. He has explained some of its technical advantages himself here.

What sets this project apart is that he is also working on integration with Qubes, which we probably won’t find in other tools.


Could Restic, Borg, or other more mature tools follow the same Qubes integration strategy as Wyng? Could this be adapted to Leo’s backup Salt tutorial?


Thanks for the mention!

The only hold-up with Wyng right now is accommodating Qubes’ different volume-naming schemes, so that LVM can be exchanged for Btrfs/reflink if the user requires it at some point. Wyng was also just extended to handle multiple storage pools in a single session, making backup and restore with wyng-util-qubes a seamless process (I am one of those users with multiple pools for my qubes).

Wyng was created to scratch my own itch for backing up Qubes safely and quickly, so restic and borg were among the many existing backup tools I researched to fill that need. I would have been happy to simply write my own shell or Salt scripts for making backups with any of them, had I known the potential was there. Having experienced Time Machine in a FileVault configuration, where /home is mounted from a chunked disk image called a sparsebundle, I knew that a high level of efficiency could be achieved if 1) the local storage could instantly show large files as chunks, and which chunks were updated, and 2) the remote storage could hardlink chunks as needed to create efficient snapshot history and pruning capability (with the added bonus of deduplication). Wyng was designed for TM-like usage, with brief backup sessions occurring hourly, for instance.
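The chunk-plus-hardlink idea behind Time Machine-style history can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions (fixed-size chunks, one file per chunk, the caller already knows which chunks changed), not Wyng’s actual archive format; `make_snapshot` and `read_snapshot` are hypothetical names.

```python
from __future__ import annotations

import os
import shutil
import tempfile

CHUNK_SIZE = 4  # tiny for the demo; real tools use much larger chunks


def chunk_name(index: int) -> str:
    return f"{index:08x}"  # fixed-width hex so names sort in index order


def make_snapshot(archive: str, name: str, data: bytes,
                  prev: str | None = None,
                  changed: set[int] | None = None) -> str:
    """Store `data` as snapshot `name` under `archive`.

    Chunks listed in `changed` (or missing from `prev`) are written out;
    every other chunk is hardlinked from the previous snapshot, so it
    costs no extra space.
    """
    changed = changed or set()
    snap = os.path.join(archive, name)
    os.makedirs(snap)
    for i in range(0, len(data), CHUNK_SIZE):
        idx = i // CHUNK_SIZE
        dst = os.path.join(snap, chunk_name(idx))
        src = os.path.join(archive, prev, chunk_name(idx)) if prev else None
        if src is not None and idx not in changed and os.path.exists(src):
            os.link(src, dst)  # unchanged chunk: share storage with prev
        else:
            with open(dst, "wb") as f:  # new or changed chunk: store a fresh file
                f.write(data[i:i + CHUNK_SIZE])
    return snap


def read_snapshot(snap: str) -> bytes:
    """Reassemble a snapshot by concatenating its chunks in index order."""
    parts = []
    for entry in sorted(os.listdir(snap)):
        with open(os.path.join(snap, entry), "rb") as f:
            parts.append(f.read())
    return b"".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as archive:
        make_snapshot(archive, "s1", b"aaaabbbbcccc")
        s2 = make_snapshot(archive, "s2", b"aaaaXXXXcccc", prev="s1", changed={1})
        # Pruning s1 only frees its private chunk; s2 still restores fully.
        shutil.rmtree(os.path.join(archive, "s1"))
        print(read_snapshot(s2))  # b'aaaaXXXXcccc'
```

Because changed chunks are always written as new files rather than modified in place, a hardlinked chunk never changes under an older snapshot, and deleting a snapshot directory (pruning) only frees chunks that no other snapshot still links to.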

The problem with existing tools is that they assume a local storage system with no CoW analysis features, so they re-read all of the data in each file to find which chunks have changed (while using fancy terminology to describe this mundane and wasteful process); otherwise, they are like btrfs-send, which is very efficient but assumes you have the same type of filesystem on the remote end. The best combination of assumptions, IMO, is what Wyng does: use CoW instant delta discovery on the local system and a plain Unix filesystem on the backup destination.
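To make the contrast concrete, here is a hedged Python sketch of the two ways to find changed chunks: re-reading and hashing everything versus mapping the written ranges that a CoW storage layer can report. How Wyng actually queries LVM or reflink metadata is not shown; the `(offset, length)` list is a stand-in for that metadata, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4  # tiny for the demo


def chunk_hashes(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]


def changed_by_rescan(old_hashes: list[str], new_data: bytes) -> tuple[set[int], int]:
    """Scanning approach: read and hash every chunk, compare with last time.

    Returns the changed chunk indices and how many bytes had to be read.
    """
    new_hashes = chunk_hashes(new_data)
    changed = {i for i, h in enumerate(new_hashes)
               if i >= len(old_hashes) or old_hashes[i] != h}
    return changed, len(new_data)


def changed_from_extents(written: list[tuple[int, int]]) -> set[int]:
    """CoW-metadata approach: map (offset, length) ranges reported as written
    since the last snapshot to chunk indices, without reading any data."""
    changed: set[int] = set()
    for offset, length in written:
        if length > 0:
            first = offset // CHUNK_SIZE
            last = (offset + length - 1) // CHUNK_SIZE
            changed.update(range(first, last + 1))
    return changed


if __name__ == "__main__":
    old = b"aaaabbbbccccdddd"
    new = b"aaaabXbbccccdddd"  # one byte changed at offset 5
    print(changed_by_rescan(chunk_hashes(old), new))  # ({1}, 16)
    print(changed_from_extents([(5, 1)]))             # {1}
```

Both functions find chunk 1, but the scan had to read all 16 bytes to do so while the metadata path read none; on a multi-gigabyte volume with only a few changed blocks, that gap is the whole point.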

From a security perspective, Wyng also has advantages. Unlike qvm-backup, it doesn’t parse any unverified tar archives or other metadata in dom0 (qvm-backup creates archives with two tar layers, and the outer layer is unverified when it’s processed). In an OS security model where even reading partition tables in dom0 is considered risky, this is significant.

Wyng also doesn’t hand complex archival tasks to untrusted VMs, or to dispVMs that mount untrusted volumes; that is a possible attack surface for DoS or worse compromise (and tends to be slow).


Thanks a lot for jumping into this thread so quickly to explain your motives!

To my great surprise, that’s my finding too, for both tools discussed previously…

Restic and Borg don’t seem to leverage CoW snapshot metadata.


I finally went through the incremental backup GH thread, where you and many other people (including Qubes team and core members and the Borg lead) have been debating since 2015.

I also read many other scattered and quickly closed threads on the forum. There are so many of them, with some people ending up with corrupted backups and so on, and even someone who created a guide for cron reminders to perform full backups :laughing:

The most interesting thread I have come across, IMO, is:


At the risk of stating the obvious: it is important (not to say crucial) for many users to have an easy-to-use disaster-recovery mechanism, which Qubes doesn’t really provide as of now…


I came across some interesting tools along the way:

How would Wyng compare to these approaches, where most of the maintenance work would be handled upstream?

I really don’t want to be rude here; I’m just thinking that one-person maintenance may be the reason the Qubes team is not ready to rely on Wyng.


I didn’t find this statement (better security than the existing tool) in the GH thread; if it is true, it could greatly help adoption:

Are you sure there isn’t a signing mechanism that prevents parsing of unknown packages in the existing Qubes backup tool? Or a safe parsing function (like a safe_ftprintf, if you catch the ref :sweat_smile:)?

I found this performance and security flaw report here (probably related to what you explained).


As far as I’m concerned, my backup habits have been screwed up since I started using Qubes as a daily driver. My almost 1 TB of data takes ages to back up and, with the USB performance bottleneck, ages to send to an external drive; so much so that my last full system backup is already 6 months old…

Fortunately, I have other off-site strategies for critical data.


I hope this particular thread will revive discussion (even divergent opinions) around the incremental backup topic :slight_smile:

since the GH threads went quieter after moderation:

andrewdavidwong on Oct 20, 2022

Just a friendly reminder that this issue tracker (qubes-issues) is not intended to serve as a discussion venue. Instead, we’ve created a designated forum for discussion and support. (By contrast, the issue tracker is more of a technical tool intended to support our developers in their work.) Thank you for your understanding!


And please, anyone, feel free to talk about your usage and other tools and strategies related to incremental backup (Bacula, Duplicati and others), or to link interesting related discussions.


IDK, the deal with Wyng maintenance seems good in that 1) it’s obviously not too complex for one person and it’s written in Python, 2) it is only about 30% larger than the relatively simple qvm-backup, 3) it has only two dependencies, which go nowhere near PyPI or anything unusual (Linux distros carry them both), and 4) the on-disk archive format is easy to comprehend.

These qualities, in addition to Wyng being self-contained, also improve the disaster recovery aspect. There are no systemd services to install, no Qubes policy configs to change, no rsync-over-qrexec IP ports to configure, etc. Just fetch the wyng program file in whatever manner you prefer; on Qubes 4.2 one dependency will already be present, so if zstandard compression is needed, just install the ‘python3-zstd’ package and connect your USB drive or ssh storage link. From there, sudo wyng is all you need to start retrieving volumes from Wyng archives. If you need to restore Qubes VM settings as well, the wyng-util-qubes wrapper will handle it (this currently works for LVM systems; reflink support is being added).

Yes, I’m sure. There is also a comment in backup.py pointing that out. New tar exploits don’t seem terribly common, but it’s a notable risk.


Is there a flag for reading the passphrase from a file in either wyng-util-qubes or wyng? I’m not seeing it in the documentation, but maybe I missed it!

There’s no signing mechanism (@distopia), but there is a mechanism to defang the outer tar format layer of the Qubes OS backup file during restore:

Assuming that the backup file is being provided by a VM (i.e. it wasn’t manually copied into dom0!), restore.py will not deal with complex tar format data directly. Rather, the VM has to convert it to the simpler qfile format (as used for inter-VM file copy, or for receiving dom0 .rpm package updates) before streaming it to dom0, where the stream is processed by qfile-dom0-unpacker instead of the tar utility.
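tar supports many entry types (hardlinks, symlinks, device nodes, long-name extensions), each of which a parser has to handle correctly. As a rough illustration of why a simpler stream format shrinks that attack surface, here is a Python sketch of a minimal length-prefixed file stream with strict validation. The header layout is hypothetical and is not the real qfile wire format; `pack` and `unpack` are illustrative names.

```python
from __future__ import annotations

import io
import struct

# Hypothetical header: name length (uint32) + file size (uint64), little-endian.
# This is NOT the real qfile wire format; it only illustrates the idea.
HEADER = struct.Struct("<IQ")
MAX_NAME = 255


def pack(files: list[tuple[str, bytes]]) -> bytes:
    out = io.BytesIO()
    for name, data in files:
        raw = name.encode("utf-8")
        out.write(HEADER.pack(len(raw), len(data)))
        out.write(raw)
        out.write(data)
    return out.getvalue()


def _read_exact(stream: io.BytesIO, n: int) -> bytes:
    buf = stream.read(n)
    if len(buf) != n:
        raise ValueError("truncated stream")
    return buf


def unpack(blob: bytes, max_size: int = 1 << 30) -> list[tuple[str, bytes]]:
    """Parse the stream, rejecting anything outside a narrow, fixed grammar."""
    stream = io.BytesIO(blob)
    files = []
    while True:
        head = stream.read(HEADER.size)
        if not head:
            return files  # clean end of stream
        if len(head) != HEADER.size:
            raise ValueError("truncated header")
        name_len, size = HEADER.unpack(head)
        if not 0 < name_len <= MAX_NAME or size > max_size:
            raise ValueError("header field out of range")
        # Invalid UTF-8 raises UnicodeDecodeError, a subclass of ValueError.
        name = _read_exact(stream, name_len).decode("utf-8")
        # Plain file names only: no paths, traversal, or NUL bytes.
        if "/" in name or "\x00" in name or name in (".", ".."):
            raise ValueError(f"unsafe name: {name!r}")
        files.append((name, _read_exact(stream, size)))
```

Every field has a fixed size and a bounded range, and there is only one entry type, so the parser has very little room for surprises compared to a general tar reader.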


Wyng currently has two ways to automate passphrase input: piping it in (with encrypted archives, it is always the first thing read from stdin), and the --passcmd option, which specifies a command that outputs the passphrase. If you need to specify --passcmd from wyng-util-qubes, you can use the Wyng option passthrough -w like so:

wyng-util-qubes -w 'passcmd=command' -w authmin=10

It’s a good idea to specify authmin as well, to prevent repeated passphrase queries.
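For the --passcmd route, the command only has to print the passphrase. A minimal Python helper might look like this; the passphrase-file location is a hypothetical choice, and whether Wyng needs the trailing newline is an assumption worth checking against its documentation.

```python
#!/usr/bin/env python3
"""Hypothetical passcmd helper: print a Wyng passphrase stored in a file."""
from __future__ import annotations

import os
import stat
import sys

# Hypothetical location; keep it owned by root with mode 0600.
DEFAULT_PATH = "/root/.wyng-passphrase"


def read_passphrase(path: str) -> str:
    """Read the passphrase, refusing files that group/others can access."""
    st = os.stat(path)
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} must not be accessible to group/others")
    with open(path, encoding="utf-8") as f:
        secret = f.read().rstrip("\r\n")
    if not secret:
        raise ValueError(f"{path} is empty")
    return secret


def main(argv: list[str]) -> int:
    """Write the passphrase to stdout; an optional argv[1] overrides the path."""
    path = argv[1] if len(argv) > 1 else DEFAULT_PATH
    try:
        secret = read_passphrase(path)
    except (OSError, ValueError) as e:  # PermissionError is an OSError
        print(f"passcmd: {e}", file=sys.stderr)
        return 1
    sys.stdout.write(secret + "\n")
    return 0

# In the saved script, finish with:  sys.exit(main(sys.argv))
```

Saved as, say, /root/wyng-passcmd.py (hypothetical path) and made executable, it could then be passed through as `wyng-util-qubes -w 'passcmd=/root/wyng-passcmd.py' -w authmin=10`, following the passthrough syntax shown above.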


Edit: A note about reflink/Btrfs support, since my last message may have been unclear: Wyng itself already supports reflink storage. It is the wyng-util-qubes wrapper that currently assumes LVM and is being overhauled to allow for other storage types such as reflink.


Fair enough. I didn’t recommend it; I said what I used and deliberately cited other tools that could be used. I thought that was clear.

zpaq is available in many distros and is a great tool to produce highly compressed data in a standard format.

Incidentally, the use of SHA-1 for checking file integrity is not unusual; compare with Git. You have misunderstood the purpose here, which is not surprising if you are coming at the code for the first time, although it is referenced in the documentation. zpaqfranz will identify and warn of SHA-1 collisions. You can use a wide variety of other hash algorithms if SHA-1 bothers you.

I never presume to speak for the Qubes team. When I comment in the Forum, I speak for myself.
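For readers unfamiliar with the integrity-checking use of SHA-1 mentioned above, here is a generic Python sketch using the standard hashlib module (it is not zpaqfranz’s code): a digest is recorded at backup time and recomputed later to detect corruption, and switching algorithms is a one-argument change.

```python
from __future__ import annotations

import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha1", bufsize: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files need little memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Return True if the file still matches the digest recorded earlier."""
    return file_digest(path, algorithm) == expected_hex.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.bin")
        with open(path, "wb") as f:
            f.write(b"abc")
        recorded = file_digest(path)        # SHA-1, recorded at backup time
        print(verify(path, recorded))       # True
        print(file_digest(path, "sha256"))  # same check with a stronger hash
```

SHA-1 is adequate for catching accidental corruption, but practical collisions have been demonstrated, so a stronger algorithm such as SHA-256 is the safer default wherever an attacker might tamper with the data.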

Great, I’m running out of disadvantages to find in your tool; I will definitely give it a try soon! What kind of help do you need on Wyng? Is it anywhere close to native integration with, or replacement of, the existing Qubes backup?

Thank you for pointing this out, along with the backup codebase I hadn’t found yet ^^
I found out you were working on a shell script that mentions Btrfs backups.
@rustybird What is your use case for this tool? Is it anywhere close to integrating with Qubes to provide an incremental backup feature?

Yes, it was clear; I didn’t take it as a recommendation, I just reported my findings to you.
I wouldn’t recommend zpaqfranz for the job (for the reasons mentioned previously), but I think zpaq is still a good candidate here!


I just posted about this incremental backup topic in a proposal for a community-development focus on some Qubes components. You might be interested.

Well, I have been using it for that. It’s very minimalist without any frills such as, uh, restore functionality. And there’s no documentation. But on the bright side, there’s not much code either.


An update on Wyng and wyng-util-qubes:

The Btrfs/reflink support is now robust enough to restore LVM->Reflink and Reflink->LVM volumes automatically. This is in the current ‘wip’ branch of each project, which I expect to go into beta within a week.


I have to say, of all the options for backing up and restoring VMs in their entirety, incrementally and quickly (remote or local), Wyng is the way. Thanks @tasket
