Incremental backup possibilities

OK, really interesting.
It is a single source file, because I made it compile on “strange” things, like Haiku, Solaris, ESXi (yep, ESXi), ARM-based QNAPs, etc.
Sometimes you do not even have make.
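Building is usually just a direct compiler invocation. A rough sketch, not the official build line (exact flags and defines vary by platform, check the zpaqfranz README):

g++ -O3 zpaqfranz.cpp -o zpaqfranz -pthread    # minimal no-make build, Linux-style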

The base64-encoded part is… the Windows SFX module (open source too), this one: zpaqfranz/ZSFX at main · fcorbelli/zpaqfranz · GitHub, and the autotest module (aka a .zpaq file), this one: zpaqfranz/AUTOTEST at main · fcorbelli/zpaqfranz · GitHub

As I suppose you can well understand, an SFX module must necessarily be embedded, as well as a binary to properly test the operation.
Or you have to download it from the Internet, and indeed there is the relevant download command.
But, generally, it is best to use open source software.
When you use any program it is quite normal to have binary blobs.
Think of a library, for example, or a font, and so on.
I, in my own small way, explain in full how to generate these files, so that you can actually verify that this is true.
There’s even a build directive that eliminates these parts altogether (aka: stripping pieces of source code to make another .cpp) if you’re really paranoid.
After a couple of years of discussion with a senior Debian developer, I was able to keep them (because it is documented exactly what they are: a part of the Iliad, not executable code, just text).
If you don’t believe me, create a virtual machine, run an autotest, and extract the bundled .zpaq file with zpaq (not zpaqfranz) or PeaZip or whatever… and you’ll see: the Iliad (in Italian, because I’m Italian).
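Something like this (a rough sketch: the exact name of the .zpaq file the autotest leaves behind may differ, so check the docs):

zpaqfranz autotest                      # run the built-in self-test (uses the embedded .zpaq)
zpaq x <autotest-file>.zpaq -to check   # extract that file with plain zpaq (or open it in PeaZip)
less check/*                            # and there it is: the Iliad, in Italian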

Or just read the documentation :smile:

Cherry on the cake, a persuasion-filled readme instead of a factual summary.

Factual summary: the best backup/disaster-recovery software you’ve ever seen (just joking)

If you want to suggest what to write, I will be glad to change it according to your suggestions. English is not my first language, nor is it my second, nor is it my third.

TL;DR: zpaqfranz = no go
The strongest element of all is right at the beginning, in the doc. An (honest) statement from the developer at line 23 makes it easy to say it’s a no-go to build any reliable backup system upon:
“EXPERIMENTAL BUILD: the source is a mess, strongly in development”

I actually try to tell it like it is, without selling snake oil. Also because … I don’t sell anything :smile:

I started reading parts of the code about memory management (aligned malloc and free strategies) to look for possible leaks, then the crypto parts and the extensive use of SHA-1, looking for possible data encryption with a weak algorithm. But the 100k lines are hard to navigate and the code is complex (for someone new to it, at least).

I assure you that it is much, but much, but MUCH more complex than you may think.
But the main work is not due to me, but to Dr. Mahoney, a true compression genius (even though he writes virtually indecipherable programs), the author of zpaq.
You can read here zpaq updates and here ZPAQ

If you think you can figure out how such a program works by reading its source, you are probably a better programmer than I am.
Way better, in fact.

Another red flag is about some huge commits to add some feature (…)
This one I frankly did not understand

Finally, there seem to be some dynamic testing / integrity mechanisms, but I don’t find a static testing framework to check the program against some specs. I wouldn’t run untested software of this size to manage my data.
I won’t dig deeper into the zpaqfranz code.

The software has been tested for more than 15 years.
And it has an internal operation check: the self-test command.
Static testing is impossible for such a large target selection.
The SPARC64 and big-endian support alone takes a lot of effort.

You may not know this, but zpaq development started around 2009 (maybe even earlier).

The key element that zpaqfranz adds (in addition to a thousand other things) is an “everywhere” data integrity check.

While zpaq does not maintain a hash (or rather a checksum) of whole files, but only of their parts (called fragments), zpaqfranz adds a CRC-32 of the whole file (plus a hash, which can be SHA-2, SHA-3, BLAKE3, even Whirlpool).

During the testing phase (of the archive) zpaqfranz recalculates the CRC-32 of the whole file, allowing you to be (almost) certain that the file is “well” compressed (and that there are no SHA-1 collisions).
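To make the difference concrete, a rough shell analogy (nothing to do with zpaq’s real internals, and using sha256sum only because it is everywhere): per-fragment digests alone do not give you one value to compare after a full restore.

split -b 1M bigfile.bin frag_     # cut a file into pieces, like zpaq's fragments
sha256sum frag_*                  # one digest per fragment: roughly what zpaq keeps
sha256sum bigfile.bin             # one digest for the whole file: what zpaqfranz adds
rm frag_*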

There are many other functions (e.g., the -paranoid switch, the w command, the verify command, etc.) that all aim precisely at checking that the archive matches the source data perfectly.

It even implements a second decompression module (!), the reference one by Mr. Mahoney, via the paranoid command (which, however, is single-threaded, aka slow).
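In practice it looks roughly like this (from memory, so double-check the exact syntax in the built-in help; backup.zpaq is just an example name):

zpaqfranz t backup.zpaq            # test: recompute and compare the whole-file CRC-32s
zpaqfranz v backup.zpaq            # verify: compare the archive against the files on disk
zpaqfranz w backup.zpaq            # another whole-archive integrity check
zpaqfranz paranoid backup.zpaq     # decompress via the reference (single-threaded) module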

There is support for ZFS, and even for third-party software (hashdeep), to check that files restored by zpaqfranz are identical to the originals as hashed by hashdeep.
Normally this is a test I run on different operating systems, to get 100% confidence, not 99.9999%.
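Roughly like this (paths and archive name are examples; adjust them so the restored tree mirrors the original layout):

( cd /original/data && hashdeep -c sha256 -r -l . > /tmp/known.txt )               # hash the source tree
zpaqfranz x backup.zpaq -to /restore                                               # restore somewhere else
( cd /restore/original/data && hashdeep -c sha256 -r -l -a -k /tmp/known.txt . )   # audit: passes only if identical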

The main effort in zpaqfranz is maintaining backward compatibility.
That is, the files it creates are perfectly extractable with zpaq versions that are 8-10 years old.

It is indeed a common problem with niche software: the developer disappears, and then you no longer know how to extract your data.

With zpaqfranz the problem does not exist: you can always use plain zpaq.
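Which anyone can check with stock zpaq 7.15 (archive name is an example):

zpaq l backup.zpaq                 # list the contents with plain zpaq
zpaq x backup.zpaq -to restored    # extract everything into ./restored, no zpaqfranz needed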

Incidentally, there are zpaqfranz packages in openSUSE, FreeBSD, OpenBSD, and Arch.
The one for Debian has been ready for three years (!), but the relevant sponsor typically works on it one or two days a year.

I am very interested in any suggestions you may have on how to improve
Thank you


I’m quite frankly very happy with borg-offsite-backup (aside from the normal RAID1 every machine gets).

Yes, backups take about an hour for two terabytes of datasets that must be read (in the absolute worst case scenario), and yes, consistency of backups requires ZFS on the Qubes OS machine being backed up. The tool cannot back up LVM, and CoW files from the reflink storage plugin are modified in-flight with no hope for consistency.

But:

  • that is about 10X faster than standard Qubes OS backup, and it can be automated so backups actually get done,
  • the crash consistency being sufficient (for me) means that I can back up qubes that are online without too much fear of data loss (maybe I’ll lose 30 seconds of uncommitted data),
  • since the backups are consistent (enough for me) and they are in a content-addressable store, I can fire off a very efficient rsync to get every single byte shipped offsite, with the confidence that nobody will tamper with them (see the sketch after this list),
  • if yesterday’s backup is somehow bad, I have a whole fuckton of older backups, daily, weekly and monthly, to dig into,
  • it’s a single file, which just runs commands; if you understand a little Python and the Borg man page, you can run every command yourself (for disaster recovery).
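For the curious, the underlying idea is roughly this (a sketch only, not the actual borg-offsite-backup script; repository path, archive naming and offsite host are made up):

borg create --stats /var/backups/borg::qubes-{now:%Y-%m-%d} /path/to/zfs/snapshot
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 /var/backups/borg
# the repository is content-addressable, so shipping it offsite is a plain rsync
rsync -a --delete /var/backups/borg/ backupuser@offsite:/srv/borg-mirror/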

I’ve made some updates to the README.md file of the project, if people are interested in looking at it again. If anyone wants to write a guide from the POV of a third party using the software, I would very happily publish the guide on my site.

If the file-reflink pool is hosted on Btrfs, you can get whole-pool consistency by backing up from a temporary snapshot of the subvolume containing the pool:

sudo btrfs subvolume snapshot -r / /new-snapshot-for-backup
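Point the backup at the pool’s path inside /new-snapshot-for-backup, and drop the snapshot when done:

sudo btrfs subvolume delete /new-snapshot-for-backup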

Yup. I have not added support for that yet, but I would welcome additions from contributors.