Qvm-backup compression filter execution time and output size comparison

Hi,

I was curious about how to improve the default backup tool. I saw that gzip was CPU bound and thought of proposing pigz as a replacement, because it’s in the Fedora repository and doesn’t change the output format, so backups remain easy to decompress for an unattended restore.

Then I thought it would be nice to compare with xz because it’s already there. I also remembered someone asking to use zstd, which was dismissed because zstd isn’t always available if you need to restore your backups in an emergency, so I added it to the list for more fun.

My conclusion is that it would be interesting to use pigz instead of gzip as the default compression filter, with 2 or 3 cores set on its command line. This would only require Qubes OS to add this small package to the default installation. There is no drawback, and the output remains a gzip file.

Trivia: qvm-backup --compress-filter=params has an issue: params can’t contain arguments, even though tar --use-compress-program=params accepts them. I had to put the command (e.g. xz -T 5 --fast) in a shell script and pass that script with --compress-filter=/my/script.
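
Such a wrapper can also be generated instead of written by hand. This is a minimal sketch assuming bash; make_filter_wrapper and the ~/bin/xz-fast path are hypothetical names, not part of Qubes OS. The wrapper forwards its own arguments ("$@"), which matters if the filter is also invoked with -d to decompress, as tar does with --use-compress-program.

```shell
#!/bin/bash
# Hypothetical helper: write an executable wrapper that runs a compression
# command with fixed arguments, so it can be passed to --compress-filter.
make_filter_wrapper() {
    local path=$1
    shift
    {
        printf '#!/bin/bash\n'
        # %q quotes each fixed argument safely for bash.
        printf 'exec'
        printf ' %q' "$@"
        # Forward the caller's own arguments (e.g. -d for decompression).
        printf ' "$@"\n'
    } > "$path"
    chmod +x "$path"
}
```

Usage, assuming ~/bin exists: make_filter_wrapper ~/bin/xz-fast xz -T 5 --fast, then pass --compress-filter=$HOME/bin/xz-fast to qvm-backup.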

Others have made the same request: Qubes 4.0 Backup VMs slooow (gzip) - #21 by luja

With 2 cores there wasn’t much difference in performance, but it may be more relevant now that people are getting systems with much higher core counts.

On 2 cores, either the compression tool or the encryption tool saturates the CPU, so there is just no room to make it faster, except with zstd, which used almost no CPU during my tests. I didn’t measure CPU usage precisely because it would have required more work than I wanted to do, but zstd was using at most about 15% of one core.

I’ve been running the tests on a desktop Ryzen 5600X 6C/12T (smt=on).
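
The exact commands behind these measurements aren’t posted in the thread. As a rough sketch of how a similar time and size comparison could be reproduced (assuming bash and GNU coreutils), the hypothetical helpers below time each installed filter on an input file and report the compressed size. They measure wall-clock time only, not CPU usage, and the filter list simply mirrors the tools discussed here.

```shell
#!/bin/bash
# Hypothetical helper. Prints: <filter> TAB <seconds> TAB <compressed bytes>
bench_filter() {
    local filter=$1 input=$2 out secs
    out=$(mktemp)
    # Bash's `time` keyword reports on the shell's stderr; TIMEFORMAT=%R keeps
    # only wall-clock seconds. $filter is deliberately unquoted so that
    # "pigz -p 3" splits into a command and its arguments.
    secs=$( { TIMEFORMAT=%R; time $filter < "$input" > "$out" 2>/dev/null; } 2>&1 )
    printf '%s\t%s\t%s\n' "$filter" "$secs" "$(wc -c < "$out")"
    rm -f "$out"
}

# Benchmark every candidate whose executable is installed.
compare_filters() {
    local input=$1 f
    for f in "gzip" "pigz -p 3" "xz -T 5 --fast" "zstd" "zstdmt"; do
        if command -v "${f%% *}" > /dev/null; then
            bench_filter "$f" "$input"
        fi
    done
}
```

For example, compare_filters /path/to/large-file. A small or synthetic file will not reflect real backup data such as qube images.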

I usually use bzip2 when I want to save space.

If you have 12 or 24 cores …

I submitted a PR for automatic recognition of lzma, pigz, zstd and zstdmt in the restore operation. If I have missed any other common, mainstream algorithm that is usually available in all distros, please tell me.

If the compression filter is not installed on the target system, the restore operation will show a message asking the user to install the missing package on the target system. This should also work for the GUI restore operation.
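
The PR code itself isn’t reproduced here. As a rough sketch of the behaviour described above (assuming bash, and assuming Fedora package names gzip, pigz, xz, zstd and bzip2), a restore-side check could look like this. filter_package and check_restore_filter are hypothetical names.

```shell
#!/bin/bash
# Map a filter name to the package that (assumed, Fedora) provides it.
filter_package() {
    case $1 in
        gzip)        echo gzip ;;
        pigz)        echo pigz ;;
        xz)          echo xz ;;
        zstd|zstdmt) echo zstd ;;   # zstdmt ships in the zstd package
        bzip2)       echo bzip2 ;;
        *)           echo "$1" ;;   # unknown: guess a same-named package
    esac
}

# Succeed if the filter can run here; otherwise tell the user what to install.
check_restore_filter() {
    local filter=$1
    if command -v "$filter" > /dev/null; then
        return 0
    fi
    echo "Compression filter '$filter' is not installed;" \
         "please install the '$(filter_package "$filter")' package." >&2
    return 1
}
```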

BTW, zstdmt is the multi-threaded version of zstd by default, which should give reasonable performance on modern systems. Curious users may want to tweak the compression parameters (compression level, number of threads, …), but I do not intend to submit a PR to add those parameters. Just create a bash wrapper in your ~/bin with the desired options, e.g.:

```shell
#!/bin/bash
# ~/bin/pigz: call the real pigz by absolute path with 16 threads
exec /usr/bin/pigz -p 16 "$@"
```
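
If you’d rather not hard-code 16 threads, the wrapper could derive the count from the CPUs that dom0 sees. This is a sketch assuming bash and coreutils’ nproc; backup_threads is a hypothetical helper, and reserving one CPU for the encryption step is an assumption drawn from the 2-core observation earlier in this thread, not a measured optimum. Note that nproc in dom0 reports dom0’s own vCPUs, which may be fewer than the host has.

```shell
#!/bin/bash
# Hypothetical helper: use all visible CPUs but one (never fewer than 1).
backup_threads() {
    local n
    n=$(nproc)
    if [ "$n" -gt 1 ]; then
        echo $((n - 1))
    else
        echo 1
    fi
}

# In ~/bin/pigz this would become:
#   exec /usr/bin/pigz -p "$(backup_threads)" "$@"
```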

Not in the Fedora repos AFAIK, but on my ZFS servers this little thing improves things by quite a margin when replicating over ssh tunnels (and it offers more recovery options than most standard compressors) …

What does it offer over xz -T, which supports multiple threads, given that both use LZMA?

I have avoided xz ever since I read that:

https://www.nongnu.org/lzip/xz_inadequate.html

BTW, if and when the mentioned PR is approved and merged, I will submit a PR to the qubes-manager repository to make these changes:

  1. Add the qubes’ label icons (because it looks nicer IMO).
  2. Change the Compress backup checkbox into a drop-down of the detected and available compression options.
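
For item 2, the drop-down only needs to know which compressors are installed. A minimal sketch of that detection, assuming bash; the candidate list mirrors the filters discussed in this thread, and available_filters is a hypothetical name, not the qubes-manager implementation.

```shell
#!/bin/bash
# Print each candidate filter whose executable is found on PATH.
available_filters() {
    local f
    for f in gzip pigz bzip2 xz zstd zstdmt; do
        if command -v "$f" > /dev/null; then
            echo "$f"
        fi
    done
}
```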

The above changes are relatively easy. The icons take only one line of code:

An update on this: the PR making auxiliary compression filters available in the Qubes Backup GUI and backup profiles has been merged and released to Qubes OS r4.3 testing. Here is a screenshot:

As soon as the user installs any of the xz, pigz or zstd packages, they will be available in the compression filter drop-down menu. No additional options (e.g. compression level) will be added to the Qubes Backup GUI. If anyone wants to fine-tune compression options, they can create a dummy bash script with the same name in ~/bin that calls the original binary with custom options, e.g.:

```shell
#!/bin/bash
# Place this file at ~/bin/zstd and make it executable to get insane zstd compression.
# The absolute path calls the real zstd, not this wrapper.

exec /usr/bin/zstd --ultra -22 "$@"
```
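
The same-name trick only works if ~/bin comes before /usr/bin in the PATH of whatever runs the backup, which depends on how that process was started, so it is worth checking rather than assuming. A sketch assuming bash; wrapper_active is a hypothetical helper, not part of Qubes OS.

```shell
#!/bin/bash
# Report whether a command name resolves to the wrapper in a given directory
# (default ~/bin), i.e. whether that directory shadows the system binary.
wrapper_active() {
    local name=$1 dir=${2:-$HOME/bin}
    local resolved
    resolved=$(command -v "$name") || return 1
    if [ "$resolved" = "$dir/$name" ]; then
        echo "$name -> $resolved (wrapper in use)"
    else
        echo "$name -> $resolved (wrapper NOT in use; check PATH)" >&2
        return 1
    fi
}

# Example: wrapper_active zstd
```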

Wow! Thanks a lot.

Now we could debate the future default compression filter for Qubes OS. Before going further, keep in mind that there is no 100% right or wrong answer here.

That said, modern utilities that can use multiple CPU cores compress far faster than the current default, the slow single-threaded gzip. Not even close. I believe it might be better to replace it with something faster in Qubes OS r4.3. We have these open GitHub issues on this:

xz

  • To the best of my knowledge, xz was the default package compression on Debian & Arch Linux at some point.
  • xz recently had an unfortunate and embarrassing backdoor incident, although it was discovered early and never reached the stable branches of major distros (or Qubes OS).
    XZ Utils backdoor - Wikipedia

Z Standard (zstd)

  • It has been included in the Linux kernel since 2017 (a security domain with thorough reviews).
  • Fedora has been using it for package compression since 2019.
  • The default vanilla Fedora installation uses btrfs (the default since Fedora 33) with transparent zstd compression (enabled by default since Fedora 34).
  • Arch Linux switched from xz to zstd for package compression in 2020.
  • Firefox and Chrome have supported it for content encoding since 2024.
  • While the zstd library is included in most distros these days, the command-line frontend (the zstd package) has to be installed separately.
  • The zstd package provides zstdmt (multi-threaded), which can easily be selected from the Qubes backup GUI menu if users want to use multiple CPU cores.

Pigz (Parallel gzip)

  • To the best of my knowledge, pigz is backward compatible with gzip.
  • It cannot use multiple cores for decompression, but gzip decompression is relatively fast anyway.
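
A quick way to check the compatibility claim on your own machine is a round trip: compress with one tool and decompress with another. A sketch assuming bash and cmp from diffutils; roundtrip_ok is a hypothetical helper. The same check applies to pbzip2 versus bzip2.

```shell
#!/bin/bash
# Succeed only if "compress | decompress" restores the input byte-for-byte.
# Both commands are passed as strings and deliberately word-split.
roundtrip_ok() {
    local compress=$1 decompress=$2 input=$3
    # cmp -s is silent and returns 0 only when both streams are identical.
    $compress < "$input" | $decompress | cmp -s - "$input"
}

# Example (requires pigz): roundtrip_ok "pigz -p 3" "gzip -d" /path/to/file
```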

My choice will be …

Ok. To the best of my knowledge:

Plzip

  • This is essentially lzma/xz but supports parallel compression.
  • No parallel decompression.
  • Not available in Fedora default repositories but copr?

There is also:

Pbzip2

  • Parallel implementation of bzip2
  • Fully compatible with bzip2
  • Available in Fedora default repositories
It uses LZMA, but it is not xz (which uses LZMA2). It’s more reliable and more efficient. The details are in the link posted above.

Ok. The major concern is availability in the default repositories of dom0. I cannot find it in Qubes OS 4.3 (Fedora 41). What is the package name?

It’s plzip on Debian, but on Fedora 41 it seems that only lzip and lziprecover are available. I will look into this further.

Edit: It’s true … the Fedora package only provides lzip. What a shame.

It’s in copr:

https://copr.fedorainfracloud.org/coprs/frsoftware/lzip-nongnu/packages/
