Bees and btrfs deduplication

Whew.

bees doesn’t support config-file-based option parsing, and bees accepts arguments that beesd doesn’t… On first impression, generating correct configuration files and then generating proper, dynamic systemd calls to bees to limit load average and other things… is a project of its own. A sketch of the kind of glue I mean follows below.
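To make this concrete, here is a minimal sketch of that glue, assuming the beesd@.service unit shipped in the bees repo (instanced by filesystem UUID), bees’ --loadavg-target option, and a /usr/sbin install path. Whether beesd actually forwards that option is exactly the kind of thing that bit me, so verify against your version; the UUID and the half-the-CPUs heuristic are placeholders.

```python
#!/usr/bin/env python3
"""Sketch: generate a systemd drop-in capping bees' load average."""
import os
import pathlib

UUID = "00000000-0000-0000-0000-000000000000"  # placeholder: root fs UUID

# Heuristic: target roughly half the CPUs so dom0 stays responsive.
loadavg_target = max(1, (os.cpu_count() or 2) // 2)

dropin_dir = pathlib.Path(f"/etc/systemd/system/beesd@{UUID}.service.d")
dropin_dir.mkdir(parents=True, exist_ok=True)

# An empty ExecStart= clears the unit's original command before overriding it.
(dropin_dir / "override.conf").write_text(
    "[Service]\n"
    "ExecStart=\n"
    f"ExecStart=/usr/sbin/beesd --loadavg-target {loadavg_target} {UUID}\n"
)
print(f"wrote drop-in with --loadavg-target {loadavg_target}")
```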

I will continue to test this, but the current state of bees suggests that using it in downstream projects requires a lot of downstream gluing/plumbing that I was expecting to be fixed upstream and ready to use.

Seems like I am going to deprioritize this, and my interest in OpenZFS just increased once more.


Discoveries:

  • QubesOS on btrfs compresses. beesstats.txt is updated once an hour, as documented in the issue (not in the docs): besstats.txt not updating · Issue #178 · Zygo/bees · GitHub
    • Those stats inform us of bees’ hash table occupancy, the compressed/uncompressed fs ratio, and the page-size distribution, so that one can tweak the bees hash table… after the fact, unless we increase dom0 reserved memory, whose 4 GB is already a lot for systems with only 16 GB. This is a problem. We need dynamic config.
    • bees/docs/config.md at 28ee2ae1a88c811e2e5faae6b40ef63a48324a5d · Zygo/bees · GitHub gives insights on how to calculate the hash table size for general-purpose OSes, which doesn’t apply to QubesOS with its snapshots, rotations, compression and raw images. Tweaking is necessary at configuration-file generation time, based on the btrfs partition size and some heuristics to estimate expected data uniqueness. The QubesOS use case means a lot of data redundancy through snapshots and cloning (which is why we are interested in bees), but setting the right value up front is not so straightforward. See my script for where I am in that process, and the sizing sketch right after this list.
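For reference, this is the rough shape of the heuristic my script is converging on. The uniqueness ratio and the table-bytes-per-GiB-of-unique-data ratio are my assumptions to tune, not values prescribed by the bees docs, and the 128 KiB rounding reflects my reading of how bees chunks the table, so verify it:

```python
#!/usr/bin/env python3
"""Sketch: pick a beesd DB_SIZE (hash table size) at install time."""
import shutil

def hash_table_size(mount_point: str,
                    uniqueness_ratio: float = 0.5,
                    table_bytes_per_gib_unique: int = 2**20) -> int:
    """Return a DB_SIZE in bytes, rounded up to 128 KiB granularity."""
    fs_bytes = shutil.disk_usage(mount_point).total
    unique_gib = (fs_bytes * uniqueness_ratio) / 2**30
    size = int(unique_gib * table_bytes_per_gib_unique)
    chunk = 128 * 1024  # assumed hash-table granularity, verify against docs
    return max(chunk, (size + chunk - 1) // chunk * chunk)

if __name__ == "__main__":
    # With these guesses, a 500 GB fs lands on the order of a 250 MiB table.
    print(f"DB_SIZE={hash_table_size('/')}")
```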

Some stats

  • I installed Q4.2.1 over btrfs, not choosing Fedora templates this time because of other testing I intend to do for this PoC: deploying qusal, which clones and specializes minimal templates and deploys sys-cache to download updates once and install from the local cache across multiple specialized clones. That’s it: I got bored of Fedora not being compatible with apt-cacher-ng once and for all, and decided to never look back unless I really have to. Those issues are simply never fixed upstream, and the workarounds under cacher/qusal always need updating, because checksums fail, templates then fail even the update checks, and therefore no available package updates make their way into the dom0 widget.
    • So, as a start, there are not many gains to expect on such a not-yet-cloned deployment: the 39 GB of deployed templates got reduced to 37 GB, but it took multiple hours for bees to parse all that data BEFORE being able to dedup
      • bees documents optimizations that are possible if multiple btrfs subvolumes are in use. This is not the case right now under QubesOS: we only have one subvolume, which is the pool, so --scan-mode 0, which would help prioritize clone/snapshot rotation for dedup, cannot be used. See bees/docs/config.md at 28ee2ae1a88c811e2e5faae6b40ef63a48324a5d · Zygo/bees · GitHub
      • It took about 8 hours to parse 39 GB of rootfs on an x230 with a fast SSD (everything is under a single dom0 btrfs as of now) to gain around 2 GB from deduplication. Of course, more gains are to be expected once cloning happens.

Lessons learned

  • if bees were deployed at OS install, prior to template deployment, the gains could theoretically be nearly instantaneous, but that needs to be proven.
    • depending on what the end user decides, the gains would still be minimal at install time, with deduplication either unfinished before the end of install (not a problem) or finished, and I am not sure how we could justify such high CPU usage and extended installation time for such low instant gains. Some of the debian/whonix content overlaps; whonix-workstation over whonix-gateway explains the gain observed here: 2 GB. My stats gathering needs some more fu.

Impressions

  • All in all, I’m a bit disappointed by the current UX of bees. I really thought, last time I checked it (theoretically), that configuration files were fully supported, not just passing some options through a config file while still needing to craft runtime tweak arguments.
  • I had already invested a lot of hours trying to generate configuration files that would provide proper baseline configurations, having understood, wrongly, that beesd only needed to be passed the UUID of the filesystem to find the corresponding config file and, from there, the configuration options, only to realize that, from what it seems, only the hash table size and the directories can be configured there, while the rest still needs to be passed to beesd on the command line (see the generation sketch after this list).
  • Continuing the experimentation made me realize as well that even some options I was expecting beesd to forward to bees are not parsed by beesd… Basically, it would take a big upstream collaboration to arrive at not so many improvements, unless QubesOS also integrates it into the installation medium and changes the subvolume configuration as well:
    • Why should bees parse dvm ephemeral disks when they could live in a different subvolume that bees does not care about? In other words: why deduplicate something that is going to be discarded anyway?
    • Why should revert snapshots be covered by bees dedup, with the same reasoning as above?
    • Why would bees want to watch the whole dom0 filesystem for dedup, outside of the directory root holding appvm/template disk states?
    • All of which would require way more changes…
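Here is the kind of generator I was attempting. The keys mirror beesd.conf.sample from the bees repo (UUID, DB_SIZE, OPTIONS), but per the above, beesd may still ignore some OPTIONS, so treat this as a sketch of the approach, not a working integration; the table size matches this test, and the scan-mode value is only an example, since mode 0 is out as noted earlier.

```python
#!/usr/bin/env python3
"""Sketch: emit /etc/bees/<UUID>.conf for the dom0 root filesystem."""
import pathlib
import subprocess

def root_fs_uuid(path: str = "/") -> str:
    # findmnt (util-linux) prints the UUID of the fs mounted at `path`.
    return subprocess.check_output(
        ["findmnt", "-no", "UUID", path], text=True).strip()

def write_bees_conf(uuid: str, db_size: int, options: str = "") -> pathlib.Path:
    conf = pathlib.Path(f"/etc/bees/{uuid}.conf")
    conf.parent.mkdir(parents=True, exist_ok=True)
    conf.write_text(
        f"UUID={uuid}\n"
        f"DB_SIZE={db_size}\n"
        f'OPTIONS="{options}"\n'
    )
    return conf

if __name__ == "__main__":
    # 150 MiB matches the hash table size used in the test reported below.
    path = write_bees_conf(root_fs_uuid(), 150 * 2**20, "--scan-mode 1")
    print(f"wrote {path}")
```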

Then there is the option of switching to OpenZFS, which would not apply dedup AFTER the fact (offline) like bees does, but would prevent duplicate blocks from being written to the fs in the first place.

  • A comparison between TLVM/BTRFS/ZFS should be done on that basis.

But as of now, my interest in bees just lowered… a lot.
But again, ZFS doing that live dedup will cost more RAM than BTRFS+bees, way more, and the cost grows with the amount of unique data on the disk, roughly linearly, since the dedup table holds one entry per unique block (see the back-of-envelope sketch below). Once again, users nowadays are not expected to reconsider their dom0 RAM reservation just because they chose a big SSD drive, but going BTRFS/OpenZFS dedup would change that.
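Back-of-envelope for that RAM cost, using the commonly cited ~320 bytes per ZFS dedup-table (DDT) entry; the record size and the amounts of unique data are guesses, not measurements:

```python
#!/usr/bin/env python3
"""Back-of-envelope: RAM cost of OpenZFS inline dedup."""
DDT_ENTRY_BYTES = 320     # commonly cited approximation; verify per ZFS version
RECORDSIZE = 128 * 1024   # ZFS default recordsize

def ddt_ram_bytes(unique_data_bytes: int) -> int:
    # One DDT entry per unique block: the cost is linear in unique data.
    return (unique_data_bytes // RECORDSIZE) * DDT_ENTRY_BYTES

if __name__ == "__main__":
    for tib in (0.5, 1, 2, 4):
        ram_gib = ddt_ram_bytes(int(tib * 2**40)) / 2**30
        print(f"{tib:>4} TiB unique data -> ~{ram_gib:.2f} GiB of DDT in RAM")
```

Note that a smaller record size multiplies the per-TiB cost accordingly (4K blocks would cost 32× more than 128K), which is where the scary numbers come from.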


TLDR: as of now, bees configured with a hash table of around 150 MB managed to properly dedup just the base OS installation, reducing 39 GB -> 37 GB, and those stats are not good enough to draw comparisons from.

Next:
I would need to extract proper size-reduction numbers, giving a good comparison of just the Whonix template reduction, to be convincing here without going further; then deploy qusal while beesd is stopped, and compare again after offline deduplication has occurred.

Restating the facts:
bees is an offline dedup tool, meaning that duplicated data needs to be written to disk first before it can be deduped. That means a lot of unnecessary IO and writes happening on the drive for nothing: data is written, then written over to mark the space free, then rewritten again and again, meaning more IO… all of which OpenZFS would prevent altogether, at the cost of more RAM used in dom0.

Losses:
Older hardware, going by this experience, would not gain much but space, at the cost of a lot of slow operations to get that gain.

All in all, I think older hardware gets all of those bonuses today simply by putting a big SSD drive in it and reinstalling, without caring much about space consumption.

The performance gains between btrfs and TLVM+ext4 are a different subject, and might only be perceivable on older hardware.

The space gains from dedup on newer hardware might, versus the CPU cost, be just as imperceptible.
Redoing this test on newer hardware, where the SSD<->PCI<->RAM overhead might not be visible, might also explain why TLVM vs BTRFS perf differences were not seen there, whereas on this forum big gains were observed on a T430: old hardware whose maximal SSD speed is never really reached, because the PCI and RAM speeds are lower than the drive’s.

The question is always: what are we testing.


So that is that, folks. I will context switch and come back to this to deploy qusal and rerun the pre/post tests.

Meanwhile, if somebody could share what the proper baseline commands would be to compare things pre-test/post-test, so I can report current stats on consumed disk space on a fresh install vs this test laptop, that would help leave a proper trace that may be useful later. Below is what I am using so far; corrections welcome.
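My (possibly naive) baseline: compsize (from the btrfs-compsize package) reports referenced vs on-disk bytes, which should capture both compression and dedup, and `btrfs filesystem du -s` separates shared from exclusive extents. Both need root, and the path is where volumes live on my install:

```python
#!/usr/bin/env python3
"""Pre/post disk-usage snapshot for the dedup tests (run as root)."""
import subprocess
import sys

PATHS = ["/var/lib/qubes"]  # where template/appvm volumes live on my install

def snapshot(label: str) -> None:
    print(f"=== {label} ===")
    for path in PATHS:
        for cmd in (["compsize", path],
                    ["btrfs", "filesystem", "du", "-s", path]):
            print("$", " ".join(cmd))
            out = subprocess.run(cmd, capture_output=True, text=True)
            print(out.stdout or out.stderr)

if __name__ == "__main__":
    snapshot(sys.argv[1] if len(sys.argv) > 1 else "pre-test")
```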
