Bees and btrfs deduplication

Insurgo · August 23, 2023, 6:53am

Last month I tried to build Qubes 4.2 with qubes-builder so that I could integrate the bees spec file from openSUSE Tumbleweed, but I could not even get a plain Qubes build to succeed.

It was failing at the get-sources step, with issues with a fetched commit from a submodule… Has anyone here built successfully recently?

I guess I could try to build only the fedora-37 template and bees there. Poking here in advance to improve my chances of success!

Thanks

Insurgo · February 9, 2024, 6:49pm

Houla, that was a long ride to get to this point (still needing help though: nothing builds yet, and there is no debug information to chew on).

I finally was able to use qusal to deploy qubes-builderv2 (Qubes Salt Beginner's Guide - #42 by Insurgo)

Note: following BOOTSTRAPPING.md is required to deploy the dom0 requirements in the proper order. The order of installing dependencies is not that clear for Salt beginners, and things fail if the dom0 requirements aren’t met (I will try to open issues or a PR there soon).

Once qubes-builderv2 was deployed (and confirmed working), I tried to follow guidelines I inferred from other components migrated from builderv1 to builderv2, and arrived at the following, which doesn’t build anything as of now. @marmarek @fepitre if you have any insights, that would be awesome.

qubes-builderv2’s builder.yml:

git:
  baseurl: https://github.com
  prefix: QubesOS/qubes-
  branch: release4.2
  maintainers:
    # marmarek
    - '0064428F455451B3EBE78A7F063938BA42CFA724'
    # simon
    - '274E12AB03F2FE293765FC06DA0434BC706E1FCF'

backend-vmm: xen
debug: true
verbose: true
qubes-release: r4.2

distributions:
  - host-fc37

artifacts-dir: /home/user/src/qubes-builderv2/artifacts

components:
  - builder-rpm:
      branch: main
      packages: False
  - bees:
      branch: main
      packages: False
      url: https://github.com/tlaurion/qubes-bees
      maintainers:
        - '0ACCB2B664EE17E054B05E0B4A38DA8BEB9C8396'

executor:
  type: qubes
  options:
    dispvm: "dom0"

stages:
  - fetch
  - prep
  - build
  - post
  - verify
  - sign:
      executor:
        type: local
  - publish:
      executor:
        type: local
  - upload:
      executor:
        type: local

I deployed my public key (which signed the commits and tags: keys.openpgp.org) under qubesbuilder/plugins/fetch/keys/, and it is recognized properly. The builder fetches, verifies the tag and puts the content of the git repo correctly under the bees directory, but it seems I’m missing something in .qubesbuilder (Makefile.builder is not required anymore; I inferred that only .qubesbuilder plus bees.spec are required), since the bees tarball is not put in place, as can be seen later on.

The builder only does the following (it puts the git repo content in the right place but reports nothing else):

[user@qubes-builder ~/src/qubes-builderv2(main)]
(130)$ ./qb --verbose -c bees package all
Running stage: fetch
13:27:23,198 [fetch] bees: source already fetched. Updating.
13:27:53,774 [executor:qubes:disp8379] copy-in (cmd): /usr/lib/qubes/qrexec-client-vm -- disp8379 qubesbuilder.FileCopyIn+-2Fbuilder-2Fbees /usr/lib/qubes/qfile-agent /home/user/src/qubes-builderv2/artifacts/sources/bees
13:27:55,040 [executor:qubes:disp8379] copy-in (cmd): /usr/lib/qubes/qrexec-client-vm -- disp8379 qubesbuilder.FileCopyIn+-2Fbuilder-2Fplugins-2Ffetch /usr/lib/qubes/qfile-agent /home/user/src/qubes-builderv2/qubesbuilder/plugins/fetch
13:28:00,838 [executor:qubes:disp8379] Executing '/usr/bin/qvm-run-vm -- disp8379 env -- VERBOSE=1 DEBUG=1 BACKEND_VMM=xen bash -c 'cd /builder && /builder/plugins/fetch/scripts/get-and-verify-source.py https://github.com/tlaurion/qubes-bees /builder/bees /builder/keyring /builder/plugins/fetch/keys --git-branch main --minimum-distinct-maintainers 1 --maintainer 0ACCB2B664EE17E054B05E0B4A38DA8BEB9C8396''.
13:28:02,241 [executor:qubes:disp8379] output: --> Verifying tags...
13:28:02,241 [executor:qubes:disp8379] output: ---> Good tag 596a650b16a908562e210638a03eb90fec7a759c.
13:28:02,241 [executor:qubes:disp8379] output: Enough distinct tag signatures. Found 1, mandatory minimum is 1.
13:28:02,241 [executor:qubes:disp8379] output: --> Merging...
13:28:02,266 [executor:qubes:disp8379] copy-out (cmd): /usr/lib/qubes/qrexec-client-vm disp8379 qubesbuilder.FileCopyOut+-2Fbuilder-2Fbees /usr/bin/qfile-unpacker 1000 /home/user/src/qubes-builderv2/artifacts/sources
Running stage: prep
Running stage: build
Running stage: post
Running stage: verify
Running stage: sign
Running stage: publish
Running stage: upload
13:28:12,339 [upload] host-fedora-37.x86_64: No remote location defined. Skipping.
[user@qubes-builder ~/src/qubes-builderv2(main)]

What am I missing in .qubesbuilder? It seems that bees.spec is not considered, nor is the download of the prerequisites specified in the .qubesbuilder file shown below. Only the content of the git repo is properly deployed, as can be seen here:

(130)$ ls -al /home/user/src/qubes-builderv2/artifacts/sources/bees/
total 72K
drwxr-xr-x 4 user user 4.0K Feb  9 13:05 ./
drwxr-xr-x 8 user user 4.0K Feb  9 13:28 ../
drwxr-xr-x 8 user user 4.0K Feb  9 13:28 .git/
drwxr-xr-x 2 user user 4.0K Feb  9 10:01 rpm_spec/
-rw-r--r-- 1 user user 4.2K Feb  9 10:01 bees.changes
-rw-r--r-- 1 user user  35K Feb  9 10:01 LICENSE
-rw-r--r-- 1 user user  250 Feb  9 13:05 .qubesbuilder
-rw-r--r-- 1 user user  102 Feb  9 10:01 README.md
-rw-r--r-- 1 user user   65 Feb  9 10:01 v0.10.tar.gz.sha256
[user@qubes-builder ~/src/qubes-builderv2(main)]

At the time of writing, here is the content of the .qubesbuilder file in my git repo:

$ cat /home/user/src/qubes-builderv2/artifacts/sources/bees/.qubesbuilder
host:
  rpm:
    build:
    - rpm_spec/bees.spec
source:
  modules:
  - gcc-c++
  - libbtrfs-devel
  - libuuid-devel
  - make
  - autotools
  files:
  - url: https://github.com/Zygo/bees/archive/refs/tags/v0.10.tar.gz
    sha256: v0.10.tar.gz.sha256
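
For reference, the files entry pairs a download URL with a checksum file kept in the repo. Below is a minimal sketch of the check this implies, assuming the .sha256 file holds a hex digest as its first whitespace-separated token (the 65-byte size in the listing above is consistent with 64 hex characters plus a newline). The function name is hypothetical and this is not the builder's actual code.

```python
import hashlib
from pathlib import Path


def matches_sha256_file(archive: Path, checksum_file: Path) -> bool:
    """Return True if archive hashes to the digest stored in checksum_file."""
    # Assumption: the digest is the first token, which covers both a bare
    # digest and sha256sum-style "<digest>  <name>" lines.
    expected = checksum_file.read_text().split()[0].lower()
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        # Hash in chunks so a large tarball is not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Running the same check by hand against v0.10.tar.gz and v0.10.tar.gz.sha256 is a quick way to rule out a checksum mismatch before digging into the builder itself.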

The repo (still changing, since it is not working yet) is at GitHub - tlaurion/qubes-bees: Best effort BRTFS offline deduplication, based on OpenSuse Tumbleweed rpm spec inclusion
Any advice welcome: I don’t understand what I’m missing compared to the linux example at GitHub - QubesOS/qubes-builderv2: Next generation of Qubes OS builder

Insurgo · February 9, 2024, 9:59pm

I was told over Matrix that @VERSION@ placeholders needed to be placed in the bees.spec.in file and that a version file needed to be dropped in the root of the repo, which I did.
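
The idea behind that advice can be illustrated as follows: bees.spec is rendered from bees.spec.in by substituting the content of the version file. This is only a sketch of the substitution, not the builder's actual code, and render_spec is a hypothetical helper.

```python
from pathlib import Path


def render_spec(spec_in: Path, version_file: Path) -> str:
    """Return the text of spec_in with @VERSION@ replaced by the version."""
    # strip() drops the trailing newline of the one-line version file.
    version = version_file.read_text().strip()
    return spec_in.read_text().replace("@VERSION@", version)
```

With the 5-byte version file from the listing (presumably 0.10 plus a newline), the Version: line of the rendered spec would read 0.10, which is also what the Source: URL expands through %{version}.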

user@heads-tests-deb12-nix:~/qubes-bees$ ls -al *
-rw-r--r-- 1 user user   4239 Feb  8 16:44 bees.changes
-rw-r--r-- 1 user user  35149 Feb  8 16:34 LICENSE
-rw-r--r-- 1 user user    102 Feb  8 16:34 README.md
-rw-r--r-- 1 user user 177509 Feb  9 10:15 v0.10.tar.gz
-rw-r--r-- 1 user user     65 Feb  9 01:30 v0.10.tar.gz.sha256
-rw-r--r-- 1 user user      5 Feb  9 15:13 version

rpm_spec:
total 12
drwxr-xr-x 2 user user 4096 Feb  9 16:56 .
drwxr-xr-x 4 user user 4096 Feb  9 16:56 ..
-rw-r--r-- 1 user user 2390 Feb  9 15:42 bees.spec.in

user@heads-tests-deb12-nix:~/qubes-bees$ cat rpm_spec/bees.spec.in 
#
# spec file for package bees
#
# Copyright (c) 2023 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#


Name:           qubes-bees
Version:	@VERSION@
Release:	1%{?dist}

Summary:        Best-Effort Extent-Same, a btrfs deduplication agent
License:        GPL-3.0-only
Group:          System/Filesystems
URL:            https://github.com/Zygo/bees
Source:         https://github.com/Zygo/bees/archive/refs/tags/v%{version}.tar.gz
BuildRequires:  gcc-c++
BuildRequires:  libbtrfs-devel
BuildRequires:  libuuid-devel
BuildRequires:  make

%description
bees is a block-oriented userspace deduplication agent designed for large btrfs
filesystems. It is an offline dedupe combined with an incremental data scan
capability to minimize time data spends on disk from write to dedupe.

Hilights:

* Space-efficient hash table and matching algorithms - can use as little as 1
  GB hash table per 10 TB unique data (0.1GB/TB)
* Daemon incrementally dedupes new data using btrfs tree search
* Works with btrfs compression - dedupe any combination of compressed and uncompressed files
* Persistent hash table for rapid restart after shutdown
* Whole-filesystem dedupe - including snapshots
* Constant hash table size - no increased RAM usage if data set becomes larger
* Works on live data - no scheduled downtime required
* Automatic self-throttling based on system load

%prep
%autosetup -p1

%build
cat >localconf <<-EOF
	SYSTEMD_SYSTEM_UNIT_DIR=%{_unitdir}
	LIBEXEC_PREFIX=%{_bindir}
	LIB_PREFIX=%{_libdir}
	PREFIX=%{_prefix}
	LIBDIR=%{_libdir}
	DEFAULT_MAKE_TARGET=all
EOF

%make_build BEES_VERSION=%{version}

%install
%make_install

%files
%license COPYING
%doc README.md
%{_bindir}/bees
%{_sbindir}/beesd
%{_unitdir}/beesd@.service
%dir %{_sysconfdir}/bees
%{_sysconfdir}/bees/beesd.conf.sample

%changelog


user@heads-tests-deb12-nix:~/qubes-bees$ cat .qubesbuilder
host:
  rpm:
    build:
    - rpm_spec/bees.spec
source:
  modules:
  - gcc-c++
  - libbtrfs-devel
  - libuuid-devel
  - make
  - autotools
  files:
  - url: https://github.com/Zygo/bees/archive/refs/tags/v0.10.tar.gz
    sha256: v0.10.tar.gz.sha256

fepitre · February 10, 2024, 11:41am

The packages: False option on the bees component is why nothing is happening. This option tells qubes-builderv2 that there is nothing to build here. It is mostly meant for builder-* components that are used as plugins.
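
A small sketch of that rule, applied to the components list from the builder config above. The data is hand-copied into a Python structure rather than parsed from YAML, buildable_components is a hypothetical helper, and treating a missing packages key as "build packages" is an assumption.

```python
def buildable_components(components: list) -> list[str]:
    """Return the names of components that are expected to produce packages."""
    names = []
    for entry in components:
        # A component is either a bare name or a one-key mapping: name -> options.
        if isinstance(entry, str):
            names.append(entry)
            continue
        name, options = next(iter(entry.items()))
        # Assumption: a missing "packages" key means packages are built.
        if (options or {}).get("packages", True):
            names.append(name)
    return names


components = [
    {"builder-rpm": {"branch": "main", "packages": False}},
    {"bees": {"branch": "main", "packages": False,
              "url": "https://github.com/tlaurion/qubes-bees"}},
]
print(buildable_components(components))  # prints []
```

Dropping packages: False from the bees entry makes it the only buildable component, which matches the later prep and build stages having nothing to do in the log above.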

fepitre · February 10, 2024, 11:44am

Also, I don’t see any git submodule in the repo, so the modules entries make this fail. According to the README, modules is meant to: Declare submodules to be included inside source preparation
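
To make the mismatch concrete, here is a sketch that compares the modules entries against the submodule paths declared in a .gitmodules file. It assumes that each modules entry is meant to be a submodule path; undeclared_modules is a hypothetical helper, not part of qubes-builderv2.

```python
import re


def undeclared_modules(modules: list[str], gitmodules_text: str) -> list[str]:
    """Return the entries of modules that are not submodule paths."""
    # .gitmodules stores each submodule path on a "path = <dir>" line.
    declared = set(
        re.findall(r"^\s*path\s*=\s*(.+?)\s*$", gitmodules_text, re.MULTILINE)
    )
    return [m for m in modules if m not in declared]


modules = ["gcc-c++", "libbtrfs-devel", "libuuid-devel", "make", "autotools"]
# The repo listing shows no .gitmodules file, so its text is empty here.
print(undeclared_modules(modules, ""))
```

Every entry is reported, since the list holds build dependencies rather than submodules; those belong in the BuildRequires lines of the spec file, where they already are.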

Insurgo · February 13, 2024, 7:01pm

I made some progress but still not there yet.

Reminder: my attempts (I will clean up the commit log once successful) are at GitHub - tlaurion/qubes-bees: Best effort BRTFS offline deduplication, based on OpenSuse Tumbleweed rpm spec inclusion

To replicate:

Then have the following builder.yml:

(1)$ cat builder.yml
git:
  baseurl: https://github.com
  prefix: QubesOS/qubes-
  branch: release4.2
  maintainers:
    # marmarek
    - '0064428F455451B3EBE78A7F063938BA42CFA724'
    # simon
    - '274E12AB03F2FE293765FC06DA0434BC706E1FCF'

backend-vmm: xen
debug: true
verbose: true
qubes-release: r4.2

distributions:
  - host-fc37

artifacts-dir: /home/user/src/qubes-builderv2/artifacts

components:
  - builder-rpm:
      branch: main
      packages: False
  - bees:
      branch: main
      url: https://github.com/tlaurion/qubes-bees
      maintainers:
        - '0ACCB2B664EE17E054B05E0B4A38DA8BEB9C8396'

executor:
  type: qubes
  options:
    dispvm: "dom0"

stages:
  - fetch
  - prep
  - build
  - post
  - verify
  - sign:
      executor:
        type: local
  - publish:
      executor:
        type: local

user@heads-tests-deb12-nix:~/qubes-bees$ cat .qubesbuilder
host:
  rpm:
    build:
    - rpm_spec/bees.spec
source:
  create-archive: true
  files:
  - url: https://github.com/Zygo/bees/archive/refs/tags/v0.10.tar.gz
    sha256: v0.10.tar.gz.sha256

user@heads-tests-deb12-nix:~/qubes-bees$ cat rpm_spec/bees.spec.in 
#
# spec file for package bees
#
# Copyright (c) 2023 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#


Name:           bees
Version:	@VERSION@
Release:        1%{?dist}

Summary:        Best-Effort Extent-Same, a btrfs deduplication agent
License:        GPL-3.0-only
Group:          System/Filesystems
URL:            https://github.com/Zygo/bees
Source:         https://github.com/Zygo/bees/archive/refs/tags/v%{version}.tar.gz
BuildRequires:  gcc-c++
BuildRequires:  util-linux-core
BuildRequires:  systemd-devel
BuildRequires:  make
Requires:       btrfs-progs
Requires:       systemd
Requires:       util-linux

# This removes errors from the build process, but it's not a good practice (fc37 related)
%define CXXFLAGS %{?CXXFLAGS} -Wno-error=restrict

%description
bees is a block-oriented userspace deduplication agent designed for large btrfs
filesystems. It is an offline dedupe combined with an incremental data scan
capability to minimize time data spends on disk from write to dedupe.

%prep
%autosetup

%build
cat >localconf <<-EOF
	SYSTEMD_SYSTEM_UNIT_DIR=%{_unitdir}
	LIBEXEC_PREFIX=%{_bindir}
	LIB_PREFIX=%{_libdir}
	PREFIX=%{_prefix}
	LIBDIR=%{_libdir}
	DEFAULT_MAKE_TARGET=all
EOF

%make_build BEES_VERSION=%{version}

%install
%make_install

%files
%license COPYING
%doc README.md
%{_bindir}/bees
%{_sbindir}/beesd
%{_unitdir}/beesd@.service
%dir %{_sysconfdir}/bees
%{_sysconfdir}/bees/beesd.conf.sample

%changelog
* Mon Feb 12 2024 Thierry Laurion <insurgo@riseup.net> - 0.10-1
- Initial package for fedora 37

I cannot find a practical way to get -Wno-error=restrict passed down to the compiler, which results in the following:

[user@qubes-builder ~/src/qubes-builderv2(main)]
(130)$ ./qb --verbose -c bees package all
[…]
13:43:22,970 [executor:qubes:disp4657] output: DEBUG: g++ -Wall -Wextra -Werror -O3 -I…/include -D_FILE_OFFSET_BITS=64 -std=c++11 -Wold-style-cast -Wno-missing-field-initializers -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -o ntoa.o -c ntoa.cc
13:43:22,970 [executor:qubes:disp4657] output: DEBUG: make[1]: Leaving directory ‘/builddir/build/BUILD/bees-0.10/lib’
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: In file included from /usr/include/c++/12/string:40,
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: from …/include/crucible/ntoa.h:4,
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: from ntoa.cc:1:
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: In function ‘std::char_traits::copy(char*, char const*, unsigned long)’,
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: inlined from ‘std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_S_copy(char*, char const*, unsigned long)’ at /usr/include/c++/12/bits/basic_string.h:431:21,
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: inlined from ‘std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_S_copy(char*, char const*, unsigned long)’ at /usr/include/c++/12/bits/basic_string.h:426:7,
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: inlined from ‘std::__cxx11::basic_string<char, std::char_traits, std::allocator >::_M_replace(unsigned long, unsigned long, char const*, unsigned long)’ at /usr/include/c++/12/bits/basic_string.tcc:532:22,
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: inlined from ‘std::__cxx11::basic_string<char, std::char_traits, std::allocator >::assign(char const*)’ at /usr/include/c++/12/bits/basic_string.h:1655:19,
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: inlined from ‘std::__cxx11::basic_string<char, std::char_traits, std::allocator >::operator=(char const*)’ at /usr/include/c++/12/bits/basic_string.h:823:28,
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: inlined from ‘crucible::bits_ntoa[abi:cxx11](unsigned long long, crucible::bits_ntoa_table const*)’ at ntoa.cc:31:10:
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: /usr/include/c++/12/bits/char_traits.h:435:56: error: ‘memcpy’ accessing 9223372036854775810 or more bytes at offsets [2, 9223372036854775807] and 1 may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict]
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: 435 | return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: | ^
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: cc1plus: all warnings being treated as errors
13:43:22,971 [executor:qubes:disp4657] output: DEBUG: make[1]: *** [Makefile:39: ntoa.o] Error 1
13:43:22,972 [executor:qubes:disp4657] output: DEBUG: make[1]: *** Waiting for unfinished jobs…
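
When a log is this long, the warning classes that -Werror promoted to errors can be pulled out mechanically. The sketch below does only that: it confirms which -Wno-error= flag is relevant, and does not address how to pass the flag through the spec file. promoted_warnings is a hypothetical helper and the sample line is abbreviated from the log above.

```python
import re


def promoted_warnings(log_text: str) -> list[str]:
    """Return a sorted -Wno-error=<name> flag for each [-Werror=<name>] tag."""
    names = set(re.findall(r"\[-Werror=([\w-]+)\]", log_text))
    return [f"-Wno-error={name}" for name in sorted(names)]


# Abbreviated copy of the failing diagnostic.
line = "char_traits.h:435:56: error: 'memcpy' accessing ... [-Werror=restrict]"
print(promoted_warnings(line))  # prints ['-Wno-error=restrict']
```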

Full log:
qubes-bees_builerror.log (348.1 KB)
@fepitre @marmarek any further advice?

@moderators : where should this post be moved to get more eyes? General?

Insurgo · February 16, 2024, 9:12pm

Still no luck. Any help welcome

Insurgo · April 1, 2024, 9:57pm

bees would reduce the cost of cloning AND of backup restoration. Otherwise, with unman’s shaker or ben-grande’s qusal, the space cost of clones explodes and restoration becomes barely possible unless users have 2TB disks (which should not be a requirement).

Examples speak louder than words (qusal mostly all deployed here):

[user@dom0 ~]
$ qvm-ls --fields NAME,STATE,CLASS,TEMPLATE,LABEL,DISK,PRIV-CURR,PRIV-MAX,PRIV-USED,ROOT-CURR,ROOT-MAX,ROOT-USED | grep -ie NAME -ie tpl -ie minimal
NAME                         STATE    CLASS       TEMPLATE                    LABEL   DISK   PRIV-CURR  PRIV-MAX  PRIV-USED  ROOT-CURR  ROOT-MAX  ROOT-USED
debian-12-minimal            Halted   TemplateVM  -                           black   1549   109        2048      5%         1439       20480     7%
dev                          Halted   AppVM       tpl-dev                     purple  107    107        2048      5%         0          20480     0%
disp-mgmt-debian-12-minimal  Halted   DispVM      dvm-mgmt                    black   0      0          2048      0%         0          20480     0%
dvm-browser                  Halted   AppVM       tpl-browser                 red     112    112        2048      5%         0          20480     0%
dvm-debian-minimal           Halted   AppVM       debian-12-minimal           red     105    105        2048      5%         0          20480     0%
dvm-dev                      Halted   AppVM       tpl-dev                     red     107    107        2048      5%         0          20480     0%
dvm-fedora-minimal           Halted   AppVM       fedora-39-minimal           red     82     82         2048      4%         0          20480     0%
dvm-fetcher                  Halted   AppVM       tpl-fetcher                 red     256    256        15360     1%         0          20480     0%
dvm-media                    Halted   AppVM       tpl-media                   yellow  81     81         2048      3%         0          20480     0%
dvm-mgmt                     Halted   AppVM       tpl-mgmt                    black   75     75         2048      3%         0          20480     0%
dvm-qubes-builder            Halted   AppVM       tpl-qubes-builder           red     648    648        30720     2%         0          20480     0%
dvm-reader                   Halted   AppVM       tpl-reader                  red     105    105        2048      5%         0          20480     0%
dvm-sys-audio                Halted   AppVM       tpl-sys-audio               red     106    106        2048      5%         0          20480     0%
fedora-39-minimal            Halted   TemplateVM  -                           black   3645   113        2048      5%         3532       20480     17%
media                        Halted   AppVM       debian-12-minimal           yellow  491    491        51200     0%         0          20480     0%
qubes-builder                Halted   AppVM       tpl-qubes-builder           gray    675    675        30720     2%         0          20480     0%
qubes-builder1               Halted   AppVM       tpl-qubes-builder           gray    215    215        30720     0%         0          20480     0%
sys-cacher                   Halted   AppVM       tpl-sys-cacher              gray    587    587        20480     2%         0          20480     0%
sys-cacher-browser           Halted   AppVM       tpl-browser                 gray    117    117        2048      5%         0          20480     0%
sys-git                      Halted   AppVM       tpl-sys-git                 gray    477    477        20480     2%         0          20480     0%
sys-pgp                      Halted   AppVM       tpl-sys-pgp                 gray    106    106        2048      5%         0          20480     0%
sys-syncthing                Halted   AppVM       tpl-sys-syncthing           yellow  21749  21749      51200     42%        0          20480     0%
sys-syncthing-browser        Halted   AppVM       tpl-browser                 yellow  132    132        2048      6%         0          20480     0%
tpl-browser                  Halted   TemplateVM  -                           black   2216   111        2048      5%         2105       20480     10%
tpl-dev                      Halted   TemplateVM  -                           black   2551   109        2048      5%         2441       20480     11%
tpl-fetcher                  Halted   TemplateVM  -                           black   2037   109        2048      5%         1927       20480     9%
tpl-media                    Halted   TemplateVM  -                           black   2697   111        2048      5%         2586       20480     12%
tpl-mgmt                     Halted   TemplateVM  -                           black   3695   111        2048      5%         3584       20480     17%
tpl-qubes-builder            Halted   TemplateVM  -                           black   4767   112        2048      5%         4655       20480     22%
tpl-reader                   Halted   TemplateVM  -                           black   2583   109        2048      5%         2473       20480     12%
tpl-sys-audio                Halted   TemplateVM  -                           black   2023   111        2048      5%         1912       20480     9%
tpl-sys-cacher               Halted   TemplateVM  -                           black   1466   108        2048      5%         1357       20480     6%
tpl-sys-git                  Halted   TemplateVM  -                           black   1961   109        2048      5%         1851       20480     9%
tpl-sys-pgp                  Halted   TemplateVM  -                           black   1890   111        2048      5%         1779       20480     8%
tpl-sys-syncthing            Halted   TemplateVM  -                           black   1668   109        2048      5%         1558       20480     7%

bees would dedup all of those so that clones cost barely any additional space, whereas OpenZFS would not require a daemon like bees. I’m holding on to this idea because BTRFS is far more accessible (it is offered at install), while OpenZFS might be nice for geeks but goes against my goal of accessible security (the ideal would be for this to be applied at first boot, with qusal/shaker eventually merged into the QubesOS repositories, if not installed by default).
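
To put a number on the table above, a small sketch that sums the ROOT-CURR column over TemplateVM rows, in whatever unit qvm-ls reports. Only four rows are copied in here for brevity, template_root_usage is a hypothetical helper, and the result says how much template data there is to scan, not how much of it is duplicated.

```python
def template_root_usage(rows: list[str]) -> int:
    """Sum ROOT-CURR over the rows whose CLASS is TemplateVM."""
    total = 0
    for row in rows:
        fields = row.split()
        # Column order follows the --fields list: CLASS is index 2,
        # ROOT-CURR is index 9.
        if fields[2] == "TemplateVM":
            total += int(fields[9])
    return total


rows = [
    "debian-12-minimal Halted TemplateVM - black 1549 109 2048 5% 1439 20480 7%",
    "dev Halted AppVM tpl-dev purple 107 107 2048 5% 0 20480 0%",
    "tpl-browser Halted TemplateVM - black 2216 111 2048 5% 2105 20480 10%",
    "tpl-dev Halted TemplateVM - black 2551 109 2048 5% 2441 20480 11%",
]
print(template_root_usage(rows))  # prints 5985
```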

@moderators : Where should this post be moved so that users knowledgeable about qubes-builder v2 (rpm spec files) can help release this PoC and move it forward?

Thanks for your time!

Insurgo · April 14, 2024, 11:37pm

Big step forward today.

I decided to let go of the ideal of building with qubes-builder v2 directly, and instead attempted to build bees on fedora-37 to match dom0. That was a success, and an rpm now exists, available in the issue referred to below.

So I have installed Q4.2.1 using the custom btrfs installation option.
39GB of templates+VMs as a start, on which dedup is happening (slowly, on an x230).

My intuition is that if the bees daemon was started before files were installed, bees would turn from an offline deduplicator into a live deduplicator and could allow templates and OS installer content to be deduped from the beginning. Spoiler: this is my personal goal in using BTRFS: being able to clone minimal templates (qusal/shaker) indefinitely at really low cost, and being able to use wyng-backup for incremental backups (minimal backup size) that, even when restored, would incur only the bare minimum storage cost.

We will first see what running bees after installation gains, and if the result is interesting, I will try to push things a bit faster on my own time to have bees built on top of qubes-builder v2.

I also added PoC config helper generators under GitHub - tlaurion/qubes-bees: Best effort BRTFS offline deduplication, based on OpenSuse Tumbleweed rpm spec inclusion so that things get easier for anyone trying to experiment with this.

RPM available for download under upstream discussion:

github.com/QubesOS/qubes-issues

Switch default pool from LVM to BTRFS-Reflink

opened 09:46AM - 22 Mar 21 UTC

DemiMarie

T: enhancement P: default C: storage

**The problem you're addressing (if any)**
In R4.0, the default install uses …LVM thin pools. However, LVM appears to be optimized for servers, which results in several shortcomings:
- Space exhaustion is handled poorly, requiring manual recovery. This recovery may sometimes fail.
- It is not possible to shrink a thin pool.
- Thin pools slow down system startup and shutdown.

Additionally, LVM thin pools do not support checksums. This can be achieved via dm-integrity, but that does not support TRIM.

**Describe the solution you'd like**
I propose that R4.3 use BTRFS+reflinks by default. This is a proposal ― it is by no means finalized.

**Where is the value to a user, and who might that user be?**
BTRFS has checksums by default, and has full support for TRIM. It is also possible to shrink a BTRFS pool without a full backup+restore. BTRFS does not slow down system startup and shutdown, and does not corrupt data if metadata space is exhausted. When combined with LUKS, BTRFS checksumming provides authentication: it is not possible to tamper with the on-disk data (except by rolling back to a previous version) without invalidating the checksum. Therefore, this is a first step towards untrusted storage domains. Furthermore, BTRFS is the default in Fedora 33 and openSUSE. Finally, with BTRFS, VM images are just ordinary disk files, and the storage pool the same as the dom0 filesystem. This means that issues like #6297 are impossible.

**Describe alternatives you've considered**
None that are currently practical. bcachefs and ZFS are long-term potential alternatives, but the latter would need to be distributed as source and the former is not production-ready yet.

**Additional context**
I have had to recover manually from LVM thin pool problems (failure to activate, IIRC) on more than one occasion. Additionally, the only supported interface to LVM is the CLI, which is rather clumsy. The LVM pool requires nearly twice the amount of code as the BTRFS pool, for example.

**Relevant [documentation](https://www.qubes-os.org/doc/) you've consulted**
`man lvm`

**Related, [non-duplicate](https://www.qubes-os.org/doc/reporting-bugs/#new-issues-should-not-be-duplicates-of-existing-issues) issues**
- #5053
- #6297
- #6184
- #3244 (really a kernel bug)
- #5826
- #3230 ― since reflink files are ordinary disk files we could just rename them without needing a copy
- #3964
- everything in https://github.com/QubesOS/qubes-issues/search?q=lvm+thin+pool&state=open&type=issues

Most recent benchmarks: https://github.com/QubesOS/qubes-issues/issues/6476#issuecomment-1689640103

Again, please don’t run this on production machines. If you have a spare machine to test with and help this go faster, please do. But this is not production ready yet; it is just experimentation to complement Switch default pool from LVM to BTRFS-Reflink · Issue #6476 · QubesOS/qubes-issues · GitHub, which is stalling.

In my opinion, this is where the discussion stands as of today: we need to retest BTRFS against the TLVM setup and get real numbers from recent kernel versions to have a real picture of the current situation: Switch default pool from LVM to BTRFS-Reflink · Issue #6476 · QubesOS/qubes-issues · GitHub

TL;DR: past openQA tests more or less said that TLVM was faster than BTRFS. My daily experience differs, and the consensual “feeling” among Qubes forum users seems to match mine rather than the openQA results.

Insurgo · April 15, 2024, 6:56pm

Houla.

bees doesn’t support config-file-based option parsing, and bees accepts arguments that beesd doesn’t… On first impression, generating correct configuration files and then generating proper, dynamic systemd bees invocations to limit load average and other things… is a project of its own.

I will continue to test this, but given the state of bees, making it ready for use in downstream projects seems to require a lot of downstream gluing/plumbing that I did not expect to still be unaddressed upstream.

Commented at RFE: Configuration files · Issue #54 · Zygo/bees · GitHub

It seems I am going to lower the priority of this, and my interest in OpenZFS just increased once more.

Discoveries:

QubesOS on btrfs compresses. beesstats.txt is updated once an hour, as documented in an issue but not in the docs: besstats.txt not updating · Issue #178 · Zygo/bees · GitHub
- Those stats inform us of the bees hash table occupancy, the compressed/uncompressed fs ratio and the page size distribution, so that one can tweak the bees hash table… but only after the fact, unless we increase dom0 reserved memory, and 4GB is already a lot for systems having only 16GB. This is a problem. We need dynamic config.
- bees/docs/config.md at 28ee2ae1a88c811e2e5faae6b40ef63a48324a5d · Zygo/bees · GitHub gives insights on how to calculate the hash table size for general-purpose OSes, which don’t apply directly to QubesOS with its snapshots, rotations, compression and raw images. Tweaking is necessary when the configuration file is generated, based on the btrfs partition size and some heuristics to determine expected data uniqueness. The QubesOS use case means a lot of data redundancy through snapshots and cloning (which is why we are interested in bees), but getting the setting right the first time is not so straightforward. See my script to see where I am in that process.
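
As a starting point for such heuristics, here is a sketch of the baseline rule quoted in the package description earlier in this thread (about 1 GB of hash table per 10 TB of unique data). The function name, the decimal 1:10000 ratio and the rounding up to 128 KiB multiples are assumptions of this sketch, and the Qubes-specific corrections discussed above (snapshots, compression, expected uniqueness) are deliberately left out.

```python
def baseline_hash_table_bytes(unique_data_bytes: int,
                              data_bytes_per_table_byte: int = 10_000,
                              granule: int = 128 * 1024) -> int:
    """Hash table size for a given amount of unique data, rounded up."""
    # Ceiling division keeps everything in integers: -(-a // b) == ceil(a / b).
    raw = -(-unique_data_bytes // data_bytes_per_table_byte)
    # Assumption: the size is rounded up to a whole number of granules,
    # with a minimum of one granule.
    granules = max(1, -(-raw // granule))
    return granules * granule


# 39 GB of data, treated as entirely unique (the worst case for table size).
print(baseline_hash_table_bytes(39 * 10**9))
```

For a Qubes pool the unique fraction should be well below the raw partition size, which is exactly the part that needs a heuristic.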

Some stats

I installed Q4.2.1 over btrfs, not choosing Fedora templates this time because of other testing I intend to do for this PoC. That is, deploying qusal, which clones and specializes minimal templates and deploys sys-cacher to download updates once and install them from a local cache into multiple specialized clones. That's it: I got bored of Fedora once and for all not being compatible with apt-cacher-ng, and decided to never look back unless I really have to. Those issues are simply never fixed upstream, and the workarounds under cacher/qusal always need updating because checksums fail and templates fail even the update checks, so no available package updates make their way into the dom0 widget.
- So there are not many gains to expect from such a not-yet-cloned deployment as a start: the 39GB of deployed templates got reduced to 37GB, but it took multiple hours for bees to parse all that data BEFORE being able to dedup.
  - bees talks about optimizations that are possible if btrfs subvolumes are in use. This is not the case right now under QubesOS. We only have one subvolume, which is the pool, therefore --scan-mode 0, which would help prioritize clone/snapshot rotation for dedup, cannot be used. See bees/docs/config.md at 28ee2ae1a88c811e2e5faae6b40ef63a48324a5d · Zygo/bees · GitHub
  - It took about 8 hours to parse 39GB of rootfs on an x230 with a fast SSD drive (everything is under a single dom0 btrfs filesystem as of now) to gain around 2GB from deduplication. Of course, more gains are to be expected after cloning happens.
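
For a rough sense of scale, the figures above (39GB scanned in about 8 hours, about 2GB gained) can be turned into a scan rate and a saving ratio. A back-of-the-envelope sketch, assuming GB means 10^9 bytes and taking the 8 hours literally, which it is not:

```python
# Back-of-the-envelope numbers from the figures quoted above.
# Assumptions: GB means 10**9 bytes and the "8 hours" is taken literally.

GB = 10 ** 9


def scan_rate_mb_per_s(scanned_bytes: int, hours: float) -> float:
    """Average scan throughput in MB/s (10**6 bytes per second)."""
    return scanned_bytes / (hours * 3600) / 10 ** 6


def saving_percent(before_bytes: int, after_bytes: int) -> float:
    """Space saved as a percentage of the initial usage."""
    return (before_bytes - after_bytes) / before_bytes * 100


if __name__ == "__main__":
    print(f"scan rate: {scan_rate_mb_per_s(39 * GB, 8):.2f} MB/s")
    print(f"saving:    {saving_percent(39 * GB, 37 * GB):.1f} %")
```

That is roughly 1.35 MB/s scanned and about 5% of space saved on this not-yet-cloned install.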

Lessons learned

If bees was deployed at OS install, prior to template deployment, the gains could theoretically be nearly instantaneous, but that needs to be proven.
- Depending on what the end user decides to install, the gains would still be minimal at install time, with deduplication possibly unfinished before the end of the install (not a problem), but I'm not sure how we could justify such high CPU usage and extended installation time for such low instant gains. Some of debian/whonix overlaps, and whonix workstation overlapping whonix gateway explains the gain observed here: 2GB. My stats gathering needs some more fu.

Impressions

All in all, I'm a bit disappointed by the current UX of bees. Last time I checked (theoretically), I really thought that configuration files were supported (not just passing some options through a config file while still needing to craft runtime tweak arguments).
I already invested a lot of hours trying to generate configuration files that would provide proper baseline configurations, having understood, wrongly, that beesd needed the UUID of the filesystem to find the corresponding config file and, in there, the configuration options. I then realized that, from what it seems, only the hash table size and directories can be configured there, while the rest still needs to be passed to beesd on the command line.
Continuing experimentation there also made me realize that even some options that I expected beesd to pass on to bees are not parsed by beesd… Basically, that would be a big collaboration upstream to arrive at not so much improvement, unless QubesOS also integrates it in the installation medium and changes the subvolume configuration as well.
- Why should bees parse dvm ephemeral disks when they could be in a different subvolume not cared for by bees? In other words: why deduplicate something that is going to be discarded anyway?
- Why should revert snapshots be cared for by bees dedup? Same reasoning as above.
- Why would bees want to watch the whole dom0 filesystem for dedup, outside of the directory root related to appvm/template disk states?
- All of which requires way more changes…

Then there is switching to OpenZFS, which would not apply dedup AFTER the fact (offline) like bees does, but would prevent duplicate blocks from being written to the fs in the first place.

A comparison between TLVM/BTRFS/ZFS should be done on that.

But as of now, my interest in bees just lowered… a lot.
But again, ZFS doing that live dedup will cost more RAM than BTRFS, way more, and growing with the disk size. Once again, users are not expected to think about their dom0 RAM reservation when they choose a big SSD drive nowadays, but going BTRFS/OpenZFS would change that.

TLDR: as of now, bees configured with around 150MB of hash table was able to properly dedup just the base OS installation, reducing 39GB->37GB, and those stats are not good enough to compare.

Next:
I would need to extract proper size reductions to give a good comparison, for example the reduction within the whonix templates alone, to be convincing here, before going further and deploying qusal while beesd is stopped, and then comparing again after offline deduplication has occurred.

Facts repetition:
Here again, let's recall the facts. Bees is an offline dedup tool, meaning that the duplicated data needs to be written to disk first in order to be deduped. This means a lot of unnecessary IO and writes happening on the drive for nothing, followed by more writes to mark the space as free, so it can be rewritten again and again, meaning more IO… which OpenZFS would prevent altogether at the cost of more RAM used in dom0.

Losses:
Older hardware, from this experience, would gain nothing much but space, at the cost of a lot of slow operations to get that gain.

All in all, I think older hardware gets all those bonuses today by simply putting a big SSD drive in it and reinstalling, without caring much about size consumption.

The performance gains between btrfs and tlvm+ext4 are a different subject, and might only be perceivable on older hardware.

The space gains from dedup on newer hardware vs the CPU cost might as well be imperceptible.
Redoing this test on newer hardware, where the ssd<->pci<->ram overhead might not be seen, might as well explain why TLVM vs BTRFS perf differences were not seen there, whereas in this forum big gains were observed on t430, which is old hardware where the maximal SSD speed is never really reached because PCI speed and RAM speed are lower than the drive's.

The question is always: what are we testing?

So that is that folks. Will context switch and come back to this to deploy qusal and rerun pre-post tests.

Meanwhile, if somebody could share here what would be the proper baseline commands to compare things properly in pre-test/post-test, so I can report current stats on consumed disk space on a fresh install vs this test laptop, that would help in having a proper trace that may be useful later.
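
While waiting for better suggestions, one possible first-pass baseline is to record the used bytes of the pool before the test and again once bees has finished, then diff the two. The sketch below relies only on Python's `shutil.disk_usage` (statvfs), so it ignores btrfs-specific details such as data vs metadata allocation, which tools like `btrfs filesystem usage` or `compsize` would show; the `/` mount point and the function names are placeholders.

```python
# Sketch: take a disk usage snapshot and compare two snapshots.
# Assumption: statvfs "used" bytes is a good enough metric for a first pass.
import json
import shutil
import time


def take_snapshot(mount_point: str) -> dict:
    """Return total/used/free bytes for the filesystem holding mount_point."""
    usage = shutil.disk_usage(mount_point)
    return {
        "mount_point": mount_point,
        "timestamp": time.time(),
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
    }


def compare(before: dict, after: dict) -> dict:
    """Return bytes and percentage of used space reclaimed between snapshots."""
    saved = before["used"] - after["used"]
    percent = saved / before["used"] * 100 if before["used"] else 0.0
    return {"saved_bytes": saved, "saved_percent": percent}


if __name__ == "__main__":
    # Print the snapshot as JSON; redirect it to a file to keep a trace.
    print(json.dumps(take_snapshot("/"), indent=2))
```

Run it once before and once after the dedup pass, keep both JSON outputs, and feed them to `compare` to get the reclaimed bytes and percentage.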

Insurgo · April 15, 2024, 8:31pm

Log:
bees.log (19.3 KB)

Extracts:

2024-04-15 16:24:08 13033.13061<6> hash_prefetch: Hash table page occupancy histogram (4699811/7667712 cells occupied, 61%)
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                                                                  8192 pages
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                                                                # 4096
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                             #####                              # 2048
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                            ########                            # 1024
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                           ##########                           # 512
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                          ############                          # 256
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                         ##############                         # 128
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                        ################                        # 64
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                       ##################                       # 32
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                       ##################                       # 16
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                      ####################                      # 8
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                      #####################                     # 4
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                      #####################                    ## 2
2024-04-15 16:24:08 13033.13061<6> hash_prefetch:                    #######################                   ### 1
2024-04-15 16:24:08 13033.13061<6> hash_prefetch: 0%      |      25%      |      50%      |      75%      |   100% page fill
2024-04-15 16:24:08 13033.13061<6> hash_prefetch: compressed 2839450 (60%)
2024-04-15 16:24:08 13033.13061<6> hash_prefetch: uncompressed 1860361 (39%) unaligned_eof 51693 (1%) toxic 1319 (0%)
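
The extract above can be summarized without reading the histogram by hand. A small sketch that pulls the occupancy and the per-category counters out of these `hash_prefetch` lines; the regular expressions are written against this extract only and are an assumption about the log format, not something bees guarantees.

```python
# Sketch: summarize bees hash_prefetch statistics from log lines.
# Assumption: line formats match the extract above; other bees versions may differ.
import re

OCCUPANCY_RE = re.compile(r"\((\d+)/(\d+) cells occupied")
COUNTER_RE = re.compile(r"\b(compressed|uncompressed|unaligned_eof|toxic) (\d+)")


def summarize(lines):
    """Return occupied/total cells plus the per-category entry counters."""
    stats = {}
    for line in lines:
        if "hash_prefetch:" not in line:
            continue
        match = OCCUPANCY_RE.search(line)
        if match:
            stats["occupied"] = int(match.group(1))
            stats["cells"] = int(match.group(2))
        for name, value in COUNTER_RE.findall(line):
            stats[name] = int(value)
    return stats


SAMPLE = [
    "2024-04-15 16:24:08 13033.13061<6> hash_prefetch: Hash table page occupancy histogram (4699811/7667712 cells occupied, 61%)",
    "2024-04-15 16:24:08 13033.13061<6> hash_prefetch: compressed 2839450 (60%)",
    "2024-04-15 16:24:08 13033.13061<6> hash_prefetch: uncompressed 1860361 (39%) unaligned_eof 51693 (1%) toxic 1319 (0%)",
]

if __name__ == "__main__":
    s = summarize(SAMPLE)
    print(f"occupancy: {s['occupied'] / s['cells']:.1%}")
    print(f"compressed share of occupied cells: {s['compressed'] / s['occupied']:.1%}")
```

In this extract the compressed and uncompressed counters add up exactly to the number of occupied cells, which is a handy consistency check when comparing runs.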

Config output of config generator as per latest commit on qubes-bees:
configs.tar.gz (1.8 KB)

Extracts:

# Size of the hash table for deduplication (in bytes)
# The hash table size is calculated based on the disk size in GB and the recommended size for 1TB of unique data.
DB_SIZE=122683392

So with a default installation, we need to consume 122MB of dom0 RAM solely for dedup operations on unique data from the templates+dom0 files, with quite a lot of CPU overhead for not so much observable gain. I guess this could be a tradeoff for some niche use cases only at this point, where things would need fixing upstream prior to going at it.
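
As a sanity check, the `DB_SIZE` above lines up with the cell count from the log extract if each hash table cell takes 16 bytes (an assumption, but one the numbers support). A short sketch of that arithmetic, with the 4GB dom0 reservation mentioned earlier used only as a point of comparison:

```python
# Sketch: relate DB_SIZE to the cell count reported in the log.
# Assumptions: 16 bytes per hash table cell and a 128 KiB size alignment.

DB_SIZE = 122683392          # from the generated config
CELLS = 7667712              # from the hash_prefetch log line
CELL_BYTES = 16              # assumed cell size
ALIGN = 128 * 1024           # assumed alignment of the table size
DOM0_RAM = 4 * 1024 ** 3     # 4 GiB dom0 reservation, for comparison only

if __name__ == "__main__":
    print("cells * 16 == DB_SIZE:", CELLS * CELL_BYTES == DB_SIZE)
    print("multiple of 128 KiB:  ", DB_SIZE % ALIGN == 0)
    print(f"size: {DB_SIZE / 10**6:.1f} MB = {DB_SIZE / 2**20:.1f} MiB")
    print(f"share of dom0 RAM: {DB_SIZE / DOM0_RAM:.2%}")
```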

If at least this was live dedup as OpenZFS does, I would throw that dom0 ram to it without any concern. But after the fact, offline dedup makes me want to buy larger hard drives and not care about it.

Also, when bees dies, we need to restart it manually… This is the unfixed issue cancelling with CTRL+C need to be followed by umount /run/bees/mnt/$UUID · Issue #281 · Zygo/bees · GitHub, which I opened based on a hidden comment from another unrelated issue.

And then bees breaks existing reflinks? · Issue #270 · Zygo/bees · GitHub ?

Bees will break existing reflinks if it sees other duplicate chains of blocks matching the ones just found. But it will eventually clean up the unreachable extents after some time, just leave it running for long enough. This behavior is probably already described in the documents somewhere and it expected due to bees working very different from other deduplicators.
Also, in your situation, bees will probably add more metadata and thus also increase allocation somewhat.

Insurgo · April 16, 2024, 12:31am

Ok. So here is the plan.

I have two testing x230 available with the same mx500 250 SSD drives and the same configuration (not relevant, but i7, 16GB RAM).

Install q4.2.1 clean on both laptops, without the fedora template (as stated, it doesn't make much sense with the goal of deploying qusal, since sys-cacher cannot deal properly with fedora because of zck files failing checksums, which hasn't been figured out upstream in apt-cacher-ng for way too long. I make a statement: fedora sucks with its 100MB of cache that needs to be downloaded, uncached, per template prior to reporting updates to dom0, and debian-12 is far superior with extrepo and package availability for my use cases, which are developing and maintaining stuff without needing to change template every 3 months: this sucks. No more)
On one laptop, bees deduplication; on the other, none. Check stats and try to figure it out. Provide the best stats possible to show dedup gains. Wait for feedback. Go forward and diverge from fresh q4.2.1 reproducible results for anyone.
1. Install q4.2.1 with the default TLVM install. Extract boot time stats between the two machines, with the only difference being BTRFS vs TLVM on same-spec machines.
Then, no more clean installs. On the laptop without bees, deploy qusal. Clone that drive (external cloner) onto the other mx500 250 drive. Redo dedup with bees overnight. Post dedup gains. Wait for feedback. Run tests asked for by the community. Iterate.

Comments, recommendations?