Ext4 vs. Btrfs performance on Qubes OS installs

rustybird · September 20, 2022, 9:58am

This is due to a difference in what the storage drivers (lvm_thin vs. file-reflink) consider to be a volume’s disk usage, which then leads to weird-looking results when Qube Manager unconditionally sums up all volumes of a VM. But it’s “only” cosmetic.

If you mean the size prediction in the GUI backup tool’s VM selection screen, that’s a different cosmetic bug. It shouldn’t affect the actual backup size.

renehoj · September 20, 2022, 10:18am

Okay, I had all VMs added in the backup tool, and the total size got me a little nervous.

Thanks for the explanation.

tasket · October 15, 2022, 5:23am

Sorry to wade into this a bit late, but you’re quite right about the default LUKS sector size… seems sub-optimal.

However, Thin LVM chunk size will have a minimum size of 64KB, and is usually larger depending on the pool LV size at time of creation. My main system uses 64KB despite having a large pool size; I assume this enhances random write performance but haven’t tested it. #write_amplification

On the ‘cost’ of Thin LVM snapshots: Making snapshots is essentially no cost, but deleting (and, oddly enough, renaming) snapshots takes a significant amount of time. These operations are processed by the kernel in a single-threaded fashion, and I usually see 80-100% CPU for >5s when Qubes or Wyng deletes a large snapshot.

Btrfs - My understanding is that it is extents-based but has a settable minimum sector size via mkfs.btrfs with a default of 4096. I think a good basis for comparison would have LUKS set to 4096, Btrfs at default 4096, and Thin LVM pool at 64KB.
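
To make the block-size comparison a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the worst case for copy-on-write storage: a small write lands in an allocation unit that is still shared with a snapshot, so the whole unit gets copied. The function name and the 4 KiB write size are illustrative assumptions, and real behaviour also depends on alignment, caching and metadata overhead, none of which is modelled here.

```python
import math


def cow_write_amplification(write_bytes: int, unit_bytes: int) -> float:
    """Worst-case bytes written per byte requested.

    Assumes every allocation unit touched by the write is still shared
    and must be copied in full, and that the write starts on a unit
    boundary. This is a simplification, not a measurement.
    """
    if write_bytes <= 0 or unit_bytes <= 0:
        raise ValueError("sizes must be positive")
    units_touched = math.ceil(write_bytes / unit_bytes)
    return units_touched * unit_bytes / write_bytes


KIB = 1024

# Allocation units taken from the comparison proposed above.
layouts = {
    "Btrfs, 4096-byte sector size": 4 * KIB,
    "Thin LVM, 64 KiB chunk": 64 * KIB,
}

for name, unit in layouts.items():
    factor = cow_write_amplification(4 * KIB, unit)
    print(f"{name}: {factor:.0f}x for a 4 KiB write")
```

Under these assumptions a 4 KiB random write costs 16x on a 64 KiB chunk and 1x on a 4096-byte unit, which is why a smaller chunk size would plausibly help random writes. Only actual benchmarks can confirm that.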

Demi · October 15, 2022, 4:33pm

I agree. Also, be sure you are not using the deprecated file driver. That will have terrible performance no matter what, and is going away as it does not have feature parity with the others.

One possible reason that deleting snapshots is so expensive is that Qubes always does a blkdiscard before a lvremove. Thin pools do not handle discards well at all.

Insurgo · October 15, 2022, 6:18pm

@demi what is the state of the loop device PR merge so that benchmarking would make sense under Qubes at some point?

Demi · October 15, 2022, 6:19pm

Merged already, will be in the next vmm-xen release.

Insurgo · October 15, 2022, 6:41pm

@Demi It would be helpful to link the PR and the qubes-testing URL if the goal is to have those fixes known and tested under the testing section of the website…

Otherwise who tests what, really?

Insurgo · October 15, 2022, 7:04pm

@Demi don’t get me wrong on the tone here, but there were a lot of regressions in 4.1 compared with the stability of 4.0.

My point here is that:

Is not enough. I’m following QubesOS/updates-status on GitHub (“Track packages in testing repository”) as closely as I can, and I see no vmm-xen to be tested, nor fixes for suspend/resume, with PRs taking way too long to land even in the unstable repo. I would expect things to be way more verbose under the testing section of this forum. My guess is that there is a lot of confusion, even among willing testers, about what needs testing and whether those things even reach them.

How we can improve that should be discussed under the testing section, not here, but this thread will be a good one to quote to justify testing discussions, which is why I’m writing it here. No blame or whatever here, but I see a lot of room for improvement through better communication and appropriate pointers.

Insurgo · October 16, 2022, 12:47am

@demi: I see that the vmm-xen PR has been approved.
I created a new post under “What to test? Where to get what to test? Where to report testing results?” (#3 by Insurgo) so that this important package update is properly tested by the testing community.

Please let’s continue the “testing” discussion over there.

Demi · October 20, 2022, 6:25pm

Can you run tests in these configurations?

1. LVM thin provisioning + XFS, with the lvm_thin storage driver.
2. LVM thick provisioning + XFS, with the reflink storage driver and --direct-io=on passed to losetup in /etc/xen/scripts/block.
3. LVM thick provisioning + XFS, with the reflink storage driver and --direct-io=off passed to losetup in /etc/xen/scripts/block.
4. BTRFS + Blake2b, with the reflink storage driver and --direct-io=on passed to losetup in /etc/xen/scripts/block.
5. BTRFS + Blake2b, with the reflink storage driver and --direct-io=off passed to losetup in /etc/xen/scripts/block.

If you are using a 4Kn disk, skip 2 and 4 as they won’t work (you won’t be able to boot any VMs).
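
For anyone scripting these runs, the test matrix can be written down as data. The following Python sketch is only an illustration: the `Config` class and the helper names are made up for this example, and the 4Kn filter simply encodes the advice in this post. It does not touch any disks or edit /etc/xen/scripts/block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    number: int                # position in the list above
    storage: str               # on-disk layout
    driver: str                # Qubes storage driver
    direct_io: Optional[str]   # "on"/"off", or None when losetup is not involved


CONFIGS = [
    Config(1, "LVM thin provisioning + XFS", "lvm_thin", None),
    Config(2, "LVM thick provisioning + XFS", "reflink", "on"),
    Config(3, "LVM thick provisioning + XFS", "reflink", "off"),
    Config(4, "BTRFS + Blake2b", "reflink", "on"),
    Config(5, "BTRFS + Blake2b", "reflink", "off"),
]


def runnable_configs(disk_is_4kn: bool) -> List[Config]:
    """Drop configurations 2 and 4 on a 4Kn disk, as advised above."""
    if not disk_is_4kn:
        return list(CONFIGS)
    return [c for c in CONFIGS if c.number not in (2, 4)]


def losetup_flag(config: Config) -> Optional[str]:
    """Return the losetup option for this configuration, if any."""
    if config.direct_io is None:
        return None
    return f"--direct-io={config.direct_io}"


for cfg in runnable_configs(disk_is_4kn=True):
    flag = losetup_flag(cfg) or "(no losetup option)"
    print(f"{cfg.number}. {cfg.storage} [{cfg.driver}] {flag}")
```

Note that the two configurations skipped on a 4Kn disk are exactly the ones with --direct-io=on.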

rustybird · October 21, 2022, 2:23pm

~~* 4K dm-crypt~~ Sorry - @Demi was right, a 4Kn disk is generally problematic even without 4K dm-crypt.

Configuration 4 definitely works, at least in the sense that it doesn’t produce an error, and losetup -l shows the intended result of direct I/O + 512B sectors even though the underlying dm-crypt block device is 4K. I’ve been using this configuration (except with the default checksum function) for months.

It’s XFS (configuration 2) and ext4 that are not so flexible.

Sven · October 23, 2022, 4:06am

I think it’s explained by…

So it’s in line with @Insurgo’s observations.

tasket · February 27, 2023, 3:50pm

I decided to do a Btrfs install with the recommendations from this and the “SSD maximal performance” threads, and settled on the idea of formatting a two-device Btrfs fs with the options -O no-holes --csum xxhash for better efficiency. All on top of a 4K aligned LUKS partition, using GPT/gdisk.

The Btrfs part turned out to be a fool’s errand, as neither anaconda nor kickstart seems to support passing custom options to mkfs, and anaconda/blivet insist on not installing into an existing fs.

So instead of doing a full dom0 root+everything Btrfs setup, I installed Qubes with a 25GB XFS partition and am now configuring custom Btrfs partitions to hold all domU stuff. If qvm-pool cooperates, I’ll be sitting pretty on my test system, also with Linux kernel 6.1 or 6.2 which have Btrfs optimizations that should greatly impact large-file access.

51lieal · February 27, 2023, 6:03pm

Can you be specific? After manually configuring via a tty, do a rescan of the drives and blivet will read it.

tasket · February 27, 2023, 8:26pm

Blivet said it wouldn’t accept my Btrfs-on-LUKS partition unless I let it re-format it. If I let it re-format, I lose no-holes and xxhash.

I’m not sure, but I think in some cases Blivet or Anaconda custom will also re-format LUKS in addition to the fs. It seemed like once my 4096b sector LUKS changed back to 512b. I think in that case when you click ‘Done’ it asks you for a passphrase even though you’ve already unlocked your existing LUKS.

As for the advantage of having root on the custom fs, I don’t think there is much. The good part is Qubes lets me set up a vm pool the way I want.

51lieal · February 28, 2023, 4:17am

In fc32 (4.1) it shouldn’t be like that. I don’t know how you did it, but if you want to try:

After encrypting the drive, run mkfs.btrfs in the shell with the volume name qubes_dom0,
then do a refresh and rescan in anaconda, click blivet and done.

You should see your Btrfs volume there. What you have to do is right-click on that volume, click New, and set:
name: root
mount point: /

tasket · February 28, 2023, 5:13am

I’ll take your word for it, as I was looking at the gears menu not right-clicking. For now I’ll be using the XFS root + custom Btrfs pool for domU related tests.

For testing vm startup times, I might prefer a different approach than starting the same vm 3X since that leans heavily on cache and doesn’t mirror usage patterns (after the system boots, at least).

I’m thinking of having 3 different VMs based on 3 different templates that are not de-duped or reflink-cloned. Each startup runs the Firefox browser, then shuts down the VM.
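
A minimal sketch of such a harness in Python, assuming it runs in dom0 and uses the standard qvm-start, qvm-run and qvm-shutdown tools. The VM names and the `my-firefox-workload` command are hypothetical placeholders; the workload is assumed to be something inside the VM that launches Firefox and exits on its own. The demo at the bottom is a dry run that only records the commands instead of executing them.

```python
import subprocess
import time
from typing import Callable, Dict, List, Sequence


def _run(argv: List[str]) -> None:
    """Default runner: execute the command and fail loudly on error."""
    subprocess.run(argv, check=True)


def time_cold_starts(
    vms: Sequence[str],
    workload: str,
    run: Callable[[List[str]], None] = _run,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, float]:
    """Start each VM once, run the workload, then shut the VM down.

    Returns seconds from launching qvm-start until the workload exits.
    Each VM is started only once, so a run leans less on cached blocks
    than starting the same VM three times in a row.
    """
    results: Dict[str, float] = {}
    for vm in vms:
        begin = clock()
        run(["qvm-start", vm])
        # --pass-io makes qvm-run wait for the command to finish.
        run(["qvm-run", "--pass-io", vm, workload])
        results[vm] = clock() - begin
        run(["qvm-shutdown", "--wait", vm])
    return results


# Dry run: collect the commands that would be executed (hypothetical names).
planned: List[List[str]] = []
time_cold_starts(
    ["bench-a", "bench-b", "bench-c"],
    "my-firefox-workload",
    run=planned.append,
)
for argv in planned:
    print(" ".join(argv))
```

The runner and clock are injectable so the loop can be checked without a Qubes system. For real measurements, drop the `run=` argument and consider clearing caches between runs.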

qinix · September 1, 2023, 10:15pm

In the 4.2 RC2 installer is it possible to choose btrfs? Is this option available?

Sven · September 1, 2023, 10:21pm

Yup, same as it was in R4.0 and R4.1.

Insurgo · September 2, 2023, 6:05pm

This can be found under custom partitioning alongside LVM. The default is a single LUKS container encompassing the swap and thin LVM (TLVM) pools. (LVM can be thin or fat; fat means preallocated Logical Volumes.)

Note that the Btrfs partition scheme under QubesOS is still 2 LUKS containers, one for the pool and one for swap.