Ext4 vs. Btrfs performance on Qubes OS installs

So, just to make things clear:

If this is not on top of LVM, your performance comparison is actually comparing apples and oranges. It is not simply comparing ext4 and Btrfs, but different pools, different partition schemes, different sector sizes and, as you said in the other thread, a whole rabbit hole to dig into. I am still interested in digging into that, but like you, I do not have the deep knowledge needed to go beyond the insights I posted in my initial thread on sector size and alignment. A PR was created there, but I have not yet patched an ISO to test it on a separate laptop and see whether performance improves when LUKS and LVM are properly aligned. (It definitely should: it makes no sense to use 512-byte sectors nowadays. Btrfs is reported not to have this problem because Qubes uses a different volume-management path to create the volumes that are then passed to qubes, hence the different performance.)
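To make the alignment question concrete, here is a minimal Python sketch that checks whether a partition's start offset lands on a 4 KiB boundary and on the common 1 MiB boundary. It assumes the start is given in 512-byte units, as fdisk and `/sys/block/<disk>/<part>/start` usually report it; the function name and example values are hypothetical and not taken from any Qubes tool.

```python
def alignment_report(start_sector: int, sector_bytes: int = 512) -> dict:
    """Report whether a partition start is aligned to 4 KiB and to 1 MiB.

    start_sector: first sector of the partition, counted in `sector_bytes`
    units (512 by default, which is how sysfs reports it).
    """
    offset = start_sector * sector_bytes  # byte offset from the start of the disk
    return {
        "offset_bytes": offset,
        "aligned_4k": offset % 4096 == 0,
        "aligned_1m": offset % (1024 * 1024) == 0,
    }


if __name__ == "__main__":
    # 2048 * 512 bytes = 1 MiB: the usual modern default, aligned both ways.
    print(alignment_report(2048))
    # 63 * 512 bytes = 32256: the old DOS-era default, not 4K-aligned.
    print(alignment_report(63))
```

Alignment of the partition is only one layer; the LUKS data offset and the sector size used inside LUKS matter as well.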

To dig into that, I think the discussion should happen under https://forum.qubes-os.org/t/ssd-maximal-performance-native-sector-size-partition-alignment, not here.

But yes: I think this thread is comparing oranges and apples, unfortunately.


You just confirmed there is no VG (volume group, built on top of one or more PVs, physical volumes) and no LVs (logical volumes used by qubes).

Hence, you are using Btrfs directly on top of LUKS, which explains why you have two LUKS partitions. This is interesting, though: you are the first person I know of who uses Heads to boot Qubes with Btrfs. (To be precise about the boot chain: Heads simply kexecs into a multiboot entry, booting Xen + kernel + initrd from the unencrypted /boot, and passes a decryption key unsealed from the TPM to Qubes, which uses it instead of prompting for the LUKS passphrases. Plural, because otherwise you would be prompted once per LUKS partition, which is why you passed both /dev/sda2 and /dev/sda3.)
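To double-check which layers sit between the disk and the filesystem, `lsblk -J -o NAME,TYPE` prints the device tree as JSON, with each device's layers nested under `children`. Below is a minimal Python sketch that walks that JSON and reports whether any `lvm` layer exists. The sample tree, including the device and mapping names, is hypothetical and only mirrors the layout described above: one plain /boot partition and two LUKS partitions, no LVM.

```python
import json


def device_stacks(lsblk_json: str) -> list[list[str]]:
    """Return every root-to-leaf path ("name:type") from `lsblk -J -o NAME,TYPE`."""

    def walk(dev: dict, path: list[str]) -> list[list[str]]:
        path = path + [f"{dev['name']}:{dev['type']}"]
        children = dev.get("children", [])  # leaf devices have no "children" key
        if not children:
            return [path]
        stacks = []
        for child in children:
            stacks.extend(walk(child, path))
        return stacks

    result = []
    for dev in json.loads(lsblk_json)["blockdevices"]:
        result.extend(walk(dev, []))
    return result


def uses_lvm(stacks: list[list[str]]) -> bool:
    """True if any layer in any stack is an LVM logical volume."""
    return any(entry.endswith(":lvm") for stack in stacks for entry in stack)


# Hypothetical layout: /boot on sda1, Btrfs inside two LUKS containers.
SAMPLE = json.dumps({"blockdevices": [{"name": "sda", "type": "disk", "children": [
    {"name": "sda1", "type": "part"},
    {"name": "sda2", "type": "part", "children": [{"name": "luks-a", "type": "crypt"}]},
    {"name": "sda3", "type": "part", "children": [{"name": "luks-b", "type": "crypt"}]},
]}]})

if __name__ == "__main__":
    for stack in device_stacks(SAMPLE):
        print(" -> ".join(stack))
    print("LVM present:", uses_lvm(device_stacks(SAMPLE)))
```

On a real dom0 the input would come from `lsblk -J -o NAME,TYPE` instead of the hard-coded sample.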


One day I will have to read all the docs…

[user@dom0 ~]$ sudo qvm-pool
NAME          DRIVER
varlibqubes   file
linux-kernel  linux-kernel
vm-pool       lvm_thin

I don’t yet know how to interact with that tool directly, but from what I understood from the XFS/ZFS/Btrfs threads I dug into, it seems those fall into the file/linux-kernel pools.
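The `qvm-pool` listing above is a plain two-column table, so installs are easy to compare programmatically. A small Python sketch (the helper name is hypothetical) that turns that text into a name-to-driver mapping and flags whether an `lvm_thin` pool is present; it assumes the first row is the `NAME DRIVER` header and only reads the first two columns.

```python
def parse_qvm_pool(output: str) -> dict[str, str]:
    """Map pool name -> driver from the text printed by `qvm-pool`."""
    pools = {}
    lines = [line for line in output.strip().splitlines() if line.strip()]
    for line in lines[1:]:  # skip the "NAME DRIVER" header row
        name, driver = line.split()[:2]
        pools[name] = driver
    return pools


DEFAULT_INSTALL = """\
NAME          DRIVER
varlibqubes   file
linux-kernel  linux-kernel
vm-pool       lvm_thin
"""

pools = parse_qvm_pool(DEFAULT_INSTALL)

if __name__ == "__main__":
    print(pools)
    print("thin LVM in use:", "lvm_thin" in pools.values())
```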

@brendanhoar @Rudd-O @rustybird: Could you please ELI5? Or jump into https://forum.qubes-os.org/t/ssd-maximal-performance-native-sector-size-partition-alignment to correct the facts? Sorry if I tag you wrongly, but from what I read elsewhere, you seem to understand much better than we do what is going on and what impacts the performance of Qubes today. That would be highly beneficial.

The important post from the other thread is here: SSD maximal performance : native sector size, partition alignment - #30 by rustybird and says:

Alright, from your perspective that might be true. From my perspective, I compared the Qubes OS default against choosing Btrfs in the installer, and I might have chosen the wrong thread title.

I made this one change and got a substantial increase in performance. I’d like to understand why and, if possible, have others enjoy the same benefit.


So @Sven :thinking:
This seems to mean that the Btrfs pool

Btrfs is a qvm-pool of the varlibqubes type.

Agreed, this is important. But I would love to understand why thin-LVM pools are stuck with 512-byte sectors, and to get maximum performance out of thin-LVM pools (LVM pools are the standard way, and the only pool type supported by wyng-backups today).

As far as I understand, LVM2 thin pools are starting to get bad press, and people want to move away from them because of their default misconfiguration: @Rudd-O pushes for ZFS, others push for XFS, and here we push for Btrfs. I think if LVM pools and LUKS containers were configured in an optimized way, we might not all want to get out of it :slight_smile:

[user@dom0 ~]$ sudo qvm-pool
NAME          DRIVER
varlibqubes   file-reflink
linux-kernel  linux-kernel

While I don’t have any deep understanding of this topic I do have two identical laptops. One is my daily driver and the other is ‘standby’. So if there are concrete things you want me to try (even if they are potentially destructive) I am happy to try them out on the ‘standby’ T430. I might learn something in the process.

Any recommendations on what to read to get a grip on LVM, LUKS, pools etc?

As of now, the whole discussion is in the thread SSD maximal performance : native sector size, partition alignment - #30 by rustybird, including the changes made by @rustybird at https://github.com/QubesOS/qubes-linux-utils/pull/85

To keep it really high level, and from my basic understanding as of now: when installing the system, three modes are proposed.

LVM, which creates thick (fully allocated) volumes of a fixed size. Those volumes are created under assumptions based on what the installer and the available tools can detect from the hardware.

Thin LVM creates volumes at no upfront cost. This is really interesting because clones in thin LVM are free: when you clone qubes, the clones cost nothing until those volumes diverge. The only cost is the space actually consumed by the volumes (their content); clones refer to their original volume's data and only diverge, copy-on-write, when writes land on their own thin volume.
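The "clones cost nothing until they diverge" idea can be illustrated with a toy copy-on-write model in Python. This is only a sketch of the concept under simplifying assumptions (whole-chunk writes, no metadata, no deletion), not how dm-thin actually stores its mappings; the class and volume names are hypothetical.

```python
class ThinPool:
    """Toy thin pool: each volume maps logical chunk index -> physical chunk id."""

    def __init__(self):
        self.next_chunk = 0
        self.refcount = {}  # physical chunk id -> number of volumes referencing it
        self.volumes = {}   # volume name -> {logical index: physical chunk id}

    def _allocate(self) -> int:
        chunk = self.next_chunk
        self.next_chunk += 1
        self.refcount[chunk] = 1
        return chunk

    def create(self, name: str) -> None:
        self.volumes[name] = {}  # thin: nothing is allocated up front

    def write(self, name: str, index: int) -> None:
        vol = self.volumes[name]
        old = vol.get(index)
        if old is not None and self.refcount[old] == 1:
            return  # chunk owned exclusively: overwrite in place
        if old is not None:
            self.refcount[old] -= 1  # shared chunk: break sharing (copy-on-write)
        vol[index] = self._allocate()

    def snapshot(self, src: str, dst: str) -> None:
        # A snapshot only copies the mapping; no data chunks are allocated.
        self.volumes[dst] = dict(self.volumes[src])
        for chunk in self.volumes[dst].values():
            self.refcount[chunk] += 1

    def used_chunks(self) -> int:
        return sum(1 for count in self.refcount.values() if count > 0)


if __name__ == "__main__":
    pool = ThinPool()
    pool.create("template-root")
    for i in range(10):
        pool.write("template-root", i)
    pool.snapshot("template-root", "clone-root")
    print(pool.used_chunks())  # 10: the clone shares every chunk
    pool.write("clone-root", 3)
    print(pool.used_chunks())  # 11: only the diverged chunk is new
```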

XFS/Btrfs/ZFS all have similar mechanisms, but since they are reflink-based and the volumes are files on the filesystem, the kernel drivers and the pool implementation decide how clones are handled, and LVM mechanisms are not used there. Different implementations, different optimizations.
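On a reflink-capable filesystem (Btrfs, or XFS created with reflink support), the file-level equivalent of a thin snapshot is the Linux `FICLONE` ioctl, which makes the destination file share the source's extents until either file is modified. As far as I understand, this is what the file-reflink driver builds on, but the sketch below is not its code. `fcntl.FICLONE` is only exposed by Python 3.12+, so the raw request number from `<linux/fs.h>` is used as a fallback; on filesystems without reflink (ext4, tmpfs) the sketch falls back to an ordinary full copy. Linux only.

```python
import fcntl
import os
import shutil
import tempfile

# FICLONE from <linux/fs.h>; exposed as fcntl.FICLONE on Python >= 3.12 (Linux).
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def reflink_or_copy(src_path: str, dst_path: str) -> bool:
    """Clone src into dst, sharing extents if the filesystem supports it.

    Returns True if a reflink clone was made, False if a full copy was needed.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # e.g. EOPNOTSUPP, EXDEV or EINVAL: extents cannot be shared here.
            src.seek(0)
            shutil.copyfileobj(src, dst)
            return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "root.img")
        dst = os.path.join(tmp, "clone.img")
        with open(src, "wb") as f:
            f.write(os.urandom(1024 * 1024))
        print("reflinked" if reflink_or_copy(src, dst) else "fell back to a full copy")
```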

On filesystem creation:
When LUKS is created at install time, unless the sector size is hardcoded or properly detected (cryptsetup 2.4, if I recall correctly, which is not part of the current Fedora-based dom0), the logical sector size is used, which is still 512 bytes instead of 4K. This is problematic for other tools, which reuse that assumption from the LUKS block layer to create the pools on LVM. Scripts then either reuse those logical sizes or hardcode the sector size, depending on which types of volumes are passed to the qubes. So rustybird patched volatile volume creation; the volatile volume is what gives qubes the illusion of a read-write root filesystem. But a problem remains before optimized results can be replicated and tested: when templates are installed at install time, the root volume is not 4K, and when service qubes and default AppVMs are created, their private volumes are not created with 4K sectors. Some of the volumes passed into qubes (/dev/xvd*) also require a partition table; if it is misconfigured, the installed system simply refuses to launch.

This is where the discussion stalled under https://forum.qubes-os.org/t/ssd-maximal-performance-native-sector-size-partition-alignment. @rustybird figured out where the problems lie, proposed a fix for volatile volume creation, and said that fixing private and root volume creation would be more complicated. Consequently, I do not yet know what to patch in a live ISO at runtime, or how (I can invest time there, but not now), so that the code creating private and root volumes at install (phase 1 of the installer) decompresses templates on top of a correctly configured LUKS partition. Nor do I know how to fix the private volume creation code, which runs through Salt states, scripts and Xen block-related code to create the service qubes and default app qubes before the first boot into the system. Last time I checked, no qube was launching at boot.

That is the shortest version I can give on the state of that long thread over https://forum.qubes-os.org/t/ssd-maximal-performance-native-sector-size-partition-alignment


Thank you @Insurgo, but you give me too much credit. I need to go and read about what those things are and what they do … not only in reference to our topic, but in general. :wink: Yes, I can use a search engine. Just asking if there is a particular introduction you found helpful.

Hmm. I am not sure where I would start.
Fedora explains why they switched to Btrfs here: Choose between Btrfs and LVM-ext4 - Fedora Magazine


Sorry, I’ve been away for a week.

For your device, yes, it’s possible to set it up like that.

Actually, using blake2b makes performance slower. The default is the crc32c algorithm; you can use xxhash64 for the best speed, though I don’t know how it performs on your CPU.

For further details, check here.
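To get a feel for why a cryptographic checksum such as blake2b costs more than a CRC, here is a rough Python micro-benchmark using only the standard library. It hashes the buffer in 4 KiB blocks, since Btrfs checksums data per sector. It is an illustration, not a Btrfs benchmark: Python's `zlib.crc32` is plain CRC-32 standing in for Btrfs's CRC-32C, xxhash is not in the standard library at all, and the in-kernel implementations behave differently in absolute terms. When creating the filesystem, the algorithm is chosen with `mkfs.btrfs --csum` (for example `--csum xxhash`).

```python
import hashlib
import time
import zlib


def time_checksums(size_mib: int = 16, block: int = 4096, rounds: int = 3) -> dict[str, float]:
    """Return best-of-N throughput in MiB/s, hashing the buffer block by block."""
    data = memoryview(bytes(size_mib * 1024 * 1024))  # zero-filled test buffer
    candidates = {
        "crc32 (stand-in for crc32c)": lambda b: zlib.crc32(b),
        "sha256": lambda b: hashlib.sha256(b).digest(),
        "blake2b": lambda b: hashlib.blake2b(b).digest(),
    }
    results = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(rounds):
            start = time.perf_counter()
            for off in range(0, len(data), block):
                fn(data[off:off + block])  # one checksum per 4 KiB block
            best = min(best, time.perf_counter() - start)
        results[name] = size_mib / best if best > 0 else float("inf")
    return results


if __name__ == "__main__":
    for name, mib_s in sorted(time_checksums().items(), key=lambda kv: -kv[1]):
        print(f"{name:30s} {mib_s:10.1f} MiB/s")
```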

I have tested that using a 4Kn drive with a 4Kn template boosts overall performance; I posted benchmarks about that in the thread @insurgo mentioned.

The problems you may face if you use a 4Kn drive with the official ISO (512e template):

  1. With LVM + XFS/ext4, you won’t be able to finish the installation; you need to set everything up manually.
  2. Btrfs doesn’t have this problem.

And if you build a custom ISO with a 4Kn template, there will be no problem.


Let us know what you find. I really hope 3-second VM startups become the norm someday.

Do you have thin partitions when using btrfs?

I tried reinstalling with Btrfs, and now I’m seeing much higher disk usage in Qube Manager and when doing a backup.

When a qube is on, the disk usage seems to be the size of the template + the size of the appvm, and when it’s off the disk usage is just the size of the appvm.

This has increased the backup size by 300-400% when doing a full system backup.

Are you seeing the same numbers, or did I do something wrong?

This is due to a difference in what the storage drivers (lvm_thin vs. file-reflink) consider to be a volume’s disk usage, which then leads to weird-looking results when Qube Manager unconditionally sums up all volumes of a VM. But it’s “only” cosmetic.
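A toy Python model of why naively summing per-volume figures can overstate usage with reflinked volumes: while a qube runs, its root volume is a clone of the template's root, so adding up every volume's full size counts the shared data twice. The extent numbers and the "report each volume's full size" rule are illustrative assumptions, not the actual driver or Qube Manager logic.

```python
def naive_sum(volumes: dict[str, set[int]], block_size: int) -> int:
    """Each volume's blocks counted separately, as a per-volume sum would."""
    return sum(len(blocks) for blocks in volumes.values()) * block_size


def actual_usage(volumes: dict[str, set[int]], block_size: int) -> int:
    """Physical usage: blocks shared between volumes are stored only once."""
    return len(set().union(*volumes.values())) * block_size


MiB = 1024 * 1024
template_root = set(range(10_000))           # hypothetical template root extents
app_root = template_root | {10_000, 10_001}  # running qube: clone + 2 diverged extents
app_private = set(range(20_000, 20_500))     # the qube's own data

system = {"template:root": template_root, "app:root": app_root, "app:private": app_private}

if __name__ == "__main__":
    print(naive_sum(system, MiB) // MiB)     # 20502
    print(actual_usage(system, MiB) // MiB)  # 10502
```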

If you mean the size prediction in the GUI backup tool’s VM selection screen, that’s a different cosmetic bug. It shouldn’t affect the actual backup size.


Okay, I had all VMs added in the backup tool, and the total size got me a little nervous.

Thanks for the explanation.

Sorry to wade into this a bit late, but you’re quite right about the default LUKS sector size… it seems sub-optimal.

However, Thin LVM chunk size will have a minimum size of 64KB, and is usually larger depending on the pool LV size at time of creation. My main system uses 64KB despite having a large pool size; I assume this enhances random write performance but haven’t tested it. #write_amplification
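The write-amplification point can be put in numbers: the first write into a chunk that is still shared with a snapshot forces the whole chunk to be copied, so a small random write costs roughly `chunk_size / write_size` times its own size in I/O. A small Python sketch of that back-of-the-envelope calculation, which also checks the dm-thin constraint that the chunk size is a multiple of 64 KiB between 64 KiB and 1 GiB. It is a worst-case model assuming chunk-aligned writes and ignoring metadata and caching, not a measurement.

```python
KiB = 1024
MIN_CHUNK = 64 * KiB
MAX_CHUNK = 1024 * 1024 * KiB  # 1 GiB


def valid_thin_chunk(chunk_bytes: int) -> bool:
    """dm-thin: chunk size must be a multiple of 64 KiB, from 64 KiB to 1 GiB."""
    return MIN_CHUNK <= chunk_bytes <= MAX_CHUNK and chunk_bytes % MIN_CHUNK == 0


def cow_amplification(chunk_bytes: int, write_bytes: int) -> float:
    """Worst case for a write starting on a chunk boundary into shared chunks:
    every touched chunk is copied in full."""
    if not valid_thin_chunk(chunk_bytes):
        raise ValueError(f"invalid thin-pool chunk size: {chunk_bytes}")
    chunks_touched = -(-write_bytes // chunk_bytes)  # ceiling division
    return chunks_touched * chunk_bytes / write_bytes


if __name__ == "__main__":
    for chunk_kib in (64, 128, 512):
        factor = cow_amplification(chunk_kib * KiB, 4 * KiB)
        print(f"{chunk_kib} KiB chunk: {factor}x for a 4 KiB write")
```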

On the ‘cost’ of Thin LVM snapshots: making snapshots costs essentially nothing, but deleting (and, oddly enough, renaming) snapshots takes a significant amount of time. These operations are processed by the kernel in a single-threaded fashion, and I usually see 80-100% CPU for more than 5 s when Qubes or Wyng deletes a large snapshot.

Btrfs: my understanding is that it is extent-based but has a settable minimum sector size via mkfs.btrfs, with a default of 4096. I think a good basis for comparison would have LUKS set to 4096, Btrfs at the default 4096, and the Thin LVM pool at 64KB.
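A sketch of that comparison baseline as a Python helper that only builds the command lines and does not run anything. The flags are standard cryptsetup, LVM and btrfs-progs options (`cryptsetup luksFormat --type luks2 --sector-size 4096`, `lvcreate --type thin-pool --chunksize 64k`, `mkfs.btrfs --sectorsize 4096`); the partition, mapper, VG and pool names and the pool size are hypothetical placeholders. These commands are destructive, so they belong on a test machine only.

```python
import shlex


def baseline_commands(partition: str, mapper: str = "luks-test", vg: str = "testvg",
                      pool_size: str = "100G") -> dict[str, list[list[str]]]:
    """Build (not run) the setup commands for both variants of the comparison."""
    dm = f"/dev/mapper/{mapper}"
    luks = [
        # 4096-byte encryption sectors instead of the 512-byte default.
        ["cryptsetup", "luksFormat", "--type", "luks2", "--sector-size", "4096", partition],
        ["cryptsetup", "open", partition, mapper],
    ]
    thin_lvm = luks + [
        ["pvcreate", dm],
        ["vgcreate", vg, dm],
        # 64 KiB is the smallest chunk size a thin pool accepts.
        ["lvcreate", "--type", "thin-pool", "--chunksize", "64k", "-L", pool_size, "-n", "pool00", vg],
    ]
    btrfs = luks + [
        ["mkfs.btrfs", "--sectorsize", "4096", dm],
    ]
    return {"thin-lvm": thin_lvm, "btrfs": btrfs}


if __name__ == "__main__":
    for variant, cmds in baseline_commands("/dev/sdX3").items():
        print(f"# {variant}")
        for cmd in cmds:
            print(shlex.join(cmd))
```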


I agree. Also, be sure you are not using the deprecated file driver. That will have terrible performance no matter what, and is going away as it does not have feature parity with the others.

One possible reason that deleting snapshots is so expensive is that Qubes always does a blkdiscard before a lvremove. Thin pools do not handle discards well at all.

@demi, what is the state of the loop device PR merge, so that benchmarking would make sense under Qubes at some point?

Merged already, will be in the next vmm-xen release.


@Demi, it would be helpful to link the PR and the qubes-testing URL if the goal is to have those fixes known and tested under the testing section of the website…

Otherwise who tests what, really?

@Demi, don’t get me wrong on the tone here, but there were a lot of regressions in 4.1 compared to the stable experience of 4.0.

My point here is that:

Is not enough. I’m following GitHub - QubesOS/updates-status: Track packages in testing repository as closely as I can, and I see no vmm-xen update to be tested, nor suspend/resume fixes to be tested, with PRs taking way too long to land even in the unstable repo. I would expect things to be much more verbose under the testing section of this forum. My guess is that there is a lot of confusion, even among willing testers, about what needs testing, and about whether those things even reach them.

How we can improve that should be discussed under the testing section, not here, but this subject is a good example to justify testing discussions, which is why I’m writing it here. No blame intended, but I see a lot of room for improvement through better communication and appropriate pointers.