BTRFS and Qubes OS

I recently switched from EXT4 to BTRFS, and my subjective impression is a dramatic improvement in Qubes OS performance. However, I am not sure which of the following three factors deserves the credit, or whether it is a combination of all of them:

  1. it’s a new install with restored backup (not much fragmentation, snapshots etc. yet)
  2. I also exchanged the SSD from a QVO to a PRO (should explain some of the improvement)
  3. switch to BTRFS

There is an overall performance improvement, which I am ready to attribute to 1) and 2). But the improvement is dramatic when shutting down a large (100 GB) HVM and when starting a Qubes backup: from a delay of 30+ seconds to practically nothing.

I am wondering whether there are folks around here who know more about BTRFS and can explain why it would improve these situations. My guess is that it has something to do with snapshots, lazy writes, etc., but I would love to hear from an expert.

Check this out: [RFC] R4.1: switch default pool from LVM to BTRFS-Reflink · Issue #6476 · QubesOS/qubes-issues · GitHub

I’m curious too. By the way, have you tried XFS? If so, what are your thoughts?

A post was merged into an existing topic: XFS vs BTRFS performance with Qubes

The slowdown is reported here and yes, using btrfs dramatically speeds up the shutdown for VMs with large storage, as reported by many users.

4 posts were split to a new topic: XFS vs BTRFS performance with Qubes

That’s probably because the Qubes backup system frontloads the export() of every volume object.

In other words, at the beginning the backup system queries every relevant volume (the private volume of every included AppVM, and the private+root volumes of every included TemplateVM/StandaloneVM) asking “gimme a file name from which to read the volume data”. The lvm_thin storage driver has to activate the volume first before it can be read from, by launching external lvchange -ay commands for each one. But the file-reflink storage driver (used by the Btrfs installation layout) can respond instantly because the data file exists in a readable state at all times.
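A rough way to get a feel for that per-volume cost is to activate a list of volumes one external process at a time, the way the post describes for lvm_thin. This is only a sketch, not the Qubes storage driver: the function name and the LVCHANGE override variable are made up for illustration, and the volume names are whatever you pass in.

```shell
#!/bin/sh
# LVCHANGE is an override hook invented for this sketch; it defaults to the real tool.
: "${LVCHANGE:=lvchange}"

# Activate every volume given as an argument, one external process per volume,
# and print how many activations were issued.
activate_volumes() {
    count=0
    for vol in "$@"; do
        "$LVCHANGE" -ay "$vol" || return 1
        count=$((count + 1))
    done
    echo "$count"
}
```

Timing a call to activate_volumes with a few dozen volume names should show the cost growing with the number of volumes, whereas a reflink-backed image file needs no such step before it can be read.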

That makes sense. Thank you! Do you have an idea why I don’t see the long slowdown after shutting down the large HVM? Things like qvm-ls would block for 20+ seconds in those situations.

HVM or not shouldn’t make a difference, but if the VM has a large volume then the previous delay was due to the issue @equbes linked to for sure. (Don’t miss the many issue comments in the middle that GitHub has helpfully :roll_eyes: collapsed, there’s some interesting analysis of dm-thin by iamahuman.)

I also switched to BTRFS about two weeks ago, and I am seeing the same improvements @Sven reports.

It also seems I was corrupting my LVM system by doing hard shutdowns whenever a 600 GB VM that I constantly have open prevented the system from shutting down. Now that the system actually shuts down, I suspect I won’t end up with an unstable system so easily.

However, there is one error I consistently see in the terminal during shutdown: [FAILED] Stopped (with error) /dev/dm-0. Maybe @rustybird knows what is causing this?

I hope this doesn’t bring back my stability issues or a failed boot directory which happened often before.

Thanks.

Not sure if device-mapper numbers are deterministic - can you check $ lsblk -no mountpoint /dev/dm-0 in dom0 to see whether it prints / or [SWAP]?

I see the following:

[SWAP]

/
/boot

I hope I didn’t misconfig this :frowning:

I did try to change the swap file size. It didn’t seem to take, but maybe doing so messed something up. I really hope not D:

Edit: Tried changing swap file size using the gui installer. Haven’t touched anything since.

That looks like you ran

$ lsblk -no mountpoint

instead of the full command line

$ lsblk -no mountpoint /dev/dm-0

Oops yes you’re correct. I misread the full command.

Running

lsblk -no mountpoint /dev/dm-0

now prints

/
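For anyone repeating this check, the same lookup can be wrapped in a small helper that labels the device. This is just a sketch: dm_role and the LSBLK override variable are names invented here, and it only looks at the first mountpoint line that lsblk prints for the device.

```shell
#!/bin/sh
# LSBLK is an override hook invented for this sketch; it defaults to the real tool.
: "${LSBLK:=lsblk}"

# Print "root", "swap", or "other" for the block device in $1,
# based on the first mountpoint line lsblk reports for it.
dm_role() {
    mp=$("$LSBLK" -no mountpoint "$1" | head -n 1)
    case "$mp" in
        /) echo root ;;
        '[SWAP]') echo swap ;;
        *) echo other ;;
    esac
}
```

With the output shown above, calling dm_role /dev/dm-0 in dom0 would report root.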

Hmm one thing that might help is doing an extra step of shutting down all your VMs ($ qvm-shutdown --all --wait, then check e.g. in the domains widget that they’re really all gone, maybe even wait a few more seconds) before you shut down the system as a whole.

Normally that should be automatic, but I suspect there’s a bug in some layer of the shutdown procedure.
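If you want to script that extra step, something along these lines might do. It is a sketch under assumptions: it relies on qvm-ls accepting --running and --raw-list as it does on Qubes 4.x, and the function names and the QVM_SHUTDOWN / QVM_LS override variables are made up for illustration.

```shell
#!/bin/sh
# QVM_SHUTDOWN and QVM_LS are override hooks invented for this sketch.
: "${QVM_SHUTDOWN:=qvm-shutdown}"
: "${QVM_LS:=qvm-ls}"

# Print the names of running qubes other than dom0, one per line.
running_qubes() {
    "$QVM_LS" --running --raw-list | grep -vx 'dom0' || true
}

# Shut all qubes down, then poll once per second until only dom0 is left.
# $1: maximum number of polls (default 60). Returns 1 if qubes are still running.
stop_all_qubes() {
    tries=${1:-60}
    "$QVM_SHUTDOWN" --all --wait || return 1
    while [ "$tries" -gt 0 ]; do
        [ -z "$(running_qubes)" ] && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

After sourcing it in dom0, you could run $ stop_all_qubes 30 && systemctl poweroff so the system only powers off once every VM is confirmed gone.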

This works to prevent the error. So I’ll just use this when shutting down until I gain the experience required to troubleshoot more and submit a correctly formatted bug report. For the interim, thank you!

I just installed 4.1 and intended to configure it with BTRFS on my laptop/SSD, but kept going round in circles.

When you accept the default partitioning you get LVM. But when I selected the custom option and BTRFS, it only gave an error saying it could not validate the disk configuration, and nowhere does it say why. I’m guessing that I also needed to set up the partitioning on my own, but nothing gives even a hint about how many partitions I needed or what sizes they should be. I was stuck in a loop until I gave up and went with the LVM default. I’ll obviously have to redo it, but I realized I needed to do some research first, so before trying again I thought I would ask: why do we not have an automatic option where you can simply choose between LVM and BTRFS? Either that, or the custom section should give a hint about what it needs to be successful.

You can try wiping the disk first, then select custom > change LVM thin to Btrfs > create automatically.

I tried again, and the trick is apparently to manually delete the partitions before clicking “Click here to create them automatically”. After doing that I was able to continue, but I got an exception from anaconda during the actual partitioning.

"Error: use the -f option to force overwrite of /dev/mapper/luks-{big hex number} "

Since I don’t have a running system yet I could not copy the entire stack trace off the system. The attached photo is the top of the trace. I’ll go back and try again to see if I can get past this error somehow.

My next attempt at installing 4.1 with btrfs was successful. I still have no idea what went wrong with my previous attempts, other than possibly user error. In any case, this time it worked. I can’t wait to explore the new system.

It’s actually the $uuid number. That happens when you try to format a btrfs filesystem on top of another filesystem.
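Outside the installer, the same situation can be sketched by hand: check for an existing signature with blkid and only pass -f to mkfs.btrfs when you really mean to overwrite it. The function names and the BLKID / MKFS_BTRFS override variables are invented for this sketch, the device is whatever you pass in, and formatting destroys the data on it.

```shell
#!/bin/sh
# BLKID and MKFS_BTRFS are override hooks invented for this sketch.
: "${BLKID:=blkid}"
: "${MKFS_BTRFS:=mkfs.btrfs}"

# Print the signature type blkid finds on $1, or nothing if there is none.
existing_signature() {
    "$BLKID" -o value -s TYPE "$1" 2>/dev/null || true
}

# Format $1 as Btrfs. Refuse if a signature already exists,
# unless $2 is the word "force", in which case mkfs.btrfs gets -f.
format_btrfs() {
    dev=$1
    sig=$(existing_signature "$dev")
    if [ -n "$sig" ] && [ "${2:-}" != force ]; then
        echo "refusing: $dev already holds $sig" >&2
        return 1
    fi
    if [ -n "$sig" ]; then
        "$MKFS_BTRFS" -f "$dev"
    else
        "$MKFS_BTRFS" "$dev"
    fi
}
```

This mirrors the error above: without force the call stops at the existing signature, much like mkfs.btrfs does without -f.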

You can even change the default checksum if you do a custom install with manual partitioning from the shell.
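As a sketch of what that could look like from a shell: mkfs.btrfs takes a --csum option, and recent btrfs-progs document crc32c, xxhash, sha256 and blake2 as choices. Whether the installer environment ships a new enough btrfs-progs is an assumption, and the function names and the MKFS_BTRFS override variable are made up here.

```shell
#!/bin/sh
# MKFS_BTRFS is an override hook invented for this sketch.
: "${MKFS_BTRFS:=mkfs.btrfs}"

# Succeed only for checksum names that mkfs.btrfs --csum documents.
valid_csum() {
    case "$1" in
        crc32c|xxhash|sha256|blake2) return 0 ;;
        *) return 1 ;;
    esac
}

# Create a Btrfs filesystem on $1 using checksum algorithm $2 (default crc32c).
mkfs_with_csum() {
    dev=$1
    algo=${2:-crc32c}
    valid_csum "$algo" || { echo "unknown checksum: $algo" >&2; return 2; }
    "$MKFS_BTRFS" --csum "$algo" "$dev"
}
```

The checksum is chosen when the filesystem is created, which is why it has to happen at this manual partitioning stage.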