SSD maximal performance : native sector size, partition alignment

The 860 and 870 have had some fixes recently

@renehoj Thanks for the link.

I would understand this affecting day-to-day use of an unlocked LUKS container that was formatted and configured so that the kernel passes trim operations down to the disk firmware, but not cryptsetup-reencrypt operations. But maybe I am misinformed or misreading?

Trim probably doesn’t matter but what about native command queuing?

@renehoj I will edit this post later with test results on EVO devices (EVO PRO 860 and EVO 870), before and after applying this patch. I was not aware of that bug. It was merged upstream there. Thanks!

Since Heads is responsible for the OEM->User re-encryption of the encrypted device (the old tests were done on an older Clonezilla, used for unattended OEM->User disk re-encryption and for the uniqueness of disk images and keys on shipped laptops), I will check that the kernel it currently uses includes the fix and retest there as well. It is in my TODOs.

But the question is not specific to devices like the EVO PRO 860/EVO 870 or the MX500. It points to a broader problem: it seems that most SSDs lie about their real physical sector size, end up with partition tables that are misaligned for optimal SSD performance, and get partitions created with improper block and sector sizes, because the optimal values (and the matching alignment) are currently not taken into consideration when the partition table and partitions are created under Qubes and other OSes. The impact is larger for non-random read and write workloads that fill the SSD caches, where writes actually lower performance. An important note here: in the tested use case cryptsetup-reencrypt is forced to use direct IO, so I’m not even sure the libata patch is relevant (again, past tests with buffered IO gave lower performance, so buffered IO was dismissed): cryptsetup-reencrypt --use-directio -B64 /dev/device --key-slot 0

The improvement I noticed was on an unimportant old Toshiba SSD, which reports a sector size of 512 bytes in smartctl and where calling cryptsetup luksFormat --sector-size=4096 improves re-encryption speed. So it seems that misalignment is a more general performance culprit, and the question is whether we should investigate this more deeply.

The question is pretty general and concerns the automatic partitioning done by the Qubes installer (and its automatic partitioning scheme) on SSD drives specifically: how much partition alignment matters for performance, given that we cannot rely on what those SSDs report, which is what tools use to automate their optimizations (automatic selection only happens in cryptsetup 2.4+, and only if the device is not lying, as per the Arch Linux article referred to above), while the device still reports block sizes of 512 (both logical [legacy] and physical).

Reposting the ArchLinux article on SSD partitioning tweaks

Does investigating this make sense?

It makes sense, and I’m interested in testing 512 and 4096 performance in the vm.

Qubes uses the anaconda installer, much of which comes from upstream. As for your request above, I’ve seen someone propose this and it was accepted in Fedora 35, so we can only hope the Qubes devs will move dom0 to 35+ in 4.2 testing.

@51lieal unfortunately, trimming (discard) and disk performance are not testable in a “vm”; manual tweaks are required to test a performance difference, since vm performance depends on the LVM volumes created (sectors and sizes), the block sizes of the created VG, and then the LUKS sector and block sizes (the most important part here, from my understanding), all of which need to be coherent with what the firmware does internally. I am no expert here whatsoever, but my observations seem to confirm the intuition that the disks with better re-encryption speed have their block size properly reported to the OS:

[user@dom0 ~]$ cat /sys/block/sda/queue/physical_block_size 
4096
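
To compare what a drive and the device-mapper layers on top of it report, both sysfs values can be read side by side. A minimal sketch, assuming the standard `queue/logical_block_size` and `queue/physical_block_size` sysfs attributes; the function name `report_block_sizes` and its optional second argument (an alternative sysfs root, only useful for testing) are made up for illustration:

```shell
# Print the logical and physical block sizes the kernel reports for a disk.
# $1: block device name as it appears under /sys/block (e.g. sda, dm-142)
# $2: sysfs root (optional, defaults to /sys; only overridden for testing)
report_block_sizes() {
    dev=$1
    sysroot=${2:-/sys}
    queue="$sysroot/block/$dev/queue"
    logical=$(cat "$queue/logical_block_size") || return 1
    physical=$(cat "$queue/physical_block_size") || return 1
    echo "$dev logical=$logical physical=$physical"
}

# Usage in dom0 (device names are examples):
#   report_block_sizes sda
#   report_block_sizes dm-142
```

A drive that prints 512 for both values may still use larger pages internally, which is the situation discussed in this thread.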

Proper testing seems to require redoing an aligned partition table and an aligned LUKS partition, and then reinstalling (LVM partitioning also seems to matter).
Yet again, on a faster performing drive, the LVM volumes were properly created with better alignment:

[user@dom0 ~]$ cat /sys/block/dm-142/queue/physical_block_size 
4096

It also seems to require not reinstalling from the installer directly, but doing some of the actions first from the terminals available in the Qubes installer before going forward, and then having the installer “rescan” the disks so that manual partitioning takes into account what was done outside of it.

Hey! Just saw that you are the user behind the post I was going to refer to: 4.1 installer LVM partitioning - hard to customize, missing space - #5 by 51lieal

Basically, apply the following changes to test:

  • Run cryptsetup-reencrypt from a live CD. Note the total time and the speed in MiB/s from the final output.
  • Make an aligned partition table in expert mode (just notes): parted -a optimal /dev/sda mklabel gpt, with a block size of 4096 bytes (the default alignment is not tuned for specific manufacturers).
  • cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --use-random -y -i 10000 --sector-size 4096 luksFormat /dev/sda2
  • Follow the rest of the instructions from your referred guide to prepare custom partitions.
  • Install the system, then boot it to make sure that the LVM volumes created are good.
  • Run cryptsetup-reencrypt again from the live CD, check the performance difference and report the results.
  • Adapt the values above, which otherwise seem to be aligned for sector and block sizes of 512 in current observations, as reported in dom0 by: cat /sys/block/sda/queue/physical_block_size
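
One way to check the alignment part of these steps without trusting the reported block size is to test the partition start offset directly. A minimal sketch, assuming the kernel convention that `/sys/block/<disk>/<partition>/start` is expressed in 512-byte units regardless of the logical block size; `partition_is_aligned` is a hypothetical helper name:

```shell
# Succeeds when a partition's start offset is a multiple of the alignment.
# $1: start of the partition in 512-byte units, as found in
#     /sys/block/<disk>/<partition>/start
# $2: alignment in bytes (optional, defaults to 4096)
partition_is_aligned() {
    start_512=$1
    alignment=${2:-4096}
    [ $(( (start_512 * 512) % alignment )) -eq 0 ]
}

# Example with literal values: sector 2048 is the usual 1 MiB boundary,
# sector 63 is the old DOS-era default.
for start in 2048 63; do
    if partition_is_aligned "$start"; then
        echo "start=$start aligned"
    else
        echo "start=$start misaligned"
    fi
done
```

On a real system the first argument would come from something like `cat /sys/block/sda/sda2/start`. An aligned start only rules out one problem; it says nothing about the LUKS sector size.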

So as of now, I can already see from observations that some SSDs do not report their block sizes properly, and that the installer’s automatic partitioning takes the block size reported by the top layer and applies it all the way down the chain.

It also seems that my particular problem came from having suboptimal partitioning for sector/block sizes on initial install, which were cloned from one disk to another in the past.

Those are notes… not truth. Only further experimentation will confirm or refute this hypothesis. Others who changed only sector sizes/block sizes/alignment on the same hardware and the same SSD have witnessed major performance gains (no vm/Qubes usage reported though), without re-encrypting their disks either (re-encryption obviously exercises SSD hardware differences such as caches, erase block sizes and firmware optimizations). So testing seems to be the only way to validate this, on the same computer with the same SSDs: I will clone disks with different reported block sizes (512 vs 4096) onto each other and report the variations in observed performance, with cryptsetup-reencrypt as my personal yardstick.

I’ve already figured out how to set up the drive; do you have an idea of what kind of tests to run?

In between tests.

Without IO cache from the operating system (DirectIO):
cryptsetup-reencrypt --use-directio -B64 /dev/device --key-slot 0

With IO caching from the operating system:
cryptsetup-reencrypt /dev/device --key-slot 0

I think I would run 3 tests:

  1. boot speed
  2. vm benchmark
  3. your re-encrypt test

Yesterday I tried 6 installations, and 4 of them were unsuccessful. Before each installation I ran dd with zeros over the drive to ensure no data remained.

  1. With xfs and a 512 sector size, everything works out of the box.
  2. 3 failed attempts with xfs and a 4096 sector size, and 1 with ext4. I did a short investigation but it wasn’t helpful. Dom0 is fine, but I can’t get any DomU to work (there’s an error in the initial setup).
  3. btrfs with 4096 is fine, but I haven’t benchmarked it.

Everything uses the default configuration.

512 benchmark :

CPU : I7-10750H
Storage : WD SN 730 512GB
File System : LVM-XFS

Linux dom0 5.10.90-1.fc32.qubes.x86_64 #1 SMP Thu Jan 13 20:46:58 CET 2022 x86_64 x86_64 x86_64 GNU/Linux

# Boot speed
Startup finished in 4.897s (firmware) + 2.523s (loader) + 2.946s (kernel) + 8.787s (initrd) + 3.705s (userspace) = 22.861s
Startup finished in 4.868s (firmware) + 2.513s (loader) + 2.938s (kernel) + 8.817s (initrd) + 3.765s (userspace) = 22.902s
Startup finished in 4.874s (firmware) + 2.511s (loader) + 2.945s (kernel) + 8.255s (initrd) + 3.732s (userspace) = 22.318s

# VM Boot
6.24
4.81
4.68
5.14
4.84

# Cryptsetup-reencrypt
Finished, time 15:24.011, 486745 MiB written, speed 526.8 MiB/s

I found that cryptsetup-reencrypt without --use-directio is horrible: the speed is under 50 MiB/s, so I just skipped it. FYI, I used my main Qubes install as the host to re-encrypt and to change the nvme lbaf (dual boot).

4096 benchmark :

CPU : I7-10750H
Storage : WD SN 730 512GB
File System : BTRFS+blake2b

Linux dom0 5.10.90-1.fc32.qubes.x86_64 #1 SMP Thu Jan 13 20:46:58 CET 2022 x86_64 x86_64 x86_64 GNU/Linux

# Boot speed
Startup finished in 4.898s (firmware) + 2.499s (loader) + 2.878s (kernel) + 7.922s (initrd) + 3.489s (userspace) = 21.688s
Startup finished in 4.878s (firmware) + 1.405s (loader) + 2.882s (kernel) + 7.936s (initrd) + 3.523s (userspace) = 20.626s
Startup finished in 4.889s (firmware) + 1.405s (loader) + 2.881s (kernel) + 7.817s (initrd) + 3.524s (userspace) = 20.518s
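
The totals include firmware time, so comparing one stage at a time across runs may be more telling than comparing totals. A minimal sketch that extracts one stage from `systemd-analyze` summary lines and averages it; `boot_stage` and `mean` are hypothetical helper names, and picking initrd as the stage to watch is my assumption:

```shell
# Print the duration (seconds) of one stage for each systemd-analyze
# summary line read on stdin.
# $1: stage name as printed in parentheses (firmware, loader, kernel,
#     initrd, userspace)
boot_stage() {
    sed -n "s/.* \([0-9.]*\)s ($1).*/\1/p"
}

# Print the arithmetic mean of the numbers read on stdin, one per line.
mean() {
    awk '{ sum += $1 } END { if (NR) printf "%.2f\n", sum / NR }'
}

# The three boot lines of the 512 run; prints 8.62
boot_stage initrd <<'EOF' | mean
Startup finished in 4.897s (firmware) + 2.523s (loader) + 2.946s (kernel) + 8.787s (initrd) + 3.705s (userspace) = 22.861s
Startup finished in 4.868s (firmware) + 2.513s (loader) + 2.938s (kernel) + 8.817s (initrd) + 3.765s (userspace) = 22.902s
Startup finished in 4.874s (firmware) + 2.511s (loader) + 2.945s (kernel) + 8.255s (initrd) + 3.732s (userspace) = 22.318s
EOF
```

Fed the three lines of the 4096 Btrfs run instead, the same pipeline gives 7.89 for initrd.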

# VM Boot
5.72
4.48
4.62
4.48
4.62

# directio
Finished, time 11:17.770, 486745 MiB written, speed 718.2 MiB/s

# no directio
Finished, time 12:32.743, 486745 MiB written, speed 646.6 MiB/s
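
The speed in a summary line can be cross-checked from the other two figures, which is handy when comparing runs by hand. A minimal sketch, assuming the summary line format shown in these results; `reencrypt_speed` is a hypothetical helper name:

```shell
# Recompute the throughput (MiB/s) from a cryptsetup-reencrypt summary line
# read on stdin, using the elapsed time and the amount written.
reencrypt_speed() {
    awk -F', ' '/^Finished/ {
        sub(/^time /, "", $2)            # "15:24.011"
        n = split($2, t, ":")
        secs = 0
        for (i = 1; i <= n; i++)         # handles mm:ss and hh:mm:ss
            secs = secs * 60 + t[i]
        split($3, w, " ")                # w[1] is the MiB count
        printf "%.1f\n", w[1] / secs
    }'
}

# prints 526.8
echo 'Finished, time 15:24.011, 486745 MiB written, speed 526.8 MiB/s' | reencrypt_speed
```

The recomputed values match the speeds printed by the tool for the runs in this post.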

I was surprised by the result without the directio option. I’ll post an update in another thread once I find out how to configure lvm+xfs / ext4 with a 4096 sector size. The questions are:

  • Why does performance drop so much with xfs (512 sector size)? Going from 600 to 50 is a huge difference.
  • How can a 4096 sector size be made to work with lvm+xfs / ext4?
  • If Qubes plans to use Fedora 35+ or another distro in 4.2 testing, should the devs promote a 4096 sector size by default? In Fedora 35+, anaconda automatically uses a 4096 sector size only if the drive is already using a 4096 sector size. (question for the Qubes team)

In conclusion, using a 4096 sector size is highly recommended; in the tests above it brought clear gains on modern hardware.


In my experiments, relaunching blivet advanced partitioning after a failed attempt reports a bunch of wrong partitions, all corresponding to templates. Interestingly, all private volumes were fine.

To reproduce my basic initial test result, I simply created the LUKS container over /dev/sda2 with luksFormat from ctrl-alt-2, luksOpen’ed it, then over ctrl-alt-6 asked the Q4.1 installer to rescan the disk before doing an automatic partitioning, reclaiming space.

The hint here is that the template rpm instructions may be faulty when deploying raw images into the corresponding LVM volumes under the default partitioning scheme? Otherwise it seems that the volumes created by the second-stage install, outside of the installed templates, are fine.

@marmarek some hints?

I think not, because my btrfs installation is fine, but I still have some workarounds.

And can you give more details about this?

Does this mean you have successfully installed with lvm+xfs / ext4?

@51lieal it is expected that buffered IO behaves better than directio if every assumption the tools make is right. Something seems off in your successful test case, and the results confuse me.

The reencryption itself doesn’t know anything about the underlying partitions. The speed results should reflect only the sector size applied to the LUKS container, nothing else.

No, the default partitioning scheme, thin LVM over cryptsetup, failed. In my use case, I am not ready to give up on wyng-backups, which relies on thin LVM.

My tests are really inconclusive for the moment.

I’m stumped as to why the installer can scan the luksOpen’ed, 4096-formatted LUKS volume while the resulting automatic partitioning fails to provision proper templates in the root-related LVM volumes. This is why I tagged @marmarek.

I am not aware of the differences in the code that correctly creates the private volumes while failing to deploy the root volumes linked to the template rpms.

Why are private volumes and dom0 consistent with the underlying disk’s GPT partition table and with the cryptsetup container aligned to a 4096 sector size, while the root volumes linked to template deployment fail?

Perhaps because I use Qubes to reencrypt? I haven’t tried with another OS, but I’m satisfied with the result.


I’ve been playing with the lvm configuration and it still fails. Let’s see if bypassing the initial setup and configuring manually works (even though it doesn’t make sense to me, because the btrfs installation is fine).

I’ve confirmed this: configuring everything manually works. I really don’t know what causes this; in the initial setup the error is about libxl failing to add a vif device (which device? this is what confuses me). So the steps you need to do are:

  • add vm-pool.
  • install template.
  • configure everything.

Your tests report cryptsetup-reencrypt results on the same hardware and the same disk, with a different partition table, partition alignment and partition block size. Direct IO going from 526.8 MiB/s to 718.2 MiB/s shows a big difference for the LUKS container alignment performance test alone. This is an important report. As said previously, that reencryption test shouldn’t care about the actual content of the container. In my experience, direct IO reported pretty steady speeds throughout the reencryption, leading to the hypothesis that all blocks are read sequentially, reencrypted and written back to disk, without speeding up on unused blocks or slowing down on used ones. The data seems to be simply translated and rewritten as it goes.
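
For reference, the relative gain behind those two direct IO figures can be computed as follows (a trivial sketch; `pct_gain` is a hypothetical helper name):

```shell
# Print the relative change from a baseline to a new value, in percent.
# $1: baseline throughput, $2: new throughput (same unit)
pct_gain() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) * 100 / before }'
}

# direct IO, 512 vs 4096 sector size; prints 36.3
pct_gain 526.8 718.2
```

That is roughly a 36% gain in throughput for the same amount of data written.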

Buffered IO improving massively (from the 50 MiB/s initially reported to 646.6 MiB/s) is also an interesting data point, showing that better alignment leads to better results, while still showing that something is off. Buffered IO should be better than direct IO, meaning something is still not right (reported by hardware vs real).

@51lieal Can you post the recipe of commands that made the thin LVM scenario succeed for you?

(A list of the commands that worked for you, just like you did in 4.1 installer LVM partitioning - hard to customize, missing space - #5 by 51lieal, would permit exact reproducibility of the results, internal validity and possibly external validity. If we come up with proper adjustments, we could open an issue upstream and challenge others.)


This is based on UEFI; for MBR just ignore the EFI parts. Here is your quick setup:

ctrl + alt + f2 when you are at the language setup screen

dd if=/dev/zero of=/dev/device
gdisk /dev/device
# you need at least 2 partitions for mbr and 3 for uefi:
#   1. +600MiB
#   2. +1GiB
#   3. the rest of the remaining space

# /dev/device below presumably stands for the last partition created above, not the whole disk
cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/device
cryptsetup luksOpen /dev/device luks
pvcreate /dev/mapper/luks
vgcreate qubes_dom0 /dev/mapper/luks
lvcreate -n swap -L xxG qubes_dom0    # ex: 8G / 16G
lvcreate -T -L 40G qubes_dom0/root-pool
lvcreate -T -l 90%FREE qubes_dom0/vm-pool
lvs    # to check your vm-pool size
lvcreate -V40G -T qubes_dom0/root-pool -n root    # 20G is not enough, use at least 40G or more
lvcreate -VxxxG -T qubes_dom0/vm-pool -n vm    # ex: -V60G / -V360G
mkfs.xfs /dev/qubes_dom0/vm    # no need to specify the sector size if your disk already uses 4096

I haven't tried with ext4, but I think it would work too, since the problem is in the initial setup.
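
After the luksFormat step it may be worth confirming that the container really uses 4096-byte sectors before going on. A minimal sketch, assuming a LUKS2 header, whose `cryptsetup luksDump` output lists a `sector:` line in the data segment; `luks_sector_size` is a hypothetical helper name:

```shell
# Print the encryption sector size (bytes) found in `cryptsetup luksDump`
# output read on stdin. Assumes the LUKS2 dump layout, where the data
# segment carries a line such as "sector: 4096 [bytes]".
luks_sector_size() {
    awk '$1 == "sector:" { print $2; exit }'
}

# Usage on a real system (needs root; /dev/device is the placeholder
# used in the commands of this post):
#   cryptsetup luksDump /dev/device | luks_sector_size
```

If this prints 512, the container was not formatted with --sector-size 4096 and the later benchmarks would not measure what is intended.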

ctrl + alt + f6
Enter the disk screen and rescan, choose the drive, select custom (not blivet), click unknown, and set:

600 MiB > format > EFI partition > /boot/efi > update
1 GiB > format > xfs / ext4 > /boot > update
40 GiB > format > xfs / ext4 > / > update
(swap) > format > swap > update
---
leave qubes_dom0/vm 
click done

Configure the items marked in red, and install.

after boot :
don’t configure anything, click done, and login.

qvm-pool -a vm lvm_thin -o volume_group=qubes_dom0,thin_pool=vm-pool,revisions_to_keep=2
reboot

confirm vm is the default_pool

qubes-prefs | grep pool    # in 3 installations, vm was automatically the default_pool
# if not : 
qubes-prefs default_pool vm

set default kernel in qubes-global-settings.
set none in all of the qubes default.

template directory = /var/lib/qubes/template-packages/
install all templates.
use salt to configure vm.

Qubes is ready to use.
I updated everything and then rebooted; everything is still good.

4096 benchmark :

CPU : I7-10750H
Storage : WD SN 730 512GB
File System : LVM+XFS

Linux dom0 5.10.90-1.fc32.qubes.x86_64 #1 SMP Thu Jan 13 20:46:58 CET 2022 x86_64 x86_64 x86_64 GNU/Linux

# Boot speed
Startup finished in 8.740s (firmware) + 2.472s (loader) + 2.947s (kernel) + 7.879s (initrd) + 3.588s (userspace) = 25.628s
Startup finished in 8.720s (firmware) + 2.469s (loader) + 2.947s (kernel) + 7.896s (initrd) + 3.679s (userspace) = 25.713s
Startup finished in 5.331s (firmware) + 2.479s (loader) + 2.947s (kernel) + 8.438s (initrd) + 3.619s (userspace) = 22.816s 

# VM Boot
5.75
4.60
4.59
4.61
4.59

# directio
Finished, time 13:27.041, 486745 MiB written, speed 603.1 MiB/s

# no directio
Finished, time 12:29.073, 486745 MiB written, speed 649.8 MiB/s
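
The first VM boot of every series is noticeably slower than the rest, so a median may summarise the three series better than a mean. A minimal sketch; `median` is a hypothetical helper name and the labels in the comments are taken from the benchmark headers in this thread:

```shell
# Print the median of the numbers read on stdin, one per line.
median() {
    LC_ALL=C sort -n | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) exit
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

printf '%s\n' 6.24 4.81 4.68 5.14 4.84 | median   # 512, LVM-XFS
printf '%s\n' 5.72 4.48 4.62 4.48 4.62 | median   # 4096, BTRFS
printf '%s\n' 5.75 4.60 4.59 4.61 4.59 | median   # 4096, LVM+XFS
```

On these numbers the medians come out at 4.84, 4.62 and 4.60, so the VM boot differences between the three setups are small compared with the re-encryption results.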

Running reencrypt without direct IO on the 512 sector size, which showed 50 MiB/s, seems to have hit an error, probably a hardware error, but I was clearly seeing a 500h++ ETA, hmm…
But as you can see in the new benchmark on 4096 without direct IO, it looks better.

I have tried browsing, watching movies, etc., and everything seems fine.

FYI, loop devices still use a 512b sector size. I have a workaround for that, but I think it’s too complicated for a non-technical person to apply. If you want to open an issue, kindly include this too.

And let’s see part 2 on the Firecuda nvme; maybe I will make a guide for this.