SSD maximal performance : native sector size, partition alignment

I think not, because my btrfs installation is fine, but I still have some workround.

and can you give more details about this ?

did this mean, you have succesfully install with lvm+xfs / ext4 ?

@51lieal it is expected that buffered IOs should behave better then directio if every assumptions the tools are making are right. You seem to have something in your successful test case, where I’m confused by the results.

The reencryption itself doesn’t know anything about the underlying partitions. The speed results should reflect only rhe sector size having been applied tonthe LUKS container, nothing else.

No, default partition scheme being thin lvm over cryptsetup failed. In my use case, i am not ready to give up on wyng-backups which relies on thin LVM.

My tests are really non-conclusive for the moment.

I’m stump at not understanding why the installer can scan the 4096 formatted LUKS volume being luksOpen’ while the resulting automatic partitioning fails to provision proper template in root related LVMs. This is why I tagged @marmarek.

I am not aware of the differences in code which correctly creates private volumes counterpart, while failing at deploying root volumes linked to templates rpm.

Why are private volumes and dom0 consistent with underlying disk gpt partition table and cryptsetup created container aligned to 4096 sector size, while root volumes linked to template deployment is failing?

Perhaps because i use qubes to reencrypt ? haven’t try with other os, but i’ve satisfied with the result.


I’ve been playing with lvm configuration and it still fail, let’s see if bypassing initial setup and manually configure would work. (even though it doesn’t make sense to me because btrfs installation is fine)

I’ve confirmed this, manually configure everything is worked, i really don’t know what causing this, in initial setup the error is about libxl failed to add vif device (what device? this is what i’ve confused), so the step you need to do is.

  • add vm-pool.
  • install template.
  • configure everthing.

Your tests reporting cryptsetup-reencrypt results on same hardware, same disk different disk partition table/ partition alignment/partition block size going from direct-io resulsts of 526.8 MiB/s to 718.2 MiB/s in reported tests shows a big difference for the LUKS container alignment performance test alone. This is important report. Like said previously, that reencryption test shouldn’t care on what is the actual content of the container. In my experience, speed with direct-io reported pretty steady speeds all along the reencryption, leading to the hypothesis that all the blocks are forward read, reencrypted and written back to disk without speeding up if unused or slowing down if used. The data seem simply translated and rewritten as it goes.

Buffered IO being improved massively (50MiB/s initially reported) vs 646.6 MiB/s is also an interesting data, showing that better alignment leads to better results, while still showing something off. Buffered IO should be better then direct-io, meaning something is not right (reported by heardware vs real), yet.

@51lieal Can you post recipe of commands that made it successful to you for the thin lvm scenario?

(A list of commands that were successful to you, just like you did on 4.1 installer LVM partitioning - hard to customize, missing space - #5 by 51lieal would permit exact reproducibility of results, intern validity and possible external validity of results. If we come up with proper adjustments, we could open an issue upsteam and challenge others.

1 Like

this is based on uefi, for mbr just ignore efi thing, here is your quick setup:

ctrl + alt + f2 when you in language setup

dd if=/dev/zero of=/dev/device 
gdisk /dev/device
# you need at least 2 partition for mbr and 3 for uefi
1. +600MiB
2. +1GiB
3. the rest of remaining space.

cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/device
cryptsetup luksOpen /dev/device luks
pvcreate /dev/mapper/luks
vgcreate qubes_dom0 /dev/mapper/luks
lvcreate -n swap -L xxG qubes_dom0 ( ex : 8G / 16G )
lvcreate -T -L 40G qubes_dom0/root-pool
lvcreate -T -l 90%FREE qubes_dom0/vm-pool
lvs (to check your vm-pool size)
lvcreate -V40G -T qubes_dom0/root-pool -n root ( 20G is not enough, use at least 40G or more )
lvcreate -VxxxG -T qubes_dom0/vm-pool -n vm ( ex : -V60G / -V360G )
mkfs.xfs /dev/qubes_dom0/vm (no need to specify sector size, if your disk is already use 4096) 

haven't try with ext4, but i think it would work too, since the problem is in initial setup

ctrl + alt + f6
enter disk and rescan, choose drive, custom (not blivet), click unknown, and set :

600 MiB > format > EFI partition > /boot/efi > update
1 GiB > format > xfs / ext 4 > /boot > update
40 GiB > format > xfs / ext 4 > / > update
(swap) > format > swap > update
---
leave qubes_dom0/vm 
click done

configure red mark, and install.

after boot :
don’t configure anything, click done, and login.

qvm-pool -a vm lvm_thin -o volume_group=qubes_dom0,thin_pool=vm-pool,revisions_to_keep=2
reboot

confirm vm is the default_pool

qubes-prefs | grep pool ( in 3 installation, vm is automatically default_pool )
# if not : 
qubes-prefs default_pool vm

set default kernel in qubes-global-settings.
set none in all of the qubes default.

template directory = /var/lib/qubes/template-packages/
install all template.
use salt to configure vm.

qubes is ready to use.
I have update everything then reboot, everything still good.

4096 benchmark :

CPU : I7-10750H
Storage : WD SN 730 512GB
File System : LVM+XFS

Linux dom0 5.10.90-1.fc32.qubes.x86_64 #1 SMP Thu Jan 13 20:46:58 CET 2022 x86_64 x86_64 x86_64 GNU/Linux

Startup finished in 8.740s (firmware) + 2.472s (loader) + 2.947s (kernel) + 7.879s (initrd) + 3.588s (userspace) = 25.628s
Startup finished in 8.720s (firmware) + 2.469s (loader) + 2.947s (kernel) + 7.896s (initrd) + 3.679s (userspace) = 25.713s
Startup finished in 5.331s (firmware) + 2.479s (loader) + 2.947s (kernel) + 8.438s (initrd) + 3.619s (userspace) = 22.816s 

5.75
4.60
4.59
4.61
4.59

# directio
Finished, time 13:27.041, 486745 MiB written, speed 603.1 MiB/s

# no directio
Finished, time 12:29.073, 486745 MiB written, speed 649.8 MiB/s

There seems to be an error running reencrypt without direct io which shows 50MiB in 512 sector size, probably a hardware error, but I’m clearly seeing a 500h++ ETA, hmm…
But as you can see in the new benchmark on 4096 without direct io, it looks better.

I have try browsing, watching movie, and etc, everything seems fine.

FYI, loop device still using 512b sector size, I have a workround for that, but i think its complicated for non tech person to apply. if you want to open an issue, kindly open this too.

and let’s see for part 2 in the Firecuda nvme, maybe i will make a guide for this.

Please file an issue for this.

Qubes OS should definitely default to 4096 byte sectors unless it has reason to believe a different sector size is better. My understanding is that 512 byte sectors are almost always emulated nowadays, with the actual sector size being 4096 bytes.

Will do.

yes fs reformat would automatically use 4096 if the sector size already use 4096, i have at least 4 ssd with 4k support, but none of them are using 4096 as default, perhaps it’s because compability issue, that’s why many vendor not use it as default…

current installation of qubes uses cryptsetup version < 2.4, which doesn’t automatically use sector size 4096 on luks. RFC: Default sector size (!135) · Merge requests · cryptsetup / cryptsetup · GitLab.

using 4k on luks should improve performance too as reported here https://www.reddit.com/r/Fedora/comments/rzvhyg/default_luks_encryption_settings_on_fedora_can_be/

but using 4k luks would lead to another error like i mention above, manual configuring is needed, I’ll report and open an issue after ensuring 4k loop device works fine.

2 Likes

vm boot benchmark :

#full-4096
3.77
3.89
4.03
3.86
3.90

#minimal-4096
3.68
3.67
3.62
3.79
3.68


#full-512b
3.62
3.70
3.82
3.58
3.62

kdisk benchmark :

I don’t see any improvement with 4kn drive in the vm, but i might be wrong, since i don’t do other test.

I would suggest increasing the sample size. Those SSD drives have big memory caches (RAM on board), which will hide the real results until that cache is filled/missed.

Your results already show some differences, while difficult to interpret. Most of your write tests show improvements; while most of your read tests show the opposite.

This doesn’t seem to be a problem with Btrfs. I’ve successfully converted my LUKS2 device to 4K sectors.

Tried to give some time here getting errors where cryptsetup complains about alignment being impossible when formatting…

/sys/block/sda/queue/ where:

  1. hw_sector_size: 512
  2. logical_block_size: 512
  3. max_segment_size: 65536
  4. minimum_io_size: 4096
  5. optimal_io_size: 0
  6. physical_block_size: 4096

This is a Ctitical MX500 over SATA2 controller.

Steps:

  1. wipefs -a /dev/sda
  2. gdisk /dev/sda
  3. x (expert)
  4. l 4096 (change alignment to 4096 sectors)
  5. m (return to menu)
  6. o (greate GPT table)
  7. y (accept)
  8. n (new partition)
  9. 1
  10. Enter (selects 4096 as first sector. good)
  11. +1GB (for boot partition)
  12. Enter (8300 type linux partition)
  13. n (new partition)
  14. 2 (second primary partition)
  15. Enter (chose next sector)
  16. Enter ( choose last sector)
  17. Enter (chooses 8300 Linux filesystem)
  18. write
  19. cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/sda2

Still no luck whatever I do. Cryptsetup luksFormat with --sector-size 4096 gives “Device size is not aligned to requested sector size”

  1. wipefs -a /dev/sda
  2. fdisk /dev/sda -b 4096
  3. g (created GPT partition table)
  4. n (new partition)
  5. 1
  6. Enter (first sector 256)
  7. +1GB
  8. n
  9. 2
  10. Enter
  11. w
    The disk is not refreshed even if calling partprobe /dev/sda… Rebooting into ISO
  12. cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/sda2
    Damn. fdisk never syncs the changes?! damnit.

Redoing gdisk:

  • wipefs -a /dev/sda
  • gdisk /dev/sda
  • o (greate GPT table)
  • y (accept)
  • n (new partition)
  • 1
  • Enter (selects 2048 as first sector?)
  • +1GB (for boot partition)
  • Enter (8300 type linux partition)
  • n (new partition)
  • 2 (second primary partition)
  • Enter (chose next sector)
  • Enter ( choose last sector)
  • Enter (chooses 8300 Linux filesystem)
  • write
    Still not aligned.
  1. wipefs -a /dev/sda
  2. cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/sda
    Works… Alignments are wrong if partitoning with sfdisk and gdisk.

Some notes:

https://linux-blog.anracom.com/2018/12/03/linux-ssd-partition-alignment-problems-with-external-usb-to-sata-controllers-i/

https://linux-blog.anracom.com/2018/12/03/linux-ssd-partition-alignment-problems-with-external-usb-to-sata-controllers-i/

---- Edit of what worked
partprobe without drive specification worked… Weird but nice.
CTRL-ALT-F2 (console)

  1. wipefs -a /dev/sda
  2. fdisk /dev/sda
  3. n (new partition)
  4. 1
  5. p (primary)
  6. Enter (first sector 2048)
  7. +1GB
  8. n
  9. 2
  10. p (primary)
  11. Enter
  12. w
  13. partprobe
  14. cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/sda2

Then resuming your instructions

  1. cryptsetup luksOpen /dev/device luks
  2. pvcreate /dev/mapper/luks
  3. vgcreate qubes_dom0 /dev/mapper/luks
  4. lvcreate -n swap -L xxG qubes_dom0 ( ex : 8G / 16G )
  5. lvcreate -T -L 20G qubes_dom0/root-pool
  6. lvcreate -T -l 90%FREE qubes_dom0/vm-pool
  7. lvs (to check your vm-pool size)
  8. lvcreate -V20G -T qubes_dom0/root-pool -n root ( Why 20G is not enough for simple recovery if not multiple templates being installed at the same time from dom0? I a not ready to reserve 40G for dom0 and preferred when it was in the same vm-pool to grow dynamically with better warnings, but agree of it being in distinct pool now. )
  9. lvcreate -VxxxG -T qubes_dom0/vm-pool -n vm ( ex : -V60G / -V360G )
  10. mkfs.ext4 /dev/qubes_dom0/vm (no need to specify sector size, if your disk is already use 4096)

ctrl + alt + f6 (return to installer)
enter disk and rescan, choose drive, custom (not blivet), click unknown, and set :

  • 1 GiB > format > ext 4 > mount point /boot > update
  • 40 GiB > format > xfs / ext 4 > mount point / > update
  • (swap) > format > swap > update
  • leave qubes_dom0/vm alone.
    click done. Accept changes, Begin installation.
    Reboot.

Second stage install: went ahead and configured system as wanted.
Templates installs, but install fails at configuring sys-firewall. On reboot, all vms are there but none starts properly.
When looking at /dev/mapper/sys: sys-net, sys-firewall, sys-usb related filesystems were not created, which of course fails when starting VMs.

@51lieal

after boot : don’t configure anything, click done, and login.

Any insights on why continuing second stage install doesn’t work at that point?

I’m try to answer based on what question i found there.

  • fdisk definitely work if you installing using bios, and gdisk for uefi.

partition tables can expect 1MiB offset for the begin of the first partition, means 2048 sectors for 512b or 256 sectors for 4kb disks.

so when you fdisk there, the first sector should show the default first sector 256 which is fine and good to use.

  • actually 20gb is enough if we can manage what data on root partition, as example sometimes we install 2-3 template at once it could cause template install fails because not enough space in dom0, and some users here experience it (back then when everyone was fail installing kali template), and i think you might fail too, since 20gb is not enough for installing 4 default template, except you install 1 by 1, and delete previous data.

If you want to give btrfs a try it also good, everything is work out of the box. for the layout you can find here just ignore 1-2 thing there in the drive section.

As far as I understand, the reason why 4K dm-crypt breaks some VM volumes on LVM Thin but not on Btrfs is a combination of two things.

  1. LVM Thin uses the same logical sector size as the underlying (dm-crypt) block device. And then a 4K LVM Thin block device in dom0 results in a 4K xen-blkfront block device in the VM, because Xen automatically passes through the logical sector size.

    Whereas file-reflink layouts like Btrfs use loop devices, which are currently always configured (by /etc/xen/scripts/block from Xen upstream) with 512 byte logical sectors - again passed through to the VM.

  2. The “root” xvda and “volatile” xvdc volumes don’t properly work with 4K sectors because they are disk images containing a GPT/MBR partition table, which can only specify sizes and locations in sector units:

    • The VM initramfs script formatting “volatile” on every VM start currently assumes that a sector is 512 bytes, which should be straightforward to fix (WIP)

    • It’s going to be more difficult to somehow make the “root” volume sector-size agnostic…

    (The “private” xvdb and “kernel” xvdd volumes seem to work fine if /etc/xen/scripts/block is patched to configure them with 4K sectors. They’re just ext4/ext3 filesystem images without a partition table.)

1 Like
  1. Then why vm-pool is fine while varlibqubes not, both are using same driver, it could be because of my 4kn template, but i don’t think so, I haven’t recheck since then.
  2. if you see this ss below, It already use 4k sector.

I don’t get the question. What does fine mean? And aren’t you benchmarking 512-byte sectors on XFS/file-reflink varlibqubes, vs. 4K sectors on LVM Thin vm-pool (using IIUC a custom partitioned TemplateVM root volume) - which would be two very different storage drivers? Oh, your vm-pool is XFS/file-reflink on top of LVM Thin? Okay that would be the same Qubes storage driver then, but it’s still a different (and unusual) storage stack.

@rustybird: This is really interesting!!! Please poke me on updates of this. Won’t land under Qubes before next release for sure, but this is really pertinent advancement in my opinion, even more if applying to default partition scheme (thin lvm, seperated root/vm pools).