SSD maximal performance: native sector size, partition alignment

4096-byte sector benchmark:

CPU: i7-10750H
Storage: WD SN730 512GB
File system: LVM + XFS

Linux dom0 5.10.90-1.fc32.qubes.x86_64 #1 SMP Thu Jan 13 20:46:58 CET 2022 x86_64 x86_64 x86_64 GNU/Linux

Startup finished in 8.740s (firmware) + 2.472s (loader) + 2.947s (kernel) + 7.879s (initrd) + 3.588s (userspace) = 25.628s
Startup finished in 8.720s (firmware) + 2.469s (loader) + 2.947s (kernel) + 7.896s (initrd) + 3.679s (userspace) = 25.713s
Startup finished in 5.331s (firmware) + 2.479s (loader) + 2.947s (kernel) + 8.438s (initrd) + 3.619s (userspace) = 22.816s 

5.75
4.60
4.59
4.61
4.59

# directio
Finished, time 13:27.041, 486745 MiB written, speed 603.1 MiB/s

# no directio
Finished, time 12:29.073, 486745 MiB written, speed 649.8 MiB/s

There seems to be an error when running reencrypt without direct I/O at 512-byte sector size, which shows only 50 MiB; probably a hardware issue, but I'm clearly seeing a 500+ hour ETA, hmm…
But as you can see in the new benchmark at 4096 without direct I/O, it looks better.
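
For reference, the legacy cryptsetup-reencrypt tool is the one that exposes an explicit direct I/O switch; this is only a hedged sketch of the two variants benchmarked above (the device path is an example, whether it matches the exact invocation used here is an assumption, the volume must be closed, and a backup is advisable):

    # with direct I/O
    cryptsetup-reencrypt --use-directio /dev/nvme0n1p3
    # without direct I/O
    cryptsetup-reencrypt /dev/nvme0n1p3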

I have tried browsing, watching movies, etc., and everything seems fine.

FYI, loop devices are still using a 512-byte sector size. I have a workaround for that, but I think it's complicated for a non-technical person to apply. If you want to open an issue, kindly open one for this too.
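
For illustration, a loop device can be forced to present a 4K logical sector size with a recent util-linux; this is only a sketch (the image path is an example, and it is not necessarily the exact workaround referred to above):

    # loop devices default to 512-byte logical sectors
    losetup --sector-size 4096 --find --show /path/to/volume.img
    blockdev --getss /dev/loop0   # should now report 4096 for the newly attached loop device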

And let's see about part 2 on the FireCuda NVMe; maybe I will make a guide for this.

1 Like

Please file an issue for this.

Qubes OS should definitely default to 4096-byte sectors unless it has reason to believe a different sector size is better. My understanding is that 512-byte sectors are almost always emulated nowadays, with the actual sector size being 4096 bytes.
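
One way to check whether a drive is doing 512-byte emulation, and to switch an NVMe drive to its native 4096-byte LBA format, is roughly the following (a sketch assuming nvme-cli; the LBA format index for 4096-byte sectors varies per drive, and nvme format erases all data):

    cat /sys/block/nvme0n1/queue/logical_block_size    # 512 on a 512e drive
    cat /sys/block/nvme0n1/queue/physical_block_size   # 4096 on a 512e drive
    nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"     # list the supported LBA formats
    nvme format /dev/nvme0n1 --lbaf=1                  # DESTROYS ALL DATA: switch to the 4096-byte format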

2 Likes

Will do.

Yes, an fs reformat would automatically use 4096 if the sector size is already 4096. I have at least 4 SSDs with 4K support, but none of them use 4096 as the default; perhaps it's because of compatibility issues, and that's why many vendors don't use it as the default…

The current Qubes installation uses cryptsetup version < 2.4, which doesn't automatically use a 4096-byte sector size for LUKS: RFC: Default sector size (!135) · Merge requests · cryptsetup / cryptsetup · GitLab.
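
To see what an existing install actually ended up with, one can check the cryptsetup version and the sector size recorded in the LUKS2 header (the device path is an example):

    cryptsetup --version
    sudo cryptsetup luksDump /dev/nvme0n1p3 | grep -i sector   # typically "sector: 512 [bytes]" on a default install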

Using 4K on LUKS should improve performance too, as reported here: https://www.reddit.com/r/Fedora/comments/rzvhyg/default_luks_encryption_settings_on_fedora_can_be/

But using 4K LUKS would lead to another error like I mentioned above; manual configuration is needed. I'll report and open an issue after ensuring that 4K loop devices work fine.

4 Likes

VM boot benchmark:

#full-4096
3.77
3.89
4.03
3.86
3.90

#minimal-4096
3.68
3.67
3.62
3.79
3.68


#full-512b
3.62
3.70
3.82
3.58
3.62

kdisk benchmark:

I don't see any improvement with the 4Kn drive in the VM, but I might be wrong, since I haven't done other tests.

1 Like

I would suggest increasing the sample size. Those SSD drives have big memory caches (RAM on board), which will hide the real results until that cache is filled/missed.

Your results already show some differences, though they are difficult to interpret. Most of your write tests show improvements, while most of your read tests show the opposite.

1 Like

This doesn't seem to be a problem with Btrfs. I've successfully converted my LUKS2 device to 4K sectors.
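
For anyone wanting to try the same conversion, a hedged sketch (assuming a cryptsetup version recent enough to change the sector size during re-encryption; the device path is an example, the volume must be unmounted/closed, its data size must be a multiple of 4096, and a backup is strongly advised):

    # offline conversion of an existing LUKS2 device to 4096-byte sectors
    sudo cryptsetup reencrypt --sector-size 4096 /dev/sdX2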

2 Likes

Tried to give this some time; I'm getting errors where cryptsetup complains that alignment is impossible when formatting…

Under /sys/block/sda/queue/:

  1. hw_sector_size: 512
  2. logical_block_size: 512
  3. max_segment_size: 65536
  4. minimum_io_size: 4096
  5. optimal_io_size: 0
  6. physical_block_size: 4096

This is a Crucial MX500 on a SATA2 controller.
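
For completeness, those queue limits can be dumped in one go:

    grep . /sys/block/sda/queue/{hw_sector_size,logical_block_size,physical_block_size,minimum_io_size,optimal_io_size}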

Steps:

  1. wipefs -a /dev/sda
  2. gdisk /dev/sda
  3. x (expert menu)
  4. l 4096 (change alignment to 4096 sectors)
  5. m (return to main menu)
  6. o (create GPT table)
  7. y (accept)
  8. n (new partition)
  9. 1
  10. Enter (selects 4096 as first sector, good)
  11. +1GB (for boot partition)
  12. Enter (8300 Linux filesystem partition type)
  13. n (new partition)
  14. 2 (second partition)
  15. Enter (choose next sector)
  16. Enter (choose last sector)
  17. Enter (8300 Linux filesystem partition type)
  18. w (write)
  19. cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/sda2

Still no luck whatever I do. cryptsetup luksFormat with --sector-size 4096 gives "Device size is not aligned to requested sector size".
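
That error usually means the size in bytes of the device being formatted is not a multiple of 4096 (the partition was created on 512-byte boundaries), which can be checked like this:

    blockdev --getsize64 /dev/sda2                           # partition size in bytes
    echo $(( $(blockdev --getsize64 /dev/sda2) % 4096 ))     # must print 0 for --sector-size 4096 to work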

  1. wipefs -a /dev/sda
  2. fdisk /dev/sda -b 4096
  3. g (create GPT partition table)
  4. n (new partition)
  5. 1
  6. Enter (first sector 256)
  7. +1GB
  8. n
  9. 2
  10. Enter
  11. w
    The disk is not refreshed even if calling partprobe /dev/sda… Rebooting into ISO
  12. cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/sda2
    Damn. fdisk never syncs the changes?! damnit.

Redoing gdisk:

  • wipefs -a /dev/sda
  • gdisk /dev/sda
  • o (create GPT table)
  • y (accept)
  • n (new partition)
  • 1
  • Enter (selects 2048 as first sector?)
  • +1GB (for boot partition)
  • Enter (8300 type linux partition)
  • n (new partition)
  • 2 (second primary partition)
  • Enter (choose next sector)
  • Enter (choose last sector)
  • Enter (chooses 8300 Linux filesystem)
  • w (write)
    Still not aligned.
  1. wipefs -a /dev/sda
  2. cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/sda
    Works… Alignments are wrong when partitioning with sfdisk and gdisk.

Some notes:

https://linux-blog.anracom.com/2018/12/03/linux-ssd-partition-alignment-problems-with-external-usb-to-sata-controllers-i/


---- Edit of what worked
partprobe without drive specification worked… Weird but nice.
CTRL-ALT-F2 (console)

  1. wipefs -a /dev/sda
  2. fdisk /dev/sda
  3. n (new partition)
  4. 1
  5. p (primary)
  6. Enter (first sector 2048)
  7. +1GB
  8. n
  9. 2
  10. p (primary)
  11. Enter
  12. w
  13. partprobe
  14. cryptsetup -c aes-xts-plain64 -h sha512 -s 512 --sector-size 4096 luksFormat /dev/sda2
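
At this point it is worth confirming that the kernel really picked up the new partition table and that the LUKS header was created with 4 KiB sectors (a quick check, not part of the original steps):

    lsblk -o NAME,SIZE,LOG-SEC,PHY-SEC /dev/sda
    cryptsetup luksDump /dev/sda2 | grep -i sector   # should report 4096 [bytes]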

Then, resuming your instructions:

  1. cryptsetup luksOpen /dev/device luks
  2. pvcreate /dev/mapper/luks
  3. vgcreate qubes_dom0 /dev/mapper/luks
  4. lvcreate -n swap -L xxG qubes_dom0 ( ex : 8G / 16G )
  5. lvcreate -T -L 20G qubes_dom0/root-pool
  6. lvcreate -T -l 90%FREE qubes_dom0/vm-pool
  7. lvs (to check your vm-pool size)
  8. lvcreate -V20G -T qubes_dom0/root-pool -n root (Why isn't 20G enough for simple recovery, if multiple templates aren't being installed at the same time from dom0? I am not ready to reserve 40G for dom0 and preferred it when it was in the same vm-pool, growing dynamically with better warnings, but I agree with it being in a distinct pool now.)
  9. lvcreate -VxxxG -T qubes_dom0/vm-pool -n vm ( ex : -V60G / -V360G )
  10. mkfs.ext4 /dev/qubes_dom0/vm (no need to specify the sector size if your disk already uses 4096)
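
Before returning to the installer, one can sanity-check the sector sizes across the whole stack (a hedged example; device and volume names depend on your system):

    lsblk -o NAME,TYPE,FSTYPE,LOG-SEC,PHY-SEC
    # the luks mapping and the LVs on top of it should all show LOG-SEC 4096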

ctrl + alt + f6 (return to installer)
enter the disk screen and rescan, choose the drive, select custom (not blivet), click unknown, and set:

  • 1 GiB > format > ext4 > mount point /boot > update
  • 40 GiB > format > xfs / ext4 > mount point / > update
  • (swap) > format > swap > update
  • leave qubes_dom0/vm alone.
    click done. Accept changes, Begin installation.
    Reboot.

Second stage install: went ahead and configured system as wanted.
Templates install, but the install fails at configuring sys-firewall. On reboot, all VMs are there but none starts properly.
When looking at /dev/mapper: the sys-net, sys-firewall, and sys-usb related filesystems were not created, which of course makes starting those VMs fail.

@51lieal

After boot: don't configure anything, click done, and log in.

Any insights on why continuing the second-stage install doesn't work at that point?

1 Like

I'll try to answer based on the questions I found there.

  • fdisk definitely works if you're installing using BIOS, and gdisk for UEFI.

Partition tables can expect a 1 MiB offset for the beginning of the first partition, which means 2048 sectors on 512-byte disks or 256 sectors on 4K disks (2048 × 512 B = 256 × 4096 B = 1 MiB).

So when you run fdisk there, it should show 256 as the default first sector, which is fine and good to use.

  • Actually, 20 GB is enough if we can manage what data goes on the root partition. For example, sometimes we install 2-3 templates at once, which can cause template installation to fail because there is not enough space in dom0; some users here have experienced it (back when everyone was failing to install the Kali template), and I think you might fail too, since 20 GB is not enough for installing the 4 default templates, unless you install them one by one and delete the previous data.

If you want to give Btrfs a try, it's also good; everything works out of the box. You can find the layout here; just ignore 1-2 things in the drive section.

1 Like

As far as I understand, the reason why 4K dm-crypt breaks some VM volumes on LVM Thin but not on Btrfs is a combination of two things.

  1. LVM Thin uses the same logical sector size as the underlying (dm-crypt) block device. And then a 4K LVM Thin block device in dom0 results in a 4K xen-blkfront block device in the VM, because Xen automatically passes through the logical sector size.

    Whereas file-reflink layouts like Btrfs use loop devices, which are currently always configured (by /etc/xen/scripts/block from Xen upstream) with 512-byte logical sectors - again passed through to the VM.

  2. The "root" xvda and "volatile" xvdc volumes don't properly work with 4K sectors because they are disk images containing a GPT/MBR partition table, which can only specify sizes and locations in sector units:

    • The VM initramfs script formatting "volatile" on every VM start currently assumes that a sector is 512 bytes, which should be straightforward to fix (WIP)

    • It's going to be more difficult to somehow make the "root" volume sector-size agnostic…

    (The "private" xvdb and "kernel" xvdd volumes seem to work fine if /etc/xen/scripts/block is patched to configure them with 4K sectors. They're just ext4/ext3 filesystem images without a partition table.)
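
To see what a VM actually gets for each of these volumes, the logical sector size can be read from inside the VM (an illustrative check, not a fix):

    grep . /sys/block/xvd*/queue/logical_block_size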

2 Likes
  1. Then why is vm-pool fine while varlibqubes is not? Both are using the same driver. It could be because of my 4Kn template, but I don't think so; I haven't rechecked since then.
  2. If you see this screenshot below, it already uses 4K sectors.

I don't get the question. What does "fine" mean? And aren't you benchmarking 512-byte sectors on XFS/file-reflink varlibqubes, vs. 4K sectors on LVM Thin vm-pool (using IIUC a custom partitioned TemplateVM root volume) - which would be two very different storage drivers? Oh, your vm-pool is XFS/file-reflink on top of LVM Thin? Okay, that would be the same Qubes storage driver then, but it's still a different (and unusual) storage stack.

1 Like

@rustybird: This is really interesting!!! Please poke me with updates on this. It won't land in Qubes before the next release for sure, but this is a really pertinent advancement in my opinion, even more so if it applies to the default partition scheme (thin LVM, separated root/vm pools).

1 Like

I recently installed Qubes exactly as you have described: creating partitions, specifying qvm-pool, manually installing templates, etc. It all worked well, and I'm grateful for your instructions.

However, I couldā€™t get any of VM to start. In their log, I saw that they complained about the filesystem of /xvdc, as you have described in that GitHub issue. I think that line of qvm-pool command was intentioned to avoid this ( by using lvm thin pool, as you said on GitHub), but unluckily it didnā€™t work for me.

Should I reinstall Qubes, or should I build 4Kn templates and find a way to transfer them into dom0 without any VM running? Thank you!

Btw, my self-built 4Kn template also fails to start for the same reason, on a Qubes install with a 512e SSD.

1 Like

Well, I need more detail on the steps you have done, but let's see; next week I'll try to create a guide covering everything from changing the LBA format to using a 4K template.

@rustybird Any conclusion/updates/findings?

Only this pull request:

1 Like

@rustybird Sorry I was not more specific: I meant the root and private volume creation: was that tested and working?

So if I understand correctly, I could apply your patch and have the volatile volume fixed. But for creating root and private volumes, I would need to build an ISO, or patch the stage 1 and stage 2 install so that when templates are decompressed those are fixed, to create a working system and be able to compare performance properly with/without the fixes.

I was looking for next steps to get the main devs' attention on the actual performance losses/differences shown in this thread.

Otherwise, people are trying to get away from the LVM thin provisioning model at install as of now. Some want ZFS, XFS, or Btrfs, since the speed differences are quite significant.

One example of that is from @Sven at https://forum.qubes-os.org/t/ext4-vs-btrfs-performance-on-qubes-os-installs, showing gains of ~300 MB/s write speed by choosing Btrfs at install vs. the default thin provisioning:

Fixing LUKS+LVM thin provisioning would be great. Otherwise LVM gets blamed for performance losses, when other implementations simply don't suffer from the same implementation flaws that LVM thin provisioning does, per Qubes' implementation of volatile, private and root volume creation.

@Demi maybe? I think @rustybird showed where love is needed here: SSD maximal performance : native sector size, partition alignment - #30 by rustybird

2 Likes

Not sure that I understand your question, but standard (i.e. not in, say, a standalone HVM) private volumes are already sector-size agnostic in their content, so compatibility-wise it doesn't matter whether they are presented to the VM as 512 B or 4 KiB block devices.

Standard root volumes have sector-size specific content, and I don't think it's feasible to dynamically patch that volume content (specifically, the partition table) in dom0, because it contains untrusted and potentially malicious VM controlled data.

Backward compatibility is a real headache here. It seems like the existing root and private volumes should simply be presented to the VM as 512B devices by default for now. In the case of an LVM installation layout, that might even entail forcing 512B sectors for the whole LUKS device - unless there's a good way to set an independent sector size for the LVM pool or ideally per LVM volume.

1 Like

Cross-referencing an important post by @tasket (a filesystem-knowledgeable person with a lot of hands-on experience, behind wyng):