SSDs, HDDs, partition formats, etc.: best practices

Background
I recently had my SSD die and go into read-only mode. Thankfully, I was able to retrieve all of my data, but it turns out that the biggest problem was my primary template. All of the reads from that template wore out the part of the disk where it was stored, and when the template’s EXT4 partition became corrupted, all VMs using that template failed to boot.

Modern SSDs use TLC (despite Samsung calling their new models 3-bit MLC, which, as an Ars Technica article described, is a lot like calling a red car “pink” - misleading marketing jargon). TLC is not as robust as MLC, but MLC drives are increasingly rare and expensive.

Best Practices for Storage Devices
Given the unique architecture of Qubes, I wondered what others out there are doing to improve the performance and endurance of their drives. I’m considering an SSD + HDD setup to take advantage of the performance of the SSD and the reliability of the HDD (maybe even use a RAM disk for some things). However, setting up partition schemes and LVM storage so that Qubes uses the two devices in the best way possible is a bit complicated. On a normal Linux distro, I would put /var on the HDD and everything else on the SSD, but since each VM has its own /var inside the LVM, things get complicated.
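If I go down this road, I’m imagining something along these lines for the HDD side, following the general secondary-storage pattern from the Qubes docs (the device, VG, and pool names here are just placeholders, and the exact qvm-pool syntax can differ between releases, so double-check against the docs for your version before running any of it):

# in dom0: turn an HDD partition into an LVM thin pool
sudo pvcreate /dev/sdb1
sudo vgcreate hdd_vg /dev/sdb1
sudo lvcreate -T -l 90%FREE hdd_vg/hdd_tpool

# register it with Qubes as an additional storage pool
qvm-pool add hdd_pool lvm_thin -o volume_group=hdd_vg,thin_pool=hdd_tpool,revisions_to_keep=2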

Also, there are filesystems designed for wear-leveling on flash devices, like YAFFS2 and UBIFS (both considered successors to JFFS2; LogFS has been removed from the kernel because it was unmaintained). A lot of comments claim that flash controllers implement wear-leveling themselves, but I haven’t seen anything official to support the claim that filesystems like YAFFS2 are unnecessary.

Suggestions from the Community
I wanted to see if anyone out there has recommendations for using a SSD + HDD (+ RAM disk) setup on Qubes, how they set up their LVM, what partition formats they use, etc. Once we come up with a consensus of sorts, perhaps we can put together a community guide for best practices.

Thanks!


How did you come to this conclusion?

The most advanced and actively maintained flash-friendly filesystem is F2FS.

However, since the disk is encrypted with LUKS, none of those optimizations work regardless of the FS, because the filesystem can’t touch the hardware directly - only a pseudo (encrypted) device.

Very good question. Before realizing that my SSD was dying, I started looking at the VM debug terminals for errors, doing fsck where I could, etc. The one VM that was LOADED with EXT4 errors (that could not be fixed) was my default template. It was throwing the kernel message:

“EXT4-fs (xvda3): warning: mounting fs with errors, running e2fsck is recommended.”

I figured the most logical explanation would be that the frequent access to that part of the disk nuked the drive.

Strangely (and fortunately), Qubes let me back up all of my VMs - including that broken template. However, after restoring the backups, all of the EXT4 errors were still present in the template. So, I tried a few things:

  • Setting kernelopts to fsck.mode=force fsck.repair=yes (via qvm-prefs in dom0; see the sketch after this list)
    • Didn’t work - the VM always mounted the virtual disks on xvda before fsck could run
    • Still getting the “running e2fsck is recommended” kernel message
  • Running fsck from dom0 against the devices in /dev/qubes_dom0
    • It would only work against the *-private devices, not the *-root devices
    • I believe the errors were occurring in *-root, since the VM debug terminal showed all of the EXT4 errors being in xvda3
  • Setting kernelopts to single so I could run fsck manually without devices being mounted
    • It appears that the default kernel doesn’t support this anymore
    • The debug terminal showed the message Unknown kernel command line parameters "rd_NO_PLYMOUTH single", will be passed to user space.
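For reference, I set that from dom0 with qvm-prefs, roughly like this (the template name is just an example):

qvm-prefs fedora-38-xfce kernelopts                                    # show the current value
qvm-prefs fedora-38-xfce kernelopts "fsck.mode=force fsck.repair=yes"  # set it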

None of that worked. I eventually had to rebuild my template from scratch. It was a PITA, but once I did that, my VMs worked again.

Is there something else I should have tried?

This is what I thought. Is there a way to set up the LVM to store *-private volumes on the HDD but keep *-root volumes on the SSD? If my assumptions are correct about the excessive access to the template being the problem, this probably wouldn’t have saved my SSD since the *-root is what was corrupted (I think - that is where xvda lives, right?), but it may have prolonged the life of the device.
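If a second pool on the HDD is the way to do this, I’d guess placing whole qubes there would look something like the sketch below (pool and qube names are placeholders). What I don’t know is whether a single qube’s *-root can stay on the SSD while its *-private goes to the HDD - that’s the part I’d like to confirm.

# in dom0: create a qube whose volumes all live in the HDD-backed pool
qvm-create -P hdd_pool --label red my-hdd-qube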

There is no evidence that reads wear SSD cells. See “storage - Will reading data cause SSD’s to wear out?” on Super User, for example.

SSDs are particularly solid (haha, unintended pun, since SSD stands for Solid State Drive). They keep a hidden reserve of spare cells, because some cells wear out prematurely from writes; when that happens, spare cells are automatically made available to replace them, and the faulty ones are never used again.

You may have experienced a corrupted filesystem issue.


Hmm. Well, I find it strange that it only impacted this single template. Also, my SSD is DEFINITELY bricked. The read-only bit got set just like it does on an SD card (I’ve had that happen on Raspberry Pis before). The drive gets locked, it cannot be unlocked, and it is permanently read-only.
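For what it’s worth, the read-only state is easy to confirm from dom0 or a live OS (the device name is whatever the dying disk shows up as; /dev/sda here is just an example):

lsblk -o NAME,RO,SIZE /dev/sda      # the RO column shows 1 for read-only devices
sudo blockdev --getro /dev/sda      # prints 1 if the kernel sees the device as read-only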

I don’t have any other explanation for why the SSD would die/get locked at exactly the same time that this template began throwing EXT4 errors. I don’t do anything inside the template except run updates or make a few config tweaks so they are present in the downstream VMs. If reads don’t impact an SSD, why on earth would these two things be correlated?

Also, do you have any thoughts on how to run fsck on the root device manually? Single user didn’t work, kernel params didn’t work, and running from dom0 didn’t work. I don’t know what else I could have tried.

Boot from some other OS (e.g. from Live OS like GParted) and do it there.

I’m not sure how that would be any different. I actually did do this but didn’t mention it because running fsck from dom0 would be exactly the same as running it from a live OS unless I’m missing something.

Running fsck on ANY of the *-root devices from dom0 or a live distro results in the following:

fsck from util-linux 2.38.1
e2fsck 1.46.5 (30-Dec-2021)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/mapper/qubes_dom0-vm--fedora--38--xfce--root

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Found a gpt partition table in /dev/mapper/qubes_dom0-vm--fedora--38--xfce--root

I’m not sure how to check the disk this way.

If the single kernel param worked, I could check the filesystem from inside the VM itself, but apparently it has been removed from the default kernel provided by Qubes, judging by the warning message at boot.
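One thing I have not tried yet (so treat this as an untested sketch): the fsck output above found a GPT partition table inside the *-root volume, which suggests the ext4 filesystem lives in a partition inside that volume rather than on the raw device. From dom0, with the VM shut down, kpartx should be able to expose those inner partitions so fsck can target the right one (the mapping names it creates may vary):

sudo kpartx -av /dev/mapper/qubes_dom0-vm--fedora--38--xfce--root   # map the partitions inside the volume
sudo e2fsck -f /dev/mapper/<third-partition-mapping-printed-above>  # check the rootfs partition
sudo kpartx -dv /dev/mapper/qubes_dom0-vm--fedora--38--xfce--root   # remove the mappings when done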

In a dom0 terminal, can you run “touch hello.txt”?

SSDs can die too; maybe it’s dying, or its controller has issues that are causing the ext4 errors and data corruption.

My bad, I thought you wanted to fsck dom0 rootfs.


I can in my working system. I don’t remember if the system on my SSD would let me do that, but I don’t think so. I know that some basic utilities (like sudo) wouldn’t work properly and were throwing errors or not working at all. I still have the drive and could boot it up if I need to do so.

Oh, of course, but when SSDs die, they die HARD. Perhaps my understanding is dated, but SSDs don’t have as robust a SMART monitoring framework as spinning disks (mainly because the physical way the drives work is so different). When they die, the read-only bit gets set and everything locks up. Things appear to still work for a while because so much runs in RAM, but any write operations attempted after that bit is set end up corrupted. I’ve had it happen on Raspberry Pis before (Samsung EVO cards are just . . . just the worst), but this is my first failure on a laptop/desktop.
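That said, it’s worth checking what the drive itself reports via SMART - smartmontools can read wear/health counters on both SATA and NVMe SSDs, even if the attribute names vary by vendor:

sudo smartctl -a /dev/sda      # SATA SSD: look for wear-leveling / media-wearout style attributes
sudo smartctl -a /dev/nvme0    # NVMe SSD: the health log includes a "Percentage Used" field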

To keep this thread on track, these are my remaining questions:

  1. How can I run fsck on the *-root devices (which contain xvda1, xvda2, and xvda3, correct?)? Passing kernel params to run it on boot doesn’t work, and neither does single-user mode. If I could have fixed my template once I restored the backup, I would have saved a few hours of time.
  2. Is there possibly a better way to partition and set up Qubes to take advantage of SSDs and HDDs (or multiple drives in general)? This is more of a call for open discussion.

I have only a little experience with Qubes, but a lot of experience with applications that do terrible things to mass storage - Elasticsearch running 24x7 streaming of data, for example.

Back in 2019 I got some old HP workstations - the last generation to use DDR3. The Samsung SSDs in them were used for boot and ZFS cache duties; bulk storage was Seagate IronWolf spindles. The Samsung products failed quickly, and I chose Seagate Nytro “medium endurance” SSDs instead. These will take three drive writes per day, and they weigh about triple what the Samsung consumer products do. I finally did have one die about six months ago, right at the end of its five-year warranty period. The other three I have are still happily rolling along.

I am very sad that the little SATA SSD drives from that product line are gone from the channel. The only thing left in the $100 range is the 240 GB XF1230, which is only capable of 0.7 DWPD. That’s probably sufficient to stand up to anything Qubes does, and it’s what I had available yesterday when I installed 4.2.1 after several months of no Qubes around here. Amazon has 480 GB drives for $141 and the read-intensive Nytro 5350S 1.92 TB for $180. I’m less familiar with the specs for that one.
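For a rough sense of scale, 0.7 DWPD on a 240 GB drive over its 5-year warranty works out to about 0.7 × 240 GB × 365 × 5 ≈ 306 TB of total writes, which is why I figure it can stand up to anything Qubes throws at it.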

I’m retiring all my DDR3 gear, new laptop is a Dell Precision 7730 with a fearful Toshiba 1TB NVMe drive. Nytro NVMe aren’t on Amazon so there’s a 2TB Western Digital Red NVMe drive in my future. There’s another Dell laptop due, likely a 7740, and it’ll get the same.

You spend roughly 50% more for datacenter-grade gear, but when you do, you avoid writing posts like yours. I buy the cheapest old junk I can get, computer-wise, except for drives.

What I would like to see on the Qubes software front is the inclusion of ZFS in the default install. I’d round up some of the Nytro 480GB SATA SSDs and use a third for system, a third for cache, and leave a third free for cache duties if the first third gets tired.

Samsung 980 & 990 M.2 SSDs are known for early failures, and some OEM builders have stopped purchasing them for new builds. I do not know the status of recent firmware “fixes”, but I suspect some “OS” issues may be due to these drives. Something to ask.

I absolutely agree with this. For years, I’ve only bought refurbished enterprise-grade laptops off of eBay that still have a remaining warranty. Most of them were either returns - a unit had an issue, the manufacturer fixed it and is selling it at a loss to make at least some money back (I like to think of it as “pre-tested”) - or machines owned by executives who frequently trade up.

Those manufacturer warranties are transferable and usually cover the system for three years, and I’ve typically found models with 1.5 years remaining. If a single pixel goes bad, you can get it fixed. It’s worth it just for that, but the build quality is also far superior to anything you’d pick up at a big box store.

This (I suspect) is due to the switch from MLC to TLC. Flash memory comes in different types: SLC, MLC, TLC, QLC, and now PLC. They stand for Single, Multi, Triple, Quad, and Penta Level Cell - how many bits can be stored in a single memory cell. You get more storage in the same amount of physical space, but the endurance drops substantially. SDXC, SDUC, and all that nonsense just denote capacity classes and have nothing to do with the quality of the card itself.

Samsung, in its marketing genius, began referring to its TLC drives as “3-bit MLC” since the “M” in MLC stands for “Multi.” I found this nugget on Ars Technica:

“Samsung calls the 980 a ‘three bit MLC’ SSD, which is a lot like referring to a red car as ‘pink.’ To justify this, the company leans on the fact that “M” stands for ‘Multi’ - so in plain English, ‘three bit MLC’ could make sense, despite being utter nonsense in the established terminology of SSDs. From here on out, we’re going to call it what it is: TLC.”

SLC is extremely rare and expensive these days. It is generally only used in milspec and critical industrial applications. MLC is becoming rare as well. However, the endurance dropoff is pretty substantial. In general, write cycles per cell look roughly like this:

  • SLC: 100,000
  • MLC: 10,000
  • TLC: 3,000
  • QLC: 1,000
  • PLC: 300-500 (maybe)

That’s a pretty dramatic drop-off.
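A rough back-of-the-envelope way to turn those cycle counts into drive endurance: rated TBW ≈ (capacity × P/E cycles) ÷ write amplification. For example, a 1 TB TLC drive at 3,000 cycles with a write-amplification factor of 3 comes out to roughly 1,000 TBW; the same drive built as QLC at 1,000 cycles would be closer to 330 TBW. The real numbers depend heavily on the controller, over-provisioning, and workload, so treat this as an estimate only.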

The last consumer MLC M.2 drives that Samsung made were the 970 models. Now, they’re all TLC, and I’m wary of them. Samsung, Intel, and a few others make enterprise-grade SSDs, but I haven’t done enough research to know which models use what technology.

One last note: avoid anything Samsung EVO. They are budget TLC (perhaps QLC, these days). Be it your DSLR camera, an RPi, or an M.2 card, you WILL get burned.

Back on topic, though: I’d really like to figure out how to fsck the xvda and *-root devices used by Qubes. As I mentioned, I tried:

  • kernel params to force a check (drives would mount before fsck would run)
  • booting into single user mode (no longer supported in the default kernel provided by Qubes)
  • scanning the LVM block device from dom0 or a live OS

None of it worked. There has to be a solution, here.

I’m still curious about how to optimize how Qubes partitions drives in order to improve performance and endurance, but despite the title of this thread, that is almost secondary, right now.

sudo losetup -fP --show /path/to/root.img

Then you’ll have:
/dev/loopX
/dev/loopXp1 - EFI System
/dev/loopXp2 - BIOS boot
/dev/loopXp3 - rootfs
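The check itself can then be run against the rootfs partition, for example (assuming it’s ext4, as your errors suggest):

sudo e2fsck -f /dev/loopXp3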
After running fsck on /dev/loopXp3, detach the loop device:

sudo losetup -d /dev/loopX

. . . where have you been all my life? This is EXACTLY what I needed and would have saved me from having to rebuild my template from scratch (though it was probably about time that I did so). This should absolutely be added to the documentation.

Any suggestions on where to add it? I am capable of submitting a PR to one of the docs, but it would be easier if a dev were willing to add it instead.

I think the “Disk troubleshooting” page is more suitable for this, with a header along the lines of “Fixing VM disk image root or home filesystem corruption”.