Collection: How to reduce the amount of data written to SSD by Qubes OS

Hello!

If you ever tried Qubes OS on a conventional spinning hard drive, the first thing you would notice is that your gray hairs grow faster than it boots. It would take ages from startup until you could use the computer for anything meaningful. So I am quite confident in guessing that the majority of Qubes users are now using some form of SSD instead.

Qubes, however, is an SSD killer, at least in its default configuration. One might argue that it is the price of security via compartmentalisation, but I still think we can have both security and a long-living SSD.

My journey with Qubes OS started a few years ago, when Qubes was at the 4.0 beta and release-candidate stage. At that time, I wanted to study how Qubes worked, so I turned off LUKS and LVM and just used normal partitioning while installing Qubes. This way, I was able to access the Qubes partition from Windows, see the files and settings, and get an idea of what Qubes OS was. But the price was that in the course of just a few months, the wear level of my brand-new SSD jumped to 36, with 6.5TB written to disk. It was a small SSD, of course. And the 6.5TB was the result of the constant cloning and deleting of root, private and volatile images, among others.

Certainly I was not using the Qubes default configuration. But still. Even now, using Qubes 4.1 in its default configuration (LUKS, LVM thin provisioning), I can see between 5GB and 10GB of data written to the SSD daily (compared to 200MB to 300MB on Windows with the same usage pattern). Needless to say, if I need to download and keep something, I save it on an external HDD. If I have to update one or more templates, the amount of data written that day is in the 15GB to 25GB range.
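
For anyone who wants to reproduce this kind of measurement, one rough way is to compare the sectors-written counter in /proc/diskstats at the start and end of the day. This is a generic Linux sketch, not anything Qubes-specific, and the device name nvme0n1 is only an example:

```
# Sectors written since boot, converted to GB (field 10 of /proc/diskstats).
# Run once in the morning and once in the evening; the difference is that day's writes.
awk '$3 == "nvme0n1" { printf "%.1f GB written since boot\n", $10 * 512 / 1e9 }' /proc/diskstats
```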

My worst experience with Qubes was when I restored a VM with 5GB of data from a backup: that operation alone resulted in 19GB, almost four times the amount of data, being written to the SSD.

In an ideal world, I could imagine Qubes having a “magic switch”. In the “live” position, Qubes would write absolutely nothing to the media it booted from (like Tails), and in the “maintenance” position, updating or installing new software would be possible. Until then, I will have to keep tweaking Qubes to minimise the amount of data it writes to the SSD during operation. And that is the reason I started this topic: to collect all SSD-wear-related information in one place.

Some possibilities that I have thought of or tried with limited success:

  • Settings related to partition boundary, cluster size

  • Settings related to TRIM (discard) operations

  • Using minimal templates

  • Defragmenting the ext4 partition, or re-formatting it with different settings to make it more compact

  • Using tmpfs wherever it is possible and makes sense

  • Disabling logging, or making it volatile (a small sketch covering this and the tmpfs item follows the list)

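As a concrete illustration of the last two items, here is a minimal sketch of the kind of changes I mean, assuming a Fedora- or Debian-based template. These are generic Linux settings, not Qubes-specific advice, so treat them as a starting point rather than a recommendation:

```
# Keep the systemd journal in RAM only (inside a template or VM):
# edit /etc/systemd/journald.conf and set
#   Storage=volatile
#   RuntimeMaxUse=32M
sudo systemctl restart systemd-journald

# Mount /tmp as tmpfs so temporary files never reach the disk
# (line to add to /etc/fstab; the 512M size is just an example):
#   tmpfs /tmp tmpfs defaults,noatime,size=512M 0 0
```
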
This list is far from complete, and I would appreciate seeing more tips or links to relevant topics added. I have not forgotten that the main strength of Qubes is security, so even more important than new tips and links would be explanations of why doing “this” or “that” is a good or bad idea; that would be the real treasure for me, as someone still on the learning curve of getting the most out of this fantastic OS.

For a start, I link this topic (How I learned to love Liteqube (and why you should, too, even if you have enough RAM) - #2 by unman), which I found very promising. Apart from the fact that it requires installing new packages in dom0, which Qubes officially advises against, does anyone have any thoughts (pros and cons) about it?

Thanks

Have you looked up in your SSD model’s datasheet how many lifetime TBW (terabytes written) are within spec?

E.g. I could supposedly still write 10 TB/month to my drive for 40+ years… (It’s a datacenter-y model, but even on consumer models this just might not be a thing worth micromanaging before a drive is swapped for a bigger capacity one anyway.)

Exactly, the controller is more likely to die without notice before you start wearing out the SSD cells.

No, I have not looked it up yet. Although you are lucky enough to be able to write 10TB/month for 40+ years without worrying much, not everybody is so fortunate. In some parts of the world, getting even a small-capacity SSD is hard enough, let alone a decent laptop capable of running Qubes OS.

Have you been to a computer museum? I have, and I am always amazed at what engineers of the 1960s could do with a 1MHz CPU and 4KB of RAM, compared to what today’s people do with their pocket super-computers (aka “smart” phones) that have CPUs 10,000 times faster and a million times more memory… I do not think wasting resources is a good thing, even in times of plenty.

Unlike the physical world, the world of computing has a unique property. If you have a perfect physical object, it is very difficult to make a second object that is identically perfect. With computing, if you have a perfect computer program, millions and billions of people can make a copy and enjoy that fine piece of work. Of course, it works the other way around too: millions and billions of people can suffer from one programmer’s stupidity. The difference between the two is sometimes just the result of micromanaging. So if we can do something better, what is stopping us?

I’m so looking forward to the day this “issue” gets its “magic switch” to the top of the todo list.
While I’m using zram without swap, I admit I don’t see how “defragmenting the ext4 partition” is relevant to SSD wear.

the wear level of my brand new SSD jumped up to 36 with 6.5TB written to disk.

That doesn’t sound right; it’s most likely going to be 1% or 0.5%.

Standard consumer NVMe drives are typically rated for 600 or 1200 TBW; writing 6 TB is not going to add 36% wear to the drive.

TBW also isn’t a hard limit. The warranty is typically 5 years or the maximum TBW, whichever comes first; the drive will keep working past the TBW limit, you just don’t have any warranty once it does.

I saw this on the net: “ext4 acts in a more intelligent way than merely adding new files into the next available space. Instead of placing multiple files near each other on the hard disk, Linux file systems scatter different files all over the disk, leaving a large amount of free space between them. When a file is edited and needs to grow, there’s usually plenty of free space for the file to grow into. If fragmentation does occur, the file system will attempt to move the files around to reduce fragmentation in normal use, without the need for a defragmentation utility.”

Well, whoever said that thought it was “intelligent”; I do not. I think it is the exact opposite, and in the spirit of reducing the amount of data written to the SSD, it is even worse. Linux is famous for having a million tiny files instead of a few huge ones. Imagine that you update a 100-byte file: how much data is written to disk? At the filesystem level, something like 4KB would be sent down to the next layer. If LVM is involved, at minimum one 4MB block (extent) will be written. So updating a 100-byte file could end up with 4MB of data being written to disk. And if “Linux file systems scatter different files all over the disk”, your next 100-byte write would likely not be in the same 4MB block, meaning that even with caching, another 4MB would be sent down to the hardware, and so on.
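
The 4KB and 4MB figures above are my assumptions about the default layout; they can be checked on a running system, since ext4 reports its block size and LVM reports both the extent size of the volume group and (more relevant for thin provisioning) the chunk size of the thin pool. The device and volume-group names below are only examples from a default install:

```
# ext4 block size of a VM's private volume (usually 4096 bytes)
sudo tune2fs -l /dev/qubes_dom0/vm-work-private | grep 'Block size'

# Extent size of the volume group and chunk size of the thin pool(s)
sudo vgdisplay qubes_dom0 | grep 'PE Size'
sudo lvs -o lv_name,chunk_size qubes_dom0
```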

On another topic in this forum, I mentioned having two nearly identical private volumes (of two VMs), each with just 1MB of real data, yet Qube Manager says one VM uses little more than 1MB while the other uses 149MB of disk. And I think the reason lies in how ext4 and LVM store the data.

When talking about fragmentation, people tend to think of data. Fragmentation in the broader sense covers “free space fragmentation” too. If “Linux file systems scatter different files all over the disk”, that means heavy free-space fragmentation, and the LVM container obviously has to be bigger to accommodate that fragmented partition. That is why “defragment ext4” is on the list.
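
For what it is worth, e2fsprogs ships a tool that reports how fragmented an ext4 filesystem actually is, which may help in deciding whether defragmenting is worth the extra writes it would itself cause. The mount point below is only an example (the private volume inside a Qubes VM):

```
# Report fragmentation of an ext4 filesystem without changing anything (-c = check only)
sudo e4defrag -c /rw
```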

Please, my SSD is a small-capacity one, and if I divide 6.5TB by the SSD capacity, every cell has been written a few dozen times; the number 36 seems to be in line with that. I do not think 36 means 36%. It just means 36: the bigger the number, the worse. And if I remember correctly, I have read somewhere that a flash cell is designed for about 500 writes.

What is the brand/model of your SSD?

Thanks for the explanation.
Nevertheless, while I’m using minimal templates, btrfs, and zram with swap=0 in order to reduce writes to the SSD, I am willing to change my SSDs every 3-5 years, as I find that a cost-effective strategy. Although I already have an SSD that is 6 years old with the status “Good”.
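
If anyone wants to verify a similar setup, the state of zram and swap can be checked with standard tools. This is just a quick sanity check, nothing Qubes-specific:

```
# Show zram devices and any active swap; no disk-backed swap should be listed
zramctl
swapon --show
```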

How do you check this “health status” of SSDs in Qubes OS? Is there a tool I can use in dom0?

```
sudo smartctl -x /dev/YOUR-DEVICE
```

It will tell you the TBW, percentage used, and power on time.
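
If the full output is overwhelming, the wear-related fields can be filtered out. The device name below is only an example; substitute your own:

```
# Show only the wear-related NVMe health fields
sudo smartctl -x /dev/nvme0n1 | grep -Ei 'percentage used|data units written|power on hours'
```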

Do you mind sharing some of your results? I just want to get an idea to compare usage stats; I don’t know what some of these things mean.

Here are some of mine:

```
Temperature:               50 C
Available Spare:           100%
Available Spare Threshold: 10%
Percentage Used:           1%
Data Units Read:           80 TB
Data Units Written:        35 TB
Power Cycles:              2,798
Unsafe Shutdowns:          359
Read Self-test Log failed: Invalid Field in Command (0x002)
```