I was moving some large files around in a qube. Everything worked normally, and I got no indication that anything was wrong. Then I shut the laptop off for the day.
The next day, when I boot Qubes 4.1, none of the qubes that start on boot are running, not even sys-usb. So I try to start it manually and get this message:
Qube Status: sys-usb
sys-usb has failed to start: Cannot create new thin volume, free space in thin pool VolGroup0/vm-pool0 reached threshold.
I click on the “disk” icon and see that vm-pool0 is using 91.x%.
In the end I had to boot from a LiveUSB to manually delete some files out of a qube’s LV.
Back in the days of Qubes 4.0, I remember getting a soft warning at 90%, and at 95% those annoying warnings every time a qube started. But nothing like this! No warning… bam, Qubes is a brick.
How do I change the thin pool threshold in Qubes 4.1 to behave like it did in Qubes 4.0, with these differences:
soft warning at 50 GB free space (instead of at 10% free, as in Q4.0)
hard warning at 10 GB free space (instead of at 5% free, as in Q4.0)
brick at 0 GB free space (instead of at 10% free, as now in Q4.1)
Additionally, looking at /etc/lvm/lvm.conf in dom0 ([dom0 ~]$ nano /etc/lvm/lvm.conf): why is thin_pool_autoextend_threshold = 90?
I mean… Qubes is a single-user OS that deliberately makes multi-boot very difficult… so naturally, during install, I chose: sudo lvcreate -T -l +100%FREE -c 256K VolGroup0/vm-pool0
Is this thin_pool_autoextend_threshold = 90 what turned Qubes into a brick? Are there negative consequences if I change this value to thin_pool_autoextend_threshold = 100? (Shouldn't it be 100 by default?)
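For what it's worth: with the pool created at +100%FREE there is nothing left in the volume group for autoextend to grow into, so the 90% threshold effectively acts as a hard cutoff for creating new thin volumes. A minimal sketch for watching the pool from dom0, assuming the VolGroup0/vm-pool0 naming from this thread (the check_pool_usage helper is hypothetical, not a Qubes tool):

```shell
# Hypothetical helper: compare a thin pool's data_percent against a cutoff.
check_pool_usage() {
  local percent=$1 cutoff=${2:-90}
  # awk does the numeric comparison; prints WARN or OK
  awk -v p="$percent" -v c="$cutoff" 'BEGIN { print ((p >= c) ? "WARN" : "OK") }'
}

# In dom0 you would feed it the real number, e.g.:
#   check_pool_usage "$(sudo lvs --noheadings -o data_percent VolGroup0/vm-pool0 | tr -d ' ')"
check_pool_usage 91.5   # prints WARN
check_pool_usage 65     # prints OK
```

Dropping a call like this into a dom0 cron job would give back something like the early soft warning Q4.0 had.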
This is exactly what you should do in order to avoid what happened to you! No other reason, as far as I know. I even set mine at 65%; I can always resize in 5% steps if needed, and that way I won't forget about it but stay constantly aware of it.
Just did some testing, filling a new qube with data… fallocate -l 260G test260G.img did not work, though: the qube-manager indicated a large qube was present, but Xfce's "total disk usage" icon did not move.
But dd if=/dev/zero bs=100G count=1 of=/home/user/bigfile3 did the trick.
I'm at 96.4% disk usage now and got a warning in the top right corner, but everything works as normal.
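If the test above behaves the way I think it does, fallocate only marks extents as allocated-but-unwritten inside the qube's filesystem, so no data ever reaches the thin pool underneath, while dd writes real zeros that the pool has to map. A scaled-down sketch (run inside a qube, not dom0; the file names and sizes here are my own):

```shell
# fallocate: extents reserved in the filesystem, nothing actually written,
# so the thin pool under the qube's volume allocates (almost) nothing.
fallocate -l 100M /tmp/fallocated.img

# dd: really writes zeros, forcing the thin pool to map real blocks.
dd if=/dev/zero of=/tmp/written.img bs=1M count=100 status=none

# Both files report the same size at the filesystem level; the difference
# only shows up in the pool's data_percent (lvs in dom0).
stat -c '%n %s' /tmp/fallocated.img /tmp/written.img
rm -f /tmp/fallocated.img /tmp/written.img
```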
Can someone please inform the developers*)? When a newbie bricks their system like this, they might not know how to un-brick it.
[dom0 ~]$ nano /etc/lvm/lvm.conf
change:
thin_pool_autoextend_threshold = 90
into:
thin_pool_autoextend_threshold = 100
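The same edit can be scripted rather than done in nano. A sketch, demonstrated on a scratch copy so the substitution can be checked safely; in dom0 you would point sed at /etc/lvm/lvm.conf itself, as root:

```shell
# Scratch copy standing in for /etc/lvm/lvm.conf:
printf 'thin_pool_autoextend_threshold = 90\n' > /tmp/lvm.conf.test

# Bump the threshold from 90 to 100 in place.
sed -i 's/^\(\s*thin_pool_autoextend_threshold\s*=\s*\)90\b/\1100/' /tmp/lvm.conf.test

grep thin_pool_autoextend_threshold /tmp/lvm.conf.test
# prints: thin_pool_autoextend_threshold = 100
rm -f /tmp/lvm.conf.test
```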
*) I know… report an issue via GitHub… but the thing is, GitHub does not accept new accounts using @guerrillamail.com… Why does everyone/everything need an account anyway?
The same thing recently happened to me. I was also moving large files between qubes; my vm-pool went over 90% and I couldn't start any qubes anymore. This helped:
Indeed, it would be helpful to change the default, since ordinary users may not be able to fix this easily.
Is it possible that this setting, thin_pool_autoextend_threshold = 100, somehow got overwritten?
I haven't made any upgrades recently, but suddenly qubes would not start because of the "reached threshold" issue. Some time ago I had this same issue and fixed it by setting the threshold to 100. It turned out the value had somehow been overwritten back to 90, and that caused the trouble again.
Changing this configuration value from "90" to "100" is a hack, and once you reach 100% committed, what do you do then?
There has to be a better solution to this problem.
I deleted a bunch of VMs but this error persists. VM deletion is not a fix.
What is the correct fix? I need to reclaim the unused space, and it's not clear how that is done. I see no Qubes-specific tool to assist with this. Please help.
Maybe you need to remove the qubes that you copied the large data to/from, since that is what caused this issue.
Back up those qubes (or the data on them) and try to remove them.
Then run this command to trim the disk, if you have an SSD:
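The trim command itself didn't make it into the post; judging from the reply below it was fstrim, which needs root and discard support on the mounted filesystem. A guarded sketch (the suggest_trim helper is hypothetical; it only prints the command rather than running it):

```shell
# Hypothetical guard: check preconditions and print the fstrim invocation
# instead of running it, since the real thing needs root.
suggest_trim() {
  if command -v fstrim >/dev/null 2>&1 && [ -d "$1" ]; then
    echo "run: sudo fstrim -v $1"
  else
    echo "fstrim unavailable for $1"
  fi
}
suggest_trim /
```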
TL;DR: Deleting VMs “should” work as expected for this problem.
I appreciate the response.
An fstrim was not necessary; I am hesitant to issue trim commands to the SSD.
What did I try? Initially I deleted several TemplateVM instances totalling 5% of the vm-pool size, and I was still unable to start any qubes.
I cherry-picked one of the VM volumes and mounted it in dom0, then deleted 50% of the data (200 GiB), then performed an lvreduce with the --resizefs option. Qubes Manager showed the correctly reduced volume size for the VM/qube (as did the LVM tools); however, the LVM tools still showed a >90% vm-pool allocation (no change at all), and still no qubes would start.
I then deleted the VM associated with the LV that I had resized (everything is backed up). After this, rechecking the vm-pool allocation with the LVM tools showed a reduced allocation %, and everything seems to work again.
Some parting thoughts:
It's not clear to me why deleting enough TemplateVMs to account for 5% of the pool (which should have brought allocation from 93% down to 88%) did not bring allocation down at all. As near as I can tell, all VM types have their volumes allocated from vm-pool, and TemplateVM volumes look the same as AppVM volumes (to me, anyway).
It's not clear to me why lvreduce did not result in a reduction of vm-pool allocation, even though both the LVM tools and Qubes Manager showed a correctly reduced volume size for the VM.
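One hedged guess at the lvreduce puzzle: Qubes keeps revision snapshots of each volume (the *-back and *-snap thin LVs), and a snapshot keeps old blocks mapped in the pool even after the live volume shrinks; deleting the VM removes its snapshots too, which would explain why only that step freed space. From dom0 you can list every thin LV carved from the pool, snapshots included (VG and pool names taken from this thread; the sorting step below is shown on mock output so it can be checked without LVM):

```shell
# Real inspection (dom0, as root) - every thin LV in vm-pool0, with how
# much of each LV's virtual size is actually mapped in the pool:
#   sudo lvs --noheadings -o lv_name,data_percent,lv_size \
#     -S 'pool_lv=vm-pool0' VolGroup0
#
# Sorting the heaviest volumes to the top, shown here on mock output
# (LV names are illustrative, in the style of Qubes' revision snapshots):
mock='vm-work-private 45.1 400.00g
vm-work-private-1581000000-back 44.9 400.00g
vm-debian-10-root 8.2 40.00g'
printf '%s\n' "$mock" | sort -k2,2 -rn
```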
While I am unstuck, and without changing the 90% threshold (very pleased), I now face the dilemma that restoring a VM will no doubt bring me back into a bad state, with no obvious solution for resizing the filesystem (I can't add more physical storage) – but that's maybe for another post.