I was moving some large files around in a qube. Everything worked normally, and I got no indication that anything was wrong. Then I shut the laptop off for the day.
The next day, when I boot Qubes 4.1, none of the qubes that start on boot are running, not even sys-usb. So I try to start it manually and get this message:
Qube Status: sys-usb
sys-usb has failed to start: Cannot create new thin volume, free space in thin pool VolGroup0/vm-pool0 reached threshold.
I click on the “disk” icon and see that vm-pool0 is using 91.x%.
In the end I had to boot from a LiveUSB to manually delete some files out of a qube’s LV.
Back in the days of Qubes 4.0, I remember getting a soft warning at 90%, and at 95% those annoying warnings every time a qube started. But nothing like this! No warning… bam, Qubes is a brick.
How do I change the thin pool threshold in Qubes 4.1 to behave like it did in Qubes 4.0, with these differences:
soft warning at 50 GB free space (instead of at 10% free, as in Q4.0)
hard warning at 10 GB free space (instead of at 5% free, as in Q4.0)
brick at 0 GB free space (instead of at 10% free, as now in Q4.1)
Additionally, looking at /etc/lvm/lvm.conf in dom0 ([dom0 ~]$ nano /etc/lvm/lvm.conf): why is thin_pool_autoextend_threshold = 90?
I mean… Qubes is a single-user OS that deliberately makes multi-boot very difficult… so naturally, during install, I chose: sudo lvcreate -T -l +100%FREE -c 256K VolGroup0/vm-pool0
Is this thin_pool_autoextend_threshold = 90 what turned Qubes into a brick? Are there negative consequences if I change this value to thin_pool_autoextend_threshold = 100? (Shouldn't it be 100 by default?)
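For what it's worth: with the pool created at +100%FREE there is nothing left in the volume group for autoextend to grow into, so the 90% threshold effectively acts as a hard cutoff for creating new thin volumes. A minimal sketch for watching the pool from dom0, assuming the VolGroup0/vm-pool0 naming from this thread (the check_pool_usage helper is hypothetical, not a Qubes tool):

```shell
# Hypothetical helper: compare a thin pool's data_percent against a cutoff.
check_pool_usage() {
  local percent=$1 cutoff=${2:-90}
  # awk does the numeric comparison; prints WARN or OK
  awk -v p="$percent" -v c="$cutoff" 'BEGIN { print ((p >= c) ? "WARN" : "OK") }'
}

# In dom0 you would feed it the real number, e.g.:
#   check_pool_usage "$(sudo lvs --noheadings -o data_percent VolGroup0/vm-pool0 | tr -d ' ')"
check_pool_usage 91.5   # prints WARN
check_pool_usage 65     # prints OK
```

Dropping a call like this into a dom0 cron job would give back something like the early soft warning Q4.0 had.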
This is exactly what you should do in order to avoid what happened to you! No other reason, as far as I know. I even set mine at 65%; I can always resize in 5% steps if needed, and that way I won't forget about it but stay constantly aware of it.
Just did some testing, filling a new qube with data… fallocate -l 260G test260G.img did not work, though: the qube-manager indicated a large qube was present, but Xfce's "total disk usage" icon did not move.
But dd if=/dev/zero bs=100G count=1 of=/home/user/bigfile3 did the trick.
I'm at 96.4% disk usage now and got a warning in the top right corner, but everything works as normal.
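If the test above behaves the way I think it does, fallocate only marks extents as allocated-but-unwritten inside the qube's filesystem, so no data ever reaches the thin pool underneath, while dd writes real zeros that the pool has to map. A scaled-down sketch (run inside a qube, not dom0; the file names and sizes here are my own):

```shell
# fallocate: extents reserved in the filesystem, nothing actually written,
# so the thin pool under the qube's volume allocates (almost) nothing.
fallocate -l 100M /tmp/fallocated.img

# dd: really writes zeros, forcing the thin pool to map real blocks.
dd if=/dev/zero of=/tmp/written.img bs=1M count=100 status=none

# Both files report the same size at the filesystem level; the difference
# only shows up in the pool's data_percent (lvs in dom0).
stat -c '%n %s' /tmp/fallocated.img /tmp/written.img
rm -f /tmp/fallocated.img /tmp/written.img
```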
Can someone please inform the developers*)? When a newbie bricks their system like this, they might not know how to un-brick it.
[dom0 ~]$ nano /etc/lvm/lvm.conf
change:
thin_pool_autoextend_threshold = 90
into:
thin_pool_autoextend_threshold = 100
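The same edit can be scripted rather than done in nano. A sketch, demonstrated on a scratch copy so the substitution can be checked safely; in dom0 you would point sed at /etc/lvm/lvm.conf itself, as root:

```shell
# Scratch copy standing in for /etc/lvm/lvm.conf:
printf 'thin_pool_autoextend_threshold = 90\n' > /tmp/lvm.conf.test

# Bump the threshold from 90 to 100 in place.
sed -i 's/^\(\s*thin_pool_autoextend_threshold\s*=\s*\)90\b/\1100/' /tmp/lvm.conf.test

grep thin_pool_autoextend_threshold /tmp/lvm.conf.test
# prints: thin_pool_autoextend_threshold = 100
rm -f /tmp/lvm.conf.test
```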
*) I know… report an issue via GitHub… but the thing is, GitHub does not accept new accounts using @guerrillamail.com… Why does everyone/everything need an account anyway?
The same thing recently happened to me. I was also moving large files between qubes; my vm-pool went over 90% and I couldn't start any qubes anymore. This helped:
Indeed, it would be helpful to change the default, since ordinary users may not be able to fix this easily.
Is it possible that this setting, thin_pool_autoextend_threshold = 100, somehow got overwritten?
I haven't made any upgrades recently, but suddenly qubes would not start because of the "reached threshold" issue. Some time ago I had this same issue and fixed it by setting the threshold to 100. It turned out the value had somehow been overwritten back to 90, and that caused the trouble again.
Changing this configuration value from "90" to "100" is a hack, and once you reach 100% committed, what do you do then?
There has to be a better solution to this problem.
I deleted a bunch of VMs but this error persists. VM deletion is not a fix.
What is the correct fix? I need to reclaim the unused space, and it's not clear how that is done. I see no Qubes-specific tool to assist with this. Please help.
Maybe you need to remove the qubes that you copied the large data to/from, since that is what caused this issue.
Back up those qubes (or the data on them) and try to remove them.
Then run this command to trim the disk, if you have an SSD:
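The trim command itself didn't make it into the post; judging from the reply below it was fstrim, which needs root and discard support on the mounted filesystem. A guarded sketch (the suggest_trim helper is hypothetical; it only prints the command rather than running it):

```shell
# Hypothetical guard: check preconditions and print the fstrim invocation
# instead of running it, since the real thing needs root.
suggest_trim() {
  if command -v fstrim >/dev/null 2>&1 && [ -d "$1" ]; then
    echo "run: sudo fstrim -v $1"
  else
    echo "fstrim unavailable for $1"
  fi
}
suggest_trim /
```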
TL;DR: Deleting VMs “should” work as expected for this problem.
I appreciate the response.
An fstrim was not necessary; I am hesitant to issue trim commands to the SSD.
What did I try? Initially I deleted several TemplateVM instances totalling 5% of the vm-pool size, and I was still unable to start any qubes.
I cherry-picked one of the VM volumes and mounted it in dom0, then deleted 50% of the data (200 GiB), then performed an lvreduce with the --resizefs option. Qubes Manager showed the correctly reduced volume size for the VM/qube (as did the LVM tools); however, the LVM tools still showed a >90% vm-pool allocation (no change at all), and still no qubes would start.
I then deleted the VM associated with the LV that I had resized (everything is backed up). After this, rechecking the vm-pool allocation with the LVM tools showed a reduced allocation %, and everything seems to work again.
Some parting thoughts:
It's not clear to me why deleting enough TemplateVMs to account for 5% of the pool (which should have brought allocation from 93% down to 88%) did not bring allocation down at all. As near as I can tell, all VM types have their volumes allocated from vm-pool, and TemplateVM volumes look the same as AppVM volumes (to me, anyway).
It's not clear to me why lvreduce did not result in a reduction of vm-pool allocation, even though both the LVM tools and Qubes Manager showed a correctly reduced volume size for the VM.
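One hedged guess at the lvreduce puzzle: Qubes keeps revision snapshots of each volume (the *-back and *-snap thin LVs), and a snapshot keeps old blocks mapped in the pool even after the live volume shrinks; deleting the VM removes its snapshots too, which would explain why only that step freed space. From dom0 you can list every thin LV carved from the pool, snapshots included (VG and pool names taken from this thread; the sorting step below is shown on mock output so it can be checked without LVM):

```shell
# Real inspection (dom0, as root) - every thin LV in vm-pool0, with how
# much of each LV's virtual size is actually mapped in the pool:
#   sudo lvs --noheadings -o lv_name,data_percent,lv_size \
#     -S 'pool_lv=vm-pool0' VolGroup0
#
# Sorting the heaviest volumes to the top, shown here on mock output
# (LV names are illustrative, in the style of Qubes' revision snapshots):
mock='vm-work-private 45.1 400.00g
vm-work-private-1581000000-back 44.9 400.00g
vm-debian-10-root 8.2 40.00g'
printf '%s\n' "$mock" | sort -k2,2 -rn
```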
While I am unstuck, and without changing the 90% threshold (very pleased), I now face the dilemma that restoring a VM will no doubt bring me back into a bad state, with no obvious solution for resizing the filesystem (I can't add more physical storage) – but that's maybe for another post.