After reboot Qubes requires "emergency rescue" (exceeds the size of thin pools)

Hi,
So now it’s the third time that Qubes got “broken”.

Here is what I found out in emergency rescue mode:
lvconvert --repair /dev/mapper/qubes_dom0-root--pool →
WARNING: Sum of all thin volume sizes (<9.04 TiB) exceeds the size of thin pools and the size of whole volume group (<1.82 TiB).
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
WARNING: LV qubes_dom0/root-pool_meta8 holds a backup of the unrepaired metadata. Use lvremove when no longer required.

My NVMe drive has 2 TB. According to the “LSize” column of "lvs", however, I have a total of 10 TiB in use.
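As a rough way to see where a number like this comes from: for a thin volume, "LSize" is the virtual (provisioned) size, not the space actually written, so the sum over all thin volumes and their snapshots can exceed the pool. The sketch below adds up provisioned sizes per pool from the text produced by `lvs --noheadings --units b --nosuffix --separator , -o lv_name,pool_lv,lv_size`. The function name and the sample volumes are made up for illustration and are not taken from the affected system.

```python
def summarize_thin_usage(lvs_output: str) -> dict:
    """Compare provisioned thin-volume sizes with the size of their pool.

    Expects one LV per line as "lv_name,pool_lv,lv_size" with sizes in bytes.
    """
    pool_sizes = {}    # LVs without a pool_lv entry (pools, plain LVs) -> size
    provisioned = {}   # pool name -> sum of virtual sizes of its thin LVs
    for line in lvs_output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, pool, size = [field.strip() for field in line.split(",")]
        if pool:
            # Thin volume or snapshot: counts against the pool it lives in.
            provisioned[pool] = provisioned.get(pool, 0) + int(size)
        else:
            pool_sizes[name] = int(size)
    return {
        pool: {
            "pool_bytes": pool_sizes[pool],
            "provisioned_bytes": total,
            "ratio": total / pool_sizes[pool],
        }
        for pool, total in provisioned.items()
        if pool in pool_sizes
    }


# Hypothetical sample data, only to show the input format.
SAMPLE = """
  vm-pool,,1000000000000
  vm-work-private,vm-pool,2000000000000
  vm-work-private-snap,vm-pool,2000000000000
  swap,,4294967296
"""

if __name__ == "__main__":
    for pool, info in summarize_thin_usage(SAMPLE).items():
        print(pool, info)
```

A ratio above 1 only means the pool is overprovisioned. How full it really is shows up in the "Data%" and "Meta%" columns that `lvs` prints for the pool itself.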

How can that happen?

I’ve not figured out how this can actually be fixed either… The first time (a year ago) I restored from backups; this time I’ve not made a backup in 2 weeks, so that is not great.

My current thought on how to fix this is:

  1. back up the content of all VMs (using file system access with rsync) into folders on an external HDD
  2. reinstall Qubes with ZFS RAID-Z1 instead of LVM
  3. create the VMs, templates etc. again in Qubes with the same sizes
  4. delete and overwrite the content of each VM with the content of the backup

However, the important question remains: how can this happen in the first place? Based on the WARNINGs it sounds like it is “my” fault? If so: why do I (long-time developer, no Linux pro) / any user without deep LVM knowledge / any common user who likes privacy / any Linux expert need to set these protections manually? I have NO CLUE about that, and as a developer I think this is something that should be set automatically by Qubes / constantly checked to prevent this case, and not left up to the end users.

And please let us spare the “you are responsible yourself for backups!” kind of comments; they are not the answer to the question/the root cause. Qubes has no automated backup policy, and backups for all my VMs would probably take half a day. So that is not an option. I do use Nextcloud to sync the important files, but that is not enough.

Thanks!

This is speculation on my part, as I’m not 100% sure this is how it works. Just my thoughts.

You could have a few snapshots of your big developer volumes.
For example, a 1 TB developer volume with 2 snapshots (2 snapshots per volume is a default). As long as you don’t change anything in that volume, the snapshots take almost no space. But if you update the timestamps of the files of a 1 TB project, then in an instant you have 2x 1 TB snapshots + the original 1 TB volume. But it will affect you only when you close/restart the qube that uses this volume.
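The arithmetic behind this guess can be written down as a small sketch. It is a simplified worst-case model that follows the reasoning above, not a description of how LVM accounts for blocks: it assumes every snapshot ends up holding its own copy of whatever fraction of the volume was rewritten. In practice, snapshots that still share unchanged blocks with each other use less than this. The function name and parameters are made up for illustration.

```python
def worst_case_pool_bytes(volume_bytes: int, snapshots: int,
                          changed_fraction: float) -> int:
    """Upper bound on the pool space taken by a volume plus its snapshots.

    changed_fraction is the share of the volume rewritten since the
    snapshots were taken (0.0 = nothing changed, 1.0 = everything).
    """
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")
    if snapshots < 0:
        raise ValueError("snapshots must not be negative")
    # The live volume always counts in full; each snapshot is assumed to
    # keep its own copy of the rewritten part (worst case).
    return int(volume_bytes * (1 + snapshots * changed_fraction))


if __name__ == "__main__":
    one_tb = 10**12
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, worst_case_pool_bytes(one_tb, 2, fraction))
```

Under this model, a 1 TB volume with 2 snapshots stays at 1 TB while nothing changes and reaches 3 TB if everything is rewritten, which is the case described above.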