LVM metadata problem - Free space in thin pool reached threshold

A few days back, I posted about my issue on reddit in hopes of an easy fix.
Well, no such luck.

Today I finally had the time to dig deeper into LVM and found two guides [1] [2] outlining the process of migrating a thin pool’s metadata to a larger LV.
However, both guides failed me.

When I deactivate the pool (lvchange -an vg/pool), I cannot see pool_tmeta in /dev/mapper.
And when I activate it, I cannot copy it with thin_repair, because thin_repair doesn’t work with live data.
I also can’t just activate tmeta in read-only mode (as the first guide suggests):
sudo lvchange -pr -ay vg/pool_tmeta produces the message “Unable to change internal LV vg/pool_tmeta directly.”
Using thin_dump (as the second guide suggests) fails just like thin_repair, since it, too, can’t handle live data.
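
For reference, this is roughly the sequence I tried (vg/pool stand in for my actual VG and pool names, and the /dev/mapper path is my guess at the device name):

sudo lvchange -an vg/pool                   # pool deactivated: pool_tmeta disappears from /dev/mapper
sudo lvchange -ay vg/pool                   # pool active again: thin_repair refuses to touch the live metadata
sudo lvchange -pr -ay vg/pool_tmeta         # "Unable to change internal LV vg/pool_tmeta directly."
sudo thin_dump /dev/mapper/vg-pool_tmeta    # fails like thin_repair, since the metadata is in use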

Since migrating failed, I thought about freeing up some space.
I decided to manually remove the vm-AppVM-volatile LV (which I mentioned on reddit), after reading here that it only holds swap+root changes.

However, I had to learn that this does not actually free up space in the right place.
If I understand the architecture correctly now, it looks somewhat like this:

physical drive > volume group > logical volume pool > tmeta & tdata;
tdata > logical volumes (= VM-"block devices")

So removing the volatile LV only freed up some space inside tdata, but not in the volume group, which is where I would need free extents to expand tmeta…
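
For anyone who wants to check their own layout, this seems to show all the layers and how full they are (vg is a placeholder for the actual VG name; -a includes the internal tmeta/tdata volumes):

sudo lvs -a -o lv_name,pool_lv,lv_size,data_percent,metadata_percent vg   # pool, tdata, tmeta and the thin LVs inside the pool
sudo vgs -o vg_name,vg_size,vg_free vg                                    # VFree is what extending tmeta would actually need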

And now…any ideas how to proceed??

I’m trying to find a solution without resizing (shrinking) tdata, because I don’t know how to make sure there won’t be any data loss.
My tdata is not 100% full, so theoretically there is unallocated space…but will the resize command respect this?!
I.e. will it recognize the free space and cut that off first, rather than the allocated space? Probably not, right!?

The idea of migrating the metadata still sounds the most promising to me. But I don’t know how to make it work.
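
The closest thing I’ve found so far is the metadata swap described in lvmthin(7). If I understand it correctly, it would make the old metadata readable as a normal LV, but it needs a temporary LV at least as large as tmeta, so it again requires free extents in the VG, which I don’t have (all names and the 1G size below are placeholders):

# needs a temporary LV at least as big as tmeta, i.e. free extents in the VG
sudo lvcreate -an -Zn -L 1G -n pool_meta_tmp vg
# the pool has to be inactive for the swap
sudo lvchange -an vg/pool
# swap the temporary LV in as the pool's metadata; the old metadata should then
# be reachable as the normal LV vg/pool_meta_tmp
sudo lvconvert --thinpool vg/pool --poolmetadata vg/pool_meta_tmp
sudo lvchange -ay vg/pool_meta_tmp
sudo thin_check /dev/vg/pool_meta_tmp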

And a bonus question:
How would I prevent this from happening again in the future?
I mean, I could probably just create a much larger tmeta. But that’s just going to delay this issue, right?!
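
The only related thing I’ve found so far are the autoextend settings in /etc/lvm/lvm.conf, which let dmeventd grow tdata and tmeta automatically once they cross a threshold; as far as I can tell they still need free space in the VG to grow into (the values below are just the examples from the config comments):

# /etc/lvm/lvm.conf, inside the activation { } section
thin_pool_autoextend_threshold = 70   # start extending once the pool is 70% full
thin_pool_autoextend_percent = 20     # grow it by 20% of its size each time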

P.S. As previously mentioned on reddit, I’m an LVM noob. I spent the last couple of hours reading up on the topic, but my understanding is still very limited.
Thanks to anyone willing to help!!

Update:
Since I’m not making any real progress with migrating the metadata, I tried reading up on properly resizing LVs. I found this guide.

However, I already can’t perform Step 1 and thus Step 2 doesn’t work.

e2fsck (Step 2) complains “tdata is in use. Cannot continue, aborting.”

To have tdata not in use, I guess I would have to unmount it first.
Thing is, I don’t have it mounted. At least not manually.
df -ah doesn’t show the pool at all and lsblk doesn’t report a mount point for it.
Then, how can I unmount it / make sure it’s not in use?!

I tried deactivating the LV, but lvchange -an vg/pool_tdata complains “Unable to change internal LV vg/pool_tdata directly.”
And when I disable the entire pool, tdata is no longer accessible via /dev/mapper.
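
If I read the dmsetup output correctly, tdata counts as “in use” simply because the active pool device holds it open, not because anything is mounted (I’m assuming the device-mapper names are vg-pool and vg-pool_tdata here):

sudo dmsetup info vg-pool_tdata   # "Open count: 1" while the pool is active
sudo dmsetup ls --tree            # shows tdata and tmeta hanging under the pool device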

So, it’s basically the same situation as when I tried to migrate the metadata with thin_repair.

Update2:
I’ve finally found confirmation of one of my suspicions.
One of the “culprits” that left me in this unfortunate situation is this
command for creating the thin pool, from the example in the Qubes docs:
sudo lvcreate -T -n poolhd0 -l +100%FREE qubes

Using 100% of the VG is discouraged precisely because of the situation I’m currently in.
The Arch Wiki explains this (as always) very well:

Create the thin pool LV, MyThinPool. This LV provides the blocks for storage.

# lvcreate --type thin-pool -n MyThinPool -l 95%FREE MyVolGroup

The thin pool is composed of two sub-volumes, the data LV and the metadata LV. This command creates both automatically. But the thin pool stops working if either fills completely, and LVM currently does not support the shrinking of either of these volumes. This is why the above command allows for 5% of extra space, in case you ever need to expand the data or metadata sub-volumes of the thin pool.

So it seems the only option I currently have is to add an external drive, add it to the VG, and then expand tmeta…
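
If I understand lvextend correctly, it should boil down to something like this once the drive is attached (the device name and size are placeholders; the VG/pool names are the ones from the docs example):

sudo vgextend qubes /dev/sdX1                        # add the external drive's partition to the VG
sudo lvextend --poolmetadatasize +1G qubes/poolhd0   # grow tmeta using the newly added free extents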

Gonna report back, once I’ve tried this.

P.S.
I won’t change the example in the Qubes docs, since GitHub doesn’t allow me to register an account (seems they have a problem with my FF privacy settings).
So, whoever has access to the docs, feel free to change the “faulty” command in my stead.


Sooooo… long story short, I can’t fix this without a permanent external drive…? Sounds like a serious flaw in Qubes unless I did something really stupid on my end to cause this. I’m screwed right now, half my VMs won’t launch because of this shit error.

Be aware that running out of space on R4.0 and R4.1 sometimes led to LVM corruption and user data being lost. Completely untested territory.
R4.2 has a slightly different LVM approach, though.

So… upgrade or upgrade it is, then? RIP my weekend…

If you are very concerned about your user data, you can boot a live CD with Ubuntu or something and extract all data to an external disk, especially if it is possible to mount the LVM volumes and partitions read-only.
If you are less concerned, you can try to fix the problem right in place; that is also a possible way to proceed.
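
Roughly something like this from the live system (device, VG and LV names below are placeholders, check with lvs what you actually have):

# if the disk is LUKS-encrypted (Qubes default), unlock it first
sudo cryptsetup open /dev/sdXn qubes_crypt
# activate the volume group and list the private volumes
sudo vgchange -ay qubes_dom0
sudo lvs qubes_dom0
# mount one of them read-only and copy the data to the external disk
sudo mount -o ro /dev/qubes_dom0/vm-personal-private /mnt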

I’m not missing any data from what I can see. I just can’t launch any qubes because it says vm-pool is out of space…

edit: Rebooting seems to temporarily fix the situation. What’s your other way to proceed?