Help! Are my qubes gone after lvm resize?

My vm-pool got to 100% today after a reboot and no VMs would boot.

Following this GitHub issue, I decided to look at my /etc/lvm/lvm.conf to see if my thin pools were set to autoextend.

The lvm.conf had thin_pool_autoextend_threshold set to 100, so I set it to 70 and set thin_pool_autoextend_percent to 40.
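
For reference, these are the two settings as I changed them; as far as I understand lvm.conf they live in the activation section (the values below are just the ones I picked):

# /etc/lvm/lvm.conf (excerpt)
activation {
        # was 100 before my edit; autoextend once the pool passes 70% usage...
        thin_pool_autoextend_threshold = 70
        # ...and grow it by 40% of its current size each time
        thin_pool_autoextend_percent = 40
}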

I rebooted, and it hung for a while on some LVM start job. When I logged in, my VMs wouldn’t start because of “qubes_dom0/vm-*-root missing” errors. Yet only 49.4% of the disk is in use (500 GB SSD). The Disk Space Monitor GUI shows a 20G entry for varlibqubes and lists a “vm-pool”, but with no numbers.

The output of sudo lvs shows the “roots” of my VMs. Does this mean they’re still there?

Are my qubes gone? What could I do in terms of recovery? How badly did I “fsck” things up? Also, why was autoextend disabled in the first place?

I do have verified backups but they’re a bit old unfortunately.

It seems like things are corrupted pretty badly now.

I’ve read through this similar forum thread about VMs that disappeared. The user’s VMs’ logical volumes were somehow deactivated and thus wouldn’t boot, though dom0 was fully accessible.

I seem to have a similar issue.

Output of lvs:

LV               VG         Attr       LSize   Pool      Origin Data% Meta% Move Log Cpy%Sync Convert
root             qubes_dom0 Vwi-aotz-- 20.00g  root-pool        49.85 
root-pool        .          twi-aotz-- 20.00g                   49.85 29.62
swap             .          -wi-ao---- 3.75g
vm-1-private     .          Vwi---tz-- 2.00g   vm-pool   ...
vm-1-root        .          Vwi---tz-- 10.00g  vm-pool   ...
vm-1-date-back   .          Vwi---tz-- 10.00g  vm-pool
vm-2-private     .          Vwi---tz-- 10.00g  vm-pool   ...
...
vm-pool          .          twi---tz-- 339.55g 
...
vm-n-date-back   .          Vwi---tz-- 2.00g   vm-pool

Output of pvs:

PV                   VG         Fmt  Attr PSize   PFree
/dev/mapper/luks-... qubes_dom0 lvm2 a--  455.24g 90.84g

Output of vgs:

VG         #PV #LV #SN Attr   VSize   VFree
qubes_dom0   1 216   0 wz--n- 455.25g 90.84g

Output of vgscan:

Found volume group "qubes_dom0" using metadata type lvm2

Output of lvscan --all:

inactive    '/dev/qubes_dom0/vm-pool' [339.55 GiB] inherit
ACTIVE      '/dev/qubes_dom0/root-pool' [20.00 GiB] inherit
ACTIVE      '/dev/qubes_dom0/root' [20.00 GiB] inherit
ACTIVE      '/dev/qubes_dom0/swap' [3.75 GiB] inherit
inactive    '/dev/qubes_dom0/vm-...-{private,private-snap,date-back,root,root-snap,volatile,}'  [XX.00GiB] inherit
...
inactive    '/dev/qubes_dom0/vm-...-{private,private-snap,date-back,root,root-snap,volatile,}'  [XX.00GiB] inherit
ACTIVE      '/dev/qubes_dom0/root-pool_tmeta' [24.00 MiB] inherit
ACTIVE      '/dev/qubes_dom0/root-pool_tdata' [20.00 GiB] inherit

Seeing as my qubes’ thin volumes were inactive but seemingly not missing (the same situation as in the linked thread), I tried to activate a private volume using the command given there.

Output of lvchange -a y /dev/qubes_dom0/vm-1-private:

Thin pool qubes_dom0-vm--pool-tpool (253:9) transaction_id is 39358, while expected 39360.

I made a backup of my volume group configs using vgcfgbackup and examined the file for transaction IDs. Interestingly, the transaction ID for the vm-pool logical volume is 39358, and the last transaction ID that apparently occurred for a VM (succeeded in the file by the entries for lvol0_pmspare, vm-pool_tmeta, vm-pool_tdata, root-pool_tmeta, and root-pool_tdata) is 39357.

...
qubes_dom0 {
        id ...
        status = ["RESIZEABLE", "READ", "WRITE"]
        flags = []
        
        physical_volumes {

                pv0 { ...
                }
        }

        logical_volumes {

                vm-pool {
                        id ...
                        status = ["READ", "WRITE", "VISIBLE"]
                        flags = []
                        creation_time ...
                        creation_host ...
                        segment_count = 1
                        
                        segment1 {
                                 ...
                                 type = "thin-pool"
                                 metadata = "thin-pool_tmeta"
                                 pool = "vm-pool_tdata"
      # Look here                transaction_id = 39358
                                 ... }
                         }

                ...

                vm-1-private-snap {
                        id ...
                        status = ["READ", "WRITE", "VISIBLE"]
                        flags = []
                        creation_time ...
                        creation_host ...
                        segment_count = 1
                        
                        segment1 {
                                 ...
                                 type = "thin"
                                 thin_pool = "vm-pool"
      # Look here                transaction_id = 39357
                                 ... }
                         }
        }
}

Maybe something is wrong with the order of my transaction IDs? As noted above, lvchange expected an ID of 39360, but the metadata backup only goes as high as 39358 (in the 39xxx range).
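
In case it matters, this is roughly how I pulled the transaction IDs out of the metadata backup (the path is the default one vgcfgbackup used on my system):

# List every transaction_id line in the metadata backup, with line numbers
sudo grep -n transaction_id /etc/lvm/backup/qubes_dom0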


I should add that, before editing my lvm.conf, I tried to add some space for the pool metadata by following the instructions linked in this forum comment. Based on my ~/.bash_history, I ran these commands:

swapoff -a
lvresize -L -200M qubes_dom0/swap
swapon -a
swapoff -a
swapon -a
mkswap /dev/qubes_dom0/swap
swapon -a
lvextend --poolmetadatasize +200M qubes_dom0/vm-pool
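
If it helps with diagnosing this: I believe the pools’ data and metadata usage can be listed with something like the command below, assuming metadata_percent and lv_metadata_size are valid lvs output fields.

# Show the thin pools plus their hidden _tmeta/_tdata volumes and their metadata usage
sudo lvs -a -o +metadata_percent,lv_metadata_size qubes_dom0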

I have managed to edit and restore the metadata for vm-pool and change its transaction ID from 39358 to 39360.

vgcfgbackup qubes_dom0
# Then edit the backup with vim and replace transaction ID for the vm-pool logical volume
# Restore the new metadata
vgcfgrestore qubes_dom0 --file /etc/lvm/backup/qubes_dom0
# Need --force
vgcfgrestore qubes_dom0 --file /etc/lvm/backup/qubes_dom0 --force
# Need to deactivate vm-pool_tmeta and vm-pool_tdata before activating vm-pool
lvchange -a n qubes_dom0/vm-pool_tmeta
lvchange -a n qubes_dom0/vm-pool_tdata
# Finally activate vm-pool
lvchange -a y qubes_dom0/vm-pool
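
To double-check whether the restore actually changed what the on-disk metadata says, I assume something like this would work (transaction_id looked like a valid field in lvs -o help, but I’m not certain):

# Report the transaction ID that LVM's metadata now records for the pool
sudo lvs -o +transaction_id qubes_dom0/vm-pool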

Unfortunately, activating vm-pool gave the same error as before:

Thin pool qubes_dom0-vm--pool-tpool (253:9) transaction_id is 39358, while expected 39360. 

And vm-pool still shows as inactive in lvscan.

But I just changed the ID of vm-pool to 39360??? How are vm-pool and qubes_dom0-vm--pool-tpool different? Where can I find qubes_dom0-vm--pool-tpool in dom0?
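
If it’s relevant: I’m guessing qubes_dom0-vm--pool-tpool is a device-mapper name, so maybe I could compare what the kernel itself reports with dmsetup (the device name below is my guess based on the error message):

# List device-mapper devices with their major:minor numbers (looking for 253:9)
sudo dmsetup ls
# For a thin-pool target, the field right after "thin-pool" should be the live transaction_id
sudo dmsetup status qubes_dom0-vm--pool-tpool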