Cannot reclaim storage space after deleting HVMs

The sum of the disk usage of all VMs in “Qube Manager” is usually close to the total disk usage reported by the “Total disk usage” applet (the hard-drive icon on the right side of the dom0 menu bar).

However, after creating and deleting a bunch of HVM templates and HVM AppVMs (all Windows 10), the applet shows roughly 8 GB more disk usage than the total from “Qube Manager”.

How can I reclaim that 8 GB of storage?

I am using LVM + ext4 instead of Btrfs + reflink.

I have tried `sudo /sbin/fstrim /` in a dom0 terminal, but that didn't help.
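
A slightly broader variant of the same idea, as a sketch (it assumes the default pool name qubes_dom0/pool00; note that fstrim only reaches filesystems mounted in dom0, not the VM volumes inside the thin pool):

# trim every mounted filesystem in dom0 (this only touches dom0's own volumes)
sudo fstrim -av
# check whether the pool's Data% actually went down
sudo lvs qubes_dom0/pool00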

Result of `sudo lvs` after shutting down all VMs (the last two columns, LSize × Data% and the cumulative sum, are my own calculations):
LV | LSize (GiB) | Origin | Data% | Meta% | LSize × Data% (GiB) | Cumulative sum (GiB)
pool00 | 179.44 | | 17.03 | 17.89 | 30.56 | (excluded)
root | 179.44 | | 4.24 | | 7.61 | (excluded)
swap | 5.7 | | | | 0.00 | (excluded)
vm-[AppVM A]-private | 2 | vm-[AppVM A]-private-[time A]-back | 28.59 | | 0.57 | 0.57
vm-[AppVM A]-private-[time A]-back | 2 | | 28.57 | | 0.57 | 1.14
vm-[AppVM B]-private | 8 | vm-[AppVM B]-private-[time B]-back | 44.14 | | 3.53 | 4.67
vm-[AppVM B]-private-[time B]-back | 8 | | 18.36 | | 1.47 | 6.14
vm-anon-whonix-private | 2 | | 0 | | 0.00 | 6.14
vm-default-mgmt-dvm-private | 2 | | 0 | | 0.00 | 6.14
vm-[Template S]-private | 2 | | 4.8 | | 0.10 | 6.24
vm-[Template S]-root | 10 | vm-[Template S]-root-[Time S]-back | 19.29 | | 1.93 | 8.17
vm-[Template S]-root-[Time S]-back | 10 | | 19.28 | | 1.93 | 10.10
vm-[Template R]-private | 2 | | 4.82 | | 0.10 | 10.19
vm-[Template R]-root | 10 | vm-[Template R]-root-[Time R]-back | 43.92 | | 4.39 | 14.58
vm-[Template R]-root-[Time R]-back | 10 | | 43.93 | | 4.39 | 18.98
vm-fedora-33-dvm-private | 2 | | 0 | | 0.00 | 18.98
vm-fedora-33-dvm-private-[Time X]-back | 2 | | 0 | | 0.00 | 18.98
vm-[AppVM C]-private | 8 | vm-[AppVM C]-private-[Time C]-back | 74.84 | | 5.99 | 24.96
vm-[AppVM C]-private-[Time C]-back | 8 | | 0 | | 0.00 | 24.96
vm-sys-firewall-private | 2 | vm-sys-firewall-private-[Time 1]-back | 4.79 | | 0.10 | 25.06
vm-sys-firewall-private-[Time 1]-back | 2 | | 4.79 | | 0.10 | 25.16
vm-sys-net-private | 2 | vm-sys-net-private-[Time 2]-back | 4.82 | | 0.10 | 25.25
vm-sys-net-private-[Time 2]-back | 2 | | 4.82 | | 0.10 | 25.35
vm-sys-usb-private | 2 | vm-sys-usb-private-[Time 3]-back | 4.8 | | 0.10 | 25.45
vm-sys-usb-private-[Time 3]-back | 2 | | 4.79 | | 0.10 | 25.54
vm-sys-whonix-private | 2 | vm-sys-whonix-private-[Time 4]-back | 5.32 | | 0.11 | 25.65
vm-sys-whonix-private-[Time 4]-back | 2 | | 5.33 | | 0.11 | 25.75
vm-whonix-gw-15-private | 2 | | 4.79 | | 0.10 | 25.85
vm-whonix-gw-15-root | 10 | vm-whonix-gw-15-root-[Time 5]-back | 20.06 | | 2.01 | 27.86
vm-whonix-gw-15-root-[Time 5]-back | 10 | | 20.05 | | 2.01 | 29.86
vm-whonix-ws-15-dvm-private | 2 | | 0 | | 0.00 | 29.86
vm-whonix-ws-15-private | 2 | | 4.8 | | 0.10 | 29.96
vm-whonix-ws-15-root | 10 | vm-whonix-ws-15-root-[Time 6]-back | 31.09 | | 3.11 | 33.07
vm-whonix-ws-15-root-[Time 6]-back | 10 | | 31.08 | | 3.11 | 36.17

Since the `g` unit in lvs means GiB (per the man page), the cumulative sum I calculate is 36 GiB, rather than the 30.8 GiB reported by the applet.
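
The same total can be reproduced straight from lvs; a rough sketch (it assumes the default volume group qubes_dom0 and, like the table above, only counts the vm-* thin volumes):

# sum LSize × Data% over the vm-* thin volumes
sudo lvs --noheadings --units g --nosuffix -o lv_name,lv_size,data_percent qubes_dom0 \
  | awk '$1 ~ /^vm-/ && $3 != "" { sum += $2 * $3 / 100 } END { printf "%.2f GiB\n", sum }'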

Table in “Qube Manager” after shutting down all VMs (the last two columns are my own calculations):
Name | Template | Disk usage (MiB) | Disk usage (GiB) | Cumulative sum (GiB)
dom0 | AdminVM | n/a | |
anon-whonix | whonix-ws-15 | 0 | 0.00 | 0.00
default-mgmt-dvm | [Template S] | 0 | 0.00 | 0.00
[Template S] | TemplateVM | 2073.6 | 2.03 | 2.03
[Template R] | TemplateVM | 4596.12 | 4.49 | 6.51
fedora-33-dvm | [Template S] | 6130.89 | 5.99 | 12.50
[AppVM A] | [Template R] | 582.52 | 0.57 | 13.07
sys-firewall | [Template S] | 98.1 | 0.10 | 13.17
sys-net | [Template S] | 98.71 | 0.10 | 13.26
sys-usb | [Template S] | 98.3 | 0.10 | 13.36
sys-whonix | whonix-gw-15 | 108.95 | 0.11 | 13.46
whonix-gw-15 | TemplateVM | 2152.24 | 2.10 | 15.57
whonix-ws-15 | TemplateVM | 3281.92 | 3.21 | 18.77
whonix-ws-15-dvm | whonix-ws-15 | 0 | 0.00 | 18.77
[AppVM B] | [Template R] | 3615.95 | 3.53 | 22.30

The total in Qube Manager is 22.3 GiB, which is 8.5 GiB less than the number reported by the applet.

How do I get those 8.5 GiB back?

Before I created and deleted those Windows 10 HVMs, the Qube Manager and the applet both said I used around 22 GiB.

The applet says:

Total disk usage
    [==----------]     17.2%
Volumes
    linux kernel
    *lvm*              17.2%      30.8 GiB / 179.4 GiB
    varlibqubes
  • What is the reason behind this loss of storage space?
  • How does the “Total disk usage” applet arrive at its number?

thin_check

Maybe this is corruption of the LVM thin-pool metadata.

Log out & run thin_check:
  1. Click “Log Out” in the Xfce application menu.
  2. Switch to a dom0 virtual console with Ctrl+Alt+F2.
  3. Log in.
  4. `sudo umount /dev/sda1`
  5. `sudo thin_check /dev/sda1` (see the sketch below for double-checking the device)
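
To double-check which block device actually holds the thin-pool metadata before pointing thin_check at it, a sketch (assuming the default volume-group name qubes_dom0):

# what exactly is /dev/sda1?
lsblk -o NAME,TYPE,FSTYPE,MOUNTPOINT /dev/sda
# where the pool and its (hidden) tmeta volume actually live
sudo lvs -a -o lv_name,lv_attr,devices qubes_dom0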

thin_check complains about a bad superblock.

Does this mean the metadata is corrupted?

Try to repair metadata

I want to repair the LVM thin-pool metadata with `lvconvert --repair qubes_dom0/pool00`. That requires deactivating the volume group qubes_dom0.

Unfortunately, this volume group contains dom0's Linux root and swap in addition to the VM volumes. That means I would have to deactivate dom0's own root before running the repair command from dom0, which is impossible.
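
Concretely, the repair I have in mind would look roughly like this (a sketch with the default names; it cannot complete from a running dom0 because root and swap in the same volume group are in use):

# fails while dom0's root and swap in this VG are active
sudo vgchange -an qubes_dom0
# the repair itself, once the pool is inactive
sudo lvconvert --repair qubes_dom0/pool00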

Questions

  • If, during the Qubes OS installation, I had made one partition for dom0 and a separate partition for an LVM pool holding all the VMs, could I then use dom0 to repair the VMs' storage pool?
  • If I used ZFS or Btrfs, would repairing the metadata for one VM in a pool require deactivating everything in the pool?

@Qurious thanks for documenting your thought process and attempts here. I hope someone else is able to chip in with some knowledge, but regardless, great work! And good luck with the issue resolution! Keep us posted :slight_smile:

Can't you do this offline, by booting from the Qubes OS installation medium or really any live Linux distro?

  • Write the Qubes ISO to a USB drive and boot from it.
  • At the installer's language-selection screen, press Ctrl+Alt+F2 to get a virtual console, then run:
# unlock the encrypted partition holding the LVM physical volume
cryptsetup luksOpen /dev/sda2 decrypted
thin_check /dev/mapper/decrypted
lsblk
# deactivate the volume group
vgchange -a n -v /dev/mapper/qubes_dom0
# repair
lvconvert --repair /dev/mapper/qubes_dom0-pool00

The lvconvert command complains:

Using default stripesize 64.00 KiB
WARNING: Sum of all thin volume sizes (333.44 GiB) exceeds the size of thin pools and the size of whole volume group (231.88 GiB)!
For thin pool auto extension activation/thin_pool_autoextend_threshold should be below 100.
WARNING: recovery of pools without pool metadata sparse LV is not automated
WARNING: If everything works, remove qubes_dom0/pool00_meta1 volume
WARNING: Use pvmove command to move qubes_dom0/pool00_tmeta on the best fitting PV
  • The Sum of all thin volume sizes... warning is probably fine, because Qubes OS over-provisions thin volumes.
  • What are the qubes_dom0/pool00_meta0 and qubes_dom0/pool00_meta1 volumes? Are they backups of the old metadata? I think I can remove them with lvremove /dev/mapper/qubes_dom0-pool00_meta0 while the volume group at /dev/mapper/qubes_dom0 is still active (see the sketch after this list).
  • I think qubes_dom0/pool00_tmeta is already on the best-fitting PV; the computer only has one physical volume (the single SSD).
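
If those really are just copies of the old metadata, removing them once the pool is confirmed to work would look roughly like this (a sketch; the exact names to remove are whatever lvs reports):

# the leftover pool00_meta* volumes should show up here
lvs -a qubes_dom0
# remove them only after confirming everything works (per the warning above)
lvremove qubes_dom0/pool00_meta0
lvremove qubes_dom0/pool00_meta1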

thin_check /dev/mapper/decrypted reports a bad checksum:

examining superblock
  superblock is corrupted
    bad checksum in superblock, wanted 2773457314
  • Is this normal?
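
One way to sanity-check what /dev/mapper/decrypted actually is, as a sketch (if it is the LVM physical volume rather than the pool's metadata device, a bad superblock from thin_check would not be surprising):

# should report TYPE="LVM2_member"
blkid /dev/mapper/decrypted
# should show it as the PV backing qubes_dom0, not thin-pool metadata
pvs /dev/mapper/decrypted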

I can't run thin_check on the qubes_dom0-pool00_tmeta volume either.

A post on the qubes-users mailing list described running thin_check on the metadata device.

thin_check can't check live metadata, but deactivating the pool makes /dev/mapper/qubes_dom0-pool00_tmeta disappear.

Erh. Surely someone has been in this situation before; is there an existing guide for repairing LVM thin-pool metadata on Qubes OS?

After trying to repair the metadata, the sum of the disk usage of all VMs in “Qube Manager” (converted to GiB) and the “Total disk usage” applet still differ, now by 7.03 GiB.

The “Total disk usage” applet's number comes from `sudo lvs qubes_dom0/pool00`, if I understand issue #4039 correctly.
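
If that is right, the applet's two numbers should be reproducible with something like:

# usage ≈ LSize × Data%, i.e. the applet's "30.8 GiB / 179.4 GiB"
sudo lvs --units g -o lv_name,lv_size,data_percent qubes_dom0/pool00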