Vault VM not starting: "Domain vault has failed to start: volume qubes_dom0/vm-vault-private missing", 300 GB of data lost

Hi everyone, I think I’ve broken my vault VM and lost all of its data (300 GB) :frowning: and I don’t know what to do now. I tried to find a solution but couldn’t find anything. Please help me restore my data.

Scenario of the issue:

  • I got an “Out of disk space” error (a yellow triangle warning saying 5% space left)
  • so I started deleting the biggest files from vault, but it didn’t help
  • I started deleting backups using “lvremove qubes_dom0/vm-name-back”. I am sure I didn’t delete vm-vault-private, because I don’t see that in the command history. I did once run “sudo lvremove qubes_dom0/vm” without a name, but nothing happened, so I don’t think that’s the reason. Removing the backups freed up some disk space.
  • I worked for a few days without any issue, hibernating the system only at night
  • I created a new vm-test and deleted it; everything was still fine and the system worked. Only one thing was suspicious: under the “Qubes Devices” icon I could still attach a USB device to this old vm-test, but I ignored that, thinking it would disappear after a system restart
  • I kept hibernating the system at night and all was fine after each reboot, until today, when the “bottom panel” froze and I couldn’t change workspace, so I was forced to restart the system, as usual when that happens
  • the system was not able to restart by itself: there was a black screen at the end of the restart process. When I pressed the power button the first time nothing happened, so I had to press it a few times to reboot
  • now I am trying to start the vault VM but I can’t, because of “Domain vault has failed to start: volume qubes_dom0/vm-vault-private missing”
  • all these days I was working with the “Out of disk space” error (yellow triangle warning saying less than 5% space left), but I didn’t know how to fix it, because it always increases when I delete some backups and files

Now, when I run “sudo lvs” I can’t find “vm-vault-private”; there is only “vm-vault-private-snap”.

  1. It seems “vm-vault-private” has disappeared. Is it possible to restore it somehow?
  2. Does the “vm-vault-private-snap” snapshot contain good data, or could the data be broken?
  3. What is the best thing to do now? Is it possible to restore the whole “vm-vault-private” from “vm-vault-private-snap”, or do I have to create a new VM and restore the data from the snapshot into it?
  4. Why did this happen? I’ve never had this issue with a missing VM before, and I’d like to avoid it in the future.
  5. How can I finally fix this “Out of disk space” error?

Please help me.
Thank you in advance!

You can try this:

But it’s better to clone your disk before doing anything, so you have a backup in case you mess something up.

I am not sure the VM is deleted; it was working before the restart. Now, when I run the “qvm-volume info vault:private” command, I can even get info about it, like:

pool vlm
vid qubes_dom0/vm-vault-private
rw True
source
save_on_stop True
snap_on_start False
size 315213432122
usage 0
revisions_to_keep 1
is_outdated False
Available revisions (for revert): none

It could be that qvm-volume doesn’t expect the user to remove a volume manually (e.g. with lvremove) instead of using the Qubes OS tools, so it doesn’t check whether the volume actually exists.
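One way to cross-check this is to ask LVM directly which volumes it still has for the qube. The following is only a rough sketch: `qube_volumes` is a made-up helper name, it merely filters the text that `lvs` prints, and it assumes the `vm-<qube>-<volume>` naming seen in this thread.

```shell
# Hypothetical helper: print the LV names read from stdin that belong to one qube.
# Assumes the vm-<qube>-<volume> naming used in this thread (e.g. vm-vault-private).
qube_volumes() {
    awk -v prefix="vm-$1-" 'index($1, prefix) == 1 { print $1 }'
}

# Usage (needs root, in dom0 or from a live system with the disk unlocked):
#   sudo lvs --noheadings -o lv_name qubes_dom0 | qube_volumes vault
```

If only `vm-vault-private-snap` is printed, LVM itself no longer knows about `vm-vault-private`, whatever `qvm-volume info` reports. Note that this is a plain prefix match, so a qube whose name starts with `vault-` would be listed as well.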

Thanks for the reply, apparatus!
I only deleted backups using “lvremove qubes_dom0/vm-name-back”. I am sure I haven’t deleted vm-vault-private or any other VM; I checked, and there is no such command in the history.

I tried to fix it from outside. I booted a live USB Linux, decrypted the disk, and ran these commands:

sudo thin_check /dev/mapper/

The result was: Device or resource busy (OS error 16)
I don’t know why.

Next I ran:

sudo lvchange -a n -v /dev/mapper/qubes_dom0
sudo lvconvert --repair /dev/mapper/qubes_dom0-pool00
which resulted in:

WARNING: Sum of all thin volume sizes (4TiB) exceeds the size of thin pools and the size of the whole volume group (<900 GiB)
WARNING: You have not turned on protection against thin pools running out of space.
WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
WARNING: LV qubes_dom0/pool00_meta0 holds a backup of the unrepaired metadata. Use lvremove when no longer required.

So I removed all backups again (volumes with the “-back” ending) with lvremove.
It didn’t help, because the sum of all thin volume sizes is still 2 TiB. I don’t understand why.
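A note on that figure: the number in the warning is the sum of the virtual (maximum) sizes of all thin volumes, not the space actually used. Thin volumes only take pool space for blocks that have been written, so the sum can legitimately be larger than the disk. The sketch below adds up those virtual sizes from `lvs` output; `thin_virtual_total` is a made-up helper name, and `pool00` is assumed from the pool name in the commands above.

```shell
# Hypothetical helper: sum the virtual sizes (in bytes) of all thin volumes
# in one pool. Expects "size pool" pairs on stdin, one volume per line.
thin_virtual_total() {
    awk -v pool="$1" '$2 == pool { total += $1 } END { printf "%.0f\n", total }'
}

# Usage (needs root):
#   sudo lvs --noheadings --units b --nosuffix -o lv_size,pool_lv qubes_dom0 \
#       | thin_virtual_total pool00
#
# The real fill level of the pool is reported separately:
#   sudo lvs -o lv_name,data_percent,metadata_percent qubes_dom0/pool00
```

What matters for the “Out of disk space” warning is the pool’s data and metadata percentage from the second command, not the virtual total.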

I started Qubes OS again, and now I can’t run two more VMs (the same error message: “vm-name2-private missing”, “vm-name3-private missing”).
I’m sure I didn’t delete them with lvremove; I double-check what I type on the console. I don’t understand why this happens. It looks like lvremove deletes additional VMs. Why? Is the SSD broken?

After this operation I recovered 40% of the disk space. I noticed that a lot of space is taken by backups, so I unchecked “include in backups by default” for all VMs in Qube Settings.
I think backups are why I had issues with disk space, so I hope it will not happen again. Is it safe for me to disable backups? A snapshot helped me restore data today, but a snapshot is different from a backup and will still be created when backups are disabled, right?

apparatus, I’ve read “How to recover logical volume deleted with lvremove”. It seems to be the only way to restore the data. What tool do you suggest for cloning the disk before this operation? It can be dangerous, and I don’t feel confident after what happened with the lvremove command :frowning:

By the way, my sys-usb takes up a lot of GiB because I increased its size in the past, and now I am not able to decrease it. Decreasing isn’t recommended by the documentation; it’s better to create a new one. But when I created a new sys-usb my USB mouse didn’t work, and I don’t know why :frowning:

I don’t know how LVM volumes are handled in Qubes OS so I can’t comment on this.

The “include in backups by default” setting in Qube Settings is not related to the -back volumes; it’s just an option that tells the Qubes Backup tool whether to select this qube for a backup by default.

You can just use dd to clone the entire disk if you have a separate disk large enough to store your disk image.
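A minimal sketch of such a clone, to be run from a live system with the source disk not in use. The helper name `clone_and_verify`, the device `/dev/sdX` and the image path are placeholders, not values from this thread; double-check the device name with `lsblk` first, because dd overwrites whatever `of=` points to.

```shell
# Hypothetical helper: copy SRC to DST with dd, then compare the two byte by byte.
# SRC would be the whole source disk (e.g. /dev/sdX, a placeholder) and
# DST an image file on a different disk with enough free space.
clone_and_verify() {
    dd if="$1" of="$2" bs=4M || return 1
    cmp "$1" "$2"
}

# Usage (as root; both paths are placeholders):
#   clone_and_verify /dev/sdX /mnt/backup/disk.img
```

The function returns success only if the copy completed and the image is identical to the source. With GNU dd you can add `status=progress` to see how far the copy has got.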

I guess it’s better to just remove the sys-usb and create a new one.