Critical error creating a backup

Recently, backing up Qubes has been failing with this message:

Whoops. A critical error has occurred. This is most likely a bug in Qubes Global Setting application.
StoragePoolException: file pool cannot export dirty volumes at line 102 of file base.py

I am running Qubes 4.0, regularly updated. I haven’t been able to find anything that looks relevant in the logs. I can’t find anything that looks wrong in the global settings database, either from the CLI or GUI. I have tried rebooting numerous times, starting the backup immediately after a boot or just before shutdown. I saw a different “StoragePoolException” once, though I didn’t write it down and haven’t reproduced it yet.

Any suggestions on how to track this down and fix it?

thanks!

– scott henry

“dirty volumes” sounds like one of your qubes was not properly shut down, and in that case it may not be possible to back it up. Try starting and shutting down all of the qubes that you want to back up.

I was finally able to reproduce the other error: “file pool cannot export running volumes”, which makes sense, since several of the VMs were running.

I have started and shut down the VMs several times each since the error first appeared. I suspect that I haven’t started ALL of the VMs; I’ll try that.

How does Qubes determine that a volume is “dirty” when it is shut down?

qvm-ls --raw-list will give you a list of all VMs if you want to write a script.
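
For example, a rough sketch of such a start/stop loop (assuming a dom0 shell; the 10-second pause is an arbitrary settle time, and qvm-shutdown’s --wait flag just blocks until the qube has actually halted):

for vm in $(qvm-ls --raw-list); do
    [ "$vm" = "dom0" ] && continue     # dom0 is not a qube and cannot be started
    qvm-start "$vm" && sleep 10        # give the qube a moment to finish booting
    qvm-shutdown --wait "$vm"          # wait for a clean shutdown before the next one
done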

Not having started every VM should not in itself be a problem, I believe. You would just end up backing up the version of the VM from before it was started.


I’ve tried everything suggested several times (using “qvm-start $vm; sleep 10; qvm-shutdown $vm”), including the obsolete VMs I haven’t gotten around to deleting yet. No change. I found a somewhat similar topic (Fedora Template cannot be cloned) and the issue that it referenced (Template cloning in R4.1 is significantly slower than in R4.0 · Issue #6329 · QubesOS/qubes-issues · GitHub). My installation is using the “file” pool. I can’t see anything wrong with my setup, based upon those threads.
I installed the system on 2018-04-21 and have been diligent about updating it. I am now on fedora-33 and debian-10 as template VMs (and archlinux). I cannot find any logging about which VM(s) it is complaining about. I have started the process of excluding individual VMs from the backup, with no luck so far. I have also tried reinstalling any dom0 qubes-* package that looked like it might be relevant.
I haven’t gotten a successful backup since 2021-01-01, and I have updated a lot since then.
How can I find out which VMs/files/etc are considered to be “dirty”?

That sucks.

Further debugging ideas if you want to keep trying to fix this:

  1. Does this happen for all qubes? Perhaps only one of them is corrupted? What if you try to back up a single random qube? Does it break? If not, you can loop over them, trying to back them up one by one to see which ones are broken (see the sketch after this list).

  2. Since the source of the problem seems to be base.py (is it /usr/lib/python3.5/site-packages/qubesadmin/base.py?), you could try to make a copy of this file and start debugging what exactly is going wrong there. For instance, you could output additional information to a fixed file.
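
For item 1, a rough per-qube backup loop might look like this. This is only a sketch: the destination path and passphrase file are hypothetical, and flags such as --yes and --passphrase-file should be checked against qvm-backup --help on your release:

for vm in $(qvm-ls --raw-list); do
    [ "$vm" = "dom0" ] && continue     # skip dom0 here; back it up separately if needed
    echo "=== backing up $vm ==="
    # --yes skips the confirmation prompt; --passphrase-file avoids retyping the passphrase
    qvm-backup --yes --passphrase-file /root/backup-pass /var/backups/qubes "$vm" \
        || echo "FAILED: $vm" >> /root/backup-failures.txt
done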

Try this command in dom0 - any results with first column > 0?

find /var/lib/qubes -type f -name '*-cow.img' -printf '%9kK - %P\n'

It appears a volume in a file pool is considered dirty if there is a private-cow.img with allocated blocks. Use ls -s or stat to see the allocated size.
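
For example (GNU ls/stat in dom0; the appvms path is only an illustration, template volumes live under /var/lib/qubes/vm-templates instead):

ls -s /var/lib/qubes/appvms/*/private-cow.img
stat -c '%b blocks allocated - %n' /var/lib/qubes/appvms/*/private-cow.img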

From quick tests, once a qube using a file pool is started, Qubes OS creates a copy-on-write snapshot file, private-cow.img. As updates occur to private.img, private-cow.img grows accordingly.

Once the qube is shut down, private-cow.img is renamed to private-cow.img.old and a new, zeroed-out private-cow.img is created in its place. The presence of a non-zeroed private-cow.img therefore indicates a snapshot that was not cleaned up properly.

My guess is there is a non-zeroed private-cow.img for one of the qubes causing the backup error.

Assuming the qube starts fine and private.img contains what is expected, private-cow.img can probably be safely deleted or renamed.

It does not appear qvm-volume revert is supported on volumes in a file pool, so I am not exactly sure how to reapply a snapshot, something typically done with lvconvert --merge.

In any event, see if you have non-zeroed private-cow.img files and move them out of the way.
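
A minimal sketch of that clean-up, assuming the affected qube is shut down and that you rename rather than delete (so the file can be put back if anything looks wrong); the qube name below is hypothetical:

# list any *-cow.img that still has allocated blocks
find /var/lib/qubes -type f -name '*-cow.img' -printf '%kK - %p\n' | grep -v '^0K '

vm=personal    # hypothetical qube name - substitute the one that is failing
# only move the file aside if the qube is not running
qvm-check --running "$vm" || mv /var/lib/qubes/appvms/"$vm"/private-cow.img{,.stale}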

Good idea to try backing up one VM at a time. So far, 3 template VMs are failing and 3 AppVMs are backing up successfully (I’m doing them alphabetically). One of the templates had a running VM, the other didn’t. Running volumes simply fail to back up (i.e., the “running volumes” error seems to be fatal).
The 3 template VMs that have failed do not have a non-zero private-cow.img (that sure sounded like a valid possibility!).
One template VM is backing up successfully. What’s different?
Hmmm, the ones failing have a non-empty root-cow.img.

OK, this seems to be (mostly) solved: a non-empty root-cow.img also prevents the backup. So if “qvm-start $vm; sleep 10; qvm-shutdown $vm” results in a zero-sized root-cow.img, then the backup succeeds.
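
For anyone hitting the same thing, a quick way to confirm the templates are clean before retrying the backup (assuming the default file-pool layout, where template images live under /var/lib/qubes/vm-templates; a non-zero size in the first column means that root-cow.img still has allocated blocks):

du -k /var/lib/qubes/vm-templates/*/root-cow.img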

WooHoo! Got a successful backup last night!

So the remaining question is: why does a shutdown sometimes leave a non-empty *-cow.img file?

And can the backup (and other apps) leave some breadcrumbs about the failure???

thanks for everybody’s help!

– scott