Recently backing up Qubes has been failing with a message:
Whoops. A critical error has occurred. This is most likely a bug in Qubes Global Setting application.
StoragePoolException: file pool cannot export dirty volumes at line 102 of file base.py
I am running Qubes 4.0, regularly updated. I haven’t been able to find anything that looks relevant in the logs. I can’t find anything that looks wrong in the global settings database, either from the CLI or GUI. I have tried rebooting numerous times, starting the backup immediately after a boot or just before shutdown. I saw a different “StoragePoolException” once, though I didn’t write it down and haven’t reproduced it yet.
Any suggestions on how to track this down and fix it?
thanks!
– scott henry
“dirty volumes” sounds like one of your qubes was not properly shut down, and perhaps in such a case it is not possible to back it up. Try starting and shutting down all qubes that you want to back up.
I was finally able to reproduce the other error: “file pool cannot export running volumes”, which makes sense, since several of the VMs were running.
I have started and shut down the VMs several times each since the error first appeared. I suspect that I haven’t started ALL of the VMs, I’ll try that.
How does Qubes determine that a volume is “dirty” when it is shut down?
qvm-ls --raw-list will give you a list of all VMs if you want to write a script.
This in itself should not be a problem, I believe. You would just end up backing up the version of the VM from before it was started.
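For instance, a minimal dom0 sketch of such a loop (the 10-second pause is arbitrary; anything that gives the qube time to finish booting should do):

for vm in $(qvm-ls --raw-list); do
    # dom0 is listed too, but cannot be started or shut down like a qube
    [ "$vm" = "dom0" ] && continue
    qvm-start "$vm"
    sleep 10
    qvm-shutdown --wait "$vm"
done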
I’ve tried everything suggested several times (using “qvm-start $vm; sleep 10; qvm-shutdown $vm”), including the obsolete VMs I haven’t gotten around to deleting yet. No change. I found a somewhat similar topic (Fedora Template cannot be cloned) and the issue that it referenced (Template cloning in R4.1 is significantly slower than in R4.0 · Issue #6329 · QubesOS/qubes-issues · GitHub). My installation is using the “file” pool. I can’t see anything wrong with my setup, based upon those threads.
I installed the system on 2018-04-21 and have been diligent about updating it. I am now on fedora-33 and debian-10 as template VMs (and archlinux). I cannot find any logging about which VM(s) it is complaining about. I have started the process of excluding individual VMs from the backup, with no luck so far. I have also tried reinstalling any dom0 qubes-* package that looked like it might be relevant.
I haven’t gotten a successful backup since 2021-01-01, and I have updated a lot since then.
How can I find out which VMs/files/etc are considered to be “dirty”?
That sucks.
Further debugging ideas if you want to keep trying to fix this:
- Does this happen for all qubes? Perhaps only one of them is corrupted? What if you try to back up a single random qube? Does it break? If not, you can loop over them, trying to back them up one by one, to see which ones are broken (see the sketch after this list).
- Since the source of the problem seems to be base.py (is it /usr/lib/python3.5/site-packages/qubesadmin/base.py?), you could try to make a copy of this file and start debugging what exactly is going wrong there. For instance, you could output additional information to a fixed file.
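As a rough sketch of the first idea, assuming qvm-backup accepts a destination directory followed by a list of VM names and will prompt for confirmation and a passphrase on each run (check qvm-backup --help on your version), with /var/tmp/backup-test as a throwaway destination:

dest=/var/tmp/backup-test    # hypothetical scratch location in dom0
mkdir -p "$dest"
for vm in $(qvm-ls --raw-list); do
    [ "$vm" = "dom0" ] && continue
    echo "=== backing up $vm ==="
    qvm-backup "$dest" "$vm" || echo "FAILED: $vm"
done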
Try this command in dom0 - any results with first column > 0?
find /var/lib/qubes -type f -name '*-cow.img' -printf '%9kK - %P\n'
Appears a volume in a file pool is dirty if there exists a private-cow.img with allocated blocks. Use ls -s or stat to see the allocated size.
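For instance, assuming the default file-pool locations under /var/lib/qubes:

# first column of ls -s is the allocated size, not the apparent (sparse) size
ls -s /var/lib/qubes/appvms/*/private-cow.img
# %b = allocated 512-byte blocks, %n = file name
stat -c '%b %n' /var/lib/qubes/vm-templates/*/*-cow.img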
From quick tests, once a qube using a file pool is started, Qubes OS creates a copy-on-write snapshot file, private-cow.img. As updates occur to private.img, private-cow.img also grows.
Once the qube is shut down, private-cow.img is renamed to private-cow.img.old and private-cow.img is then zeroed out. The presence of a non-zeroed-out private-cow.img indicates a snapshot that did not get backed up. My guess is there is a non-zeroed private-cow.img for one of the qubes, causing the backup error.
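If you want to watch that happen, listing both generations of the file before and after a shutdown should show the current one dropping to zero allocated blocks while the .old copy keeps the previous contents (the qube name here is just a placeholder):

# allocated size is the first column
ls -ls /var/lib/qubes/appvms/my-appvm/private-cow.img*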
Assuming the qube starts fine and private.img contains what is expected, private-cow.img can probably be safely deleted or renamed. It does not appear qvm-volume revert is supported on volumes in a file pool, so I am not exactly sure how to reapply a snapshot, something typically done with lvconvert --merge. In any event, see if you have non-zeroed private-cow.img files and move them out of the way.
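A rough sketch of that cleanup, assuming the default /var/lib/qubes layout; it only touches qubes that are currently shut down, and renames rather than deletes so the file can be restored if something goes wrong (the .stale suffix is arbitrary):

for f in /var/lib/qubes/appvms/*/*-cow.img /var/lib/qubes/vm-templates/*/*-cow.img; do
    [ -e "$f" ] || continue
    vm=$(basename "$(dirname "$f")")
    # skip running qubes; their cow files are in active use
    qvm-check --quiet --running "$vm" && continue
    # %b = allocated 512-byte blocks; 0 means the snapshot is effectively empty
    if [ "$(stat -c %b "$f")" -gt 0 ]; then
        echo "moving aside $f"
        mv "$f" "$f.stale"
    fi
done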
Good idea to try to back up one VM at a time. So far, 3 template VMs are failing and 3 appvms are backing up successfully (I’m doing them alphabetically). One of the templates had a running VM, the other didn’t. Running volumes just fail to back up (i.e., the “running volumes” error seems to be fatal).
The 3 template VMs that have failed do not have a non-zero private-cow.img (that sure sounded like a valid possibility!)
1 template VM is backing up successfully. What’s different?
Hmmm, the ones failing have a non-empty root-cow.img
OK, seems to be (mostly) solved: a non-empty root-cow.img also prevents backup. So if “qvm-start $vm; sleep 10; qvm-shutdown $vm” results in a zero-sized root-cow.img, then the backup succeeds.
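For anyone else who hits this, a rough dom0 sketch of that check, cycling any qube whose private-cow.img or root-cow.img still has allocated blocks (paths assume the default file pool under /var/lib/qubes), would look something like:

for dir in /var/lib/qubes/appvms/* /var/lib/qubes/vm-templates/*; do
    vm=$(basename "$dir")
    for f in "$dir"/private-cow.img "$dir"/root-cow.img; do
        [ -e "$f" ] || continue
        # %b = allocated 512-byte blocks
        if [ "$(stat -c %b "$f")" -gt 0 ]; then
            echo "$vm: $f is non-empty, cycling the qube"
            qvm-start "$vm" && sleep 10 && qvm-shutdown --wait "$vm"
            break
        fi
    done
done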
WooHoo! got a successful backup last night!
So the remaining question is how come a shutdown sometimes leaves a non-empty *-cow.img file?
And can the backup (and other apps) leave some breadcrumbs about the failure???
thanks for everybody’s help!
– scott