My Qubes Backup procedure has been stuck at 99% for hours. This is my second attempt.
I started a backup early last night, and it reached 99% after a few hours. I let it run overnight, but when I woke up it was still at 99%, so I canceled it and began a new backup. Same result: after a few hours it reached 99%, and now it is stuck there.
Any additional information I should provide to help diagnose? The Qubes Backup dialog does not provide any info, just the progress bar.
The destination is a 4 TB external disk attached to my vault VM. It has 2.3 TB of free space. I have been using the same destination for over 20 consecutive backups (the backup info is saved in dom0).
Could it be that it is done, but that the backup dialog is not properly updated to “finished”? The backup file size looks about right.
If I click Cancel now, the backup file remains. If I do so and then verify that the backup file works (i.e., test restoring it without actually restoring it, which I always do), should I have any cause for concern?
At the very least, looks like I’ve run into some bug with updating the backup dialog… And at worst, my backups are totally broken.
As the issue seems reproducible, you could try the command-line program qvm-backup, if you’re comfortable with it. It may provide more debug information…
Otherwise, you could try to figure out whether a particular qube is causing the issue: try a new backup with half the qubes; if it works, try the other half; if it gets stuck again, halve that half, and so on, until you pinpoint which qube is provoking the issue.
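The halving approach can be sketched in Python. This is only an illustration: `find_culprit` and `backup_succeeds` are made-up names, not part of any Qubes tool. In practice `backup_succeeds` would be you running a backup of that subset by hand and noting whether it finishes; here it is simulated. The sketch assumes exactly one qube is at fault and that any selection without it completes.

```python
from typing import Callable, List, Sequence


def find_culprit(
    items: Sequence[str],
    backup_succeeds: Callable[[List[str]], bool],
) -> str:
    """Return the one item whose presence makes the backup hang.

    Assumption: exactly one item is at fault, and a backup of any
    subset that does not contain it succeeds.
    """
    if not items:
        raise ValueError("nothing to test")
    candidates = list(items)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if backup_succeeds(half):
            # The first half is clean, so the culprit is in the rest.
            candidates = candidates[len(half):]
        else:
            # The first half hangs, so keep narrowing inside it.
            candidates = half
    return candidates[0]


# Simulated run: pretend any selection containing "dom0" gets stuck.
qubes = ["work", "personal", "vault", "dom0", "sys-usb"]
culprit = find_culprit(qubes, lambda subset: "dom0" not in subset)
print(culprit)
```

Each round needs only one test backup, so even a few dozen qubes are narrowed down in a handful of attempts.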
This has happened to me before. At first I suspected “sys-usb”, but I narrowed it down to the external USB device. Once I replaced the external USB device, all was good.
dom0 seems to be the culprit. I can complete a backup to 100%, and “finish”, if I remove dom0.
Also, running a backup with only dom0 selected hangs as well. In this case, it actually hangs at 49%.
I’ve tried using qvm-backup to get more verbose logs, but this fails and I cannot figure out why:
[user@dom0 ~]$ qvm-backup -v --profile /etc/qubes/backup/qubes-manager-backup.conf
Backup preparation error: Got empty response from qubesd. See journalctl in dom0 for details.
The journalctl logs tell me:
Apr 27 10:06:44 dom0 qubesd[2390]: permission denied for call b'admin.backup.Info'+b'/etc/qubes/backup/qubes-manager-backup.conf' (b'dom0' → b'dom0') with payload of 0 bytes
I’ve tried also to run it with sudo, but get the same error.
I do not see any I/O errors in the dmesg logs of sys-usb or of my vault VM, which is the destination for the backup file, although there are a lot of logs to go through. Errors should be shown in red, right? I see some other errors, but they don’t seem to be related. For example, I see this error:
[ 241.090426] scsi 0:0:0:1: Wrong diagnostic page; asked for 1 got 8
[ 241.090449] scsi 0:0:0:1: Failed to get diagnostic page 0x1
[ 241.090463] scsi 0:0:0:1: Failed to bind enclosure -19
Also this error:
[ 214.722513] vhci_hcd: vhci_device speed not set
And some other errors that don’t seem to mention I/O or have anything to do with I/O.
Unfortunately, the verbose output does not provide any insight…
Do you want to proceed? [y/N] y
2024-04-27 19:48:29,503 [MainProcess selector_events.__init__:59] asyncio: Using selector: EpollSelector
Making a backup... 44.15%
Then it hangs here. Just to clarify, I am now only attempting to back up dom0.
It might be that some file is triggering a bug in the backup code… could you try shuffling files out of your home directory temporarily to see if any of them are causing the problem? Not sure how many files you have in there, but you could move half of them out, try the backup, swap the halves, and see if you get similar behavior both times. The fact that the percentages are different each time is odd, but I’m not sure if the backup order is deterministic (I’m aware that some OS-level operations return directory contents in non-deterministic orders, but I’m assuming that the backup tool is written in Python so it might be getting a sorted list… IDK).
It turned out there was a file triggering the stalled backup: the .xsession-errors file in the dom0 home directory, which kept growing because of a buggy i3 script.
It figures. For regular VMs, if they are running, the backup takes a snapshot of their state from before it started (if I’m not mistaken). But I believe this is not the case for dom0, which is obviously a different kind of VM. In my case, the state of the files in home on dom0 was changing even as the backup was running, so the backup was unable to complete.
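For anyone who hits the same thing, a quick way to spot files that change while a backup would be running is to compare two size snapshots of the directory taken a short time apart. The following is a rough sketch, not anything the backup tool itself does; the function names and the 10 second default are my own choices.

```python
import os
import time
from typing import Dict, List


def snapshot_sizes(root: str) -> Dict[str, int]:
    """Map every file path under root to its current size in bytes."""
    sizes: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes[path] = os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sizes


def changed_files(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Paths that are new or whose size differs between the snapshots."""
    return sorted(p for p, size in after.items() if before.get(p) != size)


def find_growing_files(root: str, interval_seconds: float = 10.0) -> List[str]:
    """Take two snapshots interval_seconds apart and report what changed."""
    before = snapshot_sizes(root)
    time.sleep(interval_seconds)
    after = snapshot_sizes(root)
    return changed_files(before, after)
```

Running something like `find_growing_files(os.path.expanduser("~"), 30)` in dom0 should list a file such as .xsession-errors if it is still being written to. It only compares sizes, so a file rewritten in place at the same length would be missed.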
Thanks everyone for the help!
This also helped me chase down a bug with an i3 script.