Backup stuck at 99%

My Qubes Backup procedure has been stuck at 99% for hours. This is my second attempt.

I started a backup early last night, it got to 99% after a few hours. I let it run overnight. I woke up and checked, it was still at 99%, so I canceled it and began a new backup. Same result, after a few hours, it got to 99%, and now it is stuck there.

Any additional information I should provide to help diagnose? The Qubes Backup dialog does not provide any info, just the progress bar.

What is the backup destination?

A 4 TB external disk attached to my vault VM. It has 2.3 TB of free space. I have been using the same destination for over 20 consecutive backups (the backup info is saved in dom0).

In top, can you see some CPU usage?

Here are my top processes that seem to be doing something:

Tasks: 543 total,   3 running, 540 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.6 us,  4.9 sy,  0.0 ni, 88.3 id,  0.1 wa,  0.6 hi,  0.1 si,  0.5 st
MiB Mem :   3871.9 total,   2074.1 free,    886.1 used,    911.7 buff/cache
MiB Swap:   4028.0 total,   3412.0 free,    616.0 used.   2866.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                
   2202 root      20   0  785532 101720  10680 S  29.4   2.6   1381:58 qubesd                                                                                                                 
   1898 root      20   0 1432332   7348   5648 S  13.9   0.2 637:20.71 libvirtd                                                                                                               
  17628 root      20   0 3515608 332412  20336 R   6.6   8.4 772:15.12 Xorg                                                                                                                   
3367389 user      20   0    6692   2896   2732 R   1.7   0.1  43:06.24 bash                                                                                                                   
3367387 user      20   0   32024   5000   4220 S   1.3   0.1  36:50.63 i3bar                                                                                                                  
   2316 root      20   0    2696     96      0 S   0.7   0.0   3:28.17 meminfo-writer                                                                                                         
  17988 user      20   0 2188940   5432   4676 S   0.7   0.1  46:32.28 picom                                                                                                                  
     16 root      20   0       0      0      0 I   0.3   0.0   7:47.69 rcu_preempt                                                                                                            
   1868 root      20   0    7176   3580   2120 S   0.3   0.1  22:54.32 xenstored                                                                                                              
1289643 user      20   0   78976   2544   2488 S   0.3   0.1   0:02.03 qubes-guid                                                                                                             
1297161 root      20   0       0      0      0 S   0.3   0.0   0:03.16 99.xvdb-0                                                                                                              
2484151 user      20   0   78964   3832   3380 S   0.3   0.1   0:24.65 qubes-guid                                                                                                             
3367388 user      20   0    2496   1860   1640 S   0.3   0.0   5:26.18 i3blocks                                                                                                               
3505353 user      20   0    9752   4480   3272 R   0.3   0.1   0:00.20 top                                                                                                                    
      1 root      20   0  172916   7544   4896 S   0.0   0.2   0:16.64 systemd                                                                                                                
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.12 kthreadd                                                                                                               
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp  

It seems the backup tool isn’t doing anything, from what I read :confused:

Could it be that it is done, but that the backup dialog is not properly updated to “finished”? The backup file size looks about right.

If I click Cancel now, the backup file remains. If I do so, then validate the backup file works (i.e., test restoring it without actually restoring it, which I always do), should I have any cause for concern?

At the very least, looks like I’ve run into some bug with updating the backup dialog… And at worst, my backups are totally broken.

Whatever you do, please write the result of it here…
Good luck!

1 Like

As the issue seems reproducible, you could try to use the command line program qvm-backup if it’s in your skills. This may have more debug information…

Otherwise, you could try to figure if a qube is making the issue, try a new backup with half the qubes, if it works, try the other half, if it’s stuck again, halve that half etc… until you pinpoint which qube is provoking the issue.

1 Like

I am testing various methods and will update with results. Will probably take a day or so because of the time for backups and the trial-and-error.

This has happened to me before,at first I thought of “sys-usb”. I narrowed it down to the external USB device. Once I replace the external USB all was good.

1 Like

Check for I/O errors in the dmesg logs of (presumably) sys-usb, and in the backup destination VM too.

I wouldn’t rely on the results of verification mode if there were any problems during backup, like in your case.

dom0 seems to be the culprit. I can complete a backup to 100%, and “finish”, if I remove dom0.

Also, running a backup just with dom0 VM selected gets hung as well. In this case, it actually gets hung at 49%.

I’ve tried using qvm-backup to get more verbose logs, but this fails and I cannot figure out why:

[user@dom0 ~]$ qvm-backup -v --profile /etc/qubes/backup/qubes-manager-backup.conf 

Backup preparation error: Got empty response from qubesd. See journalctl in dom0 for details.

The journalctl logs tell me:

Apr 27 10:06:44 dom0 qubesd[2390]: permission denied for call b'admin.backup.Info'+b'/etc/qubes/backup/qubes-manager-backup.conf' (b'dom0' → b'dom0') with payload of 0 bytes

I’ve tried also to run it with sudo, but get the same error.

I do not see any I/O errors in dmesg logs of sys-usb nor my vault VM, which is the destination for the backup file. Although there are a lot of logs. Errors should have red font face, right? I see some other errors, but they don’t seem to be related. For example, I see this error:

[  241.090426] scsi 0:0:0:1: Wrong diagnostic page; asked for 1 got 8
[  241.090449] scsi 0:0:0:1: Failed to get diagnostic page 0x1
[  241.090463] scsi 0:0:0:1: Failed to bind enclosure -19

Also this error:

[  214.722513] vhci_hcd: vhci_device speed not set

And some other error that don’t seem to mention I/O or have to do with I/O.

That should be qvm-backup -v --profile qubes-manager-backup

Thank you for the correction!

Unfortunately, the verbose output does not produce any insights…

Do you want to proceed? [y/N] y
2024-04-27 19:48:29,503 [MainProcess selector_events.__init__:59] asyncio: Using selector: EpollSelector
Making a backup... 44.15%

Then it gets hung here. Just to clarify, now I am only attempting to do a backup of dom0.

Curiously, each time I attempt the dom0 backup, the progress is less and less.

At first, it maxed out at 49.9% (quite quickly, because dom0 is small). I let that run for hours while I was afk.

Then I tried again, it maxed out at 44.15%, quite quickly. I canceled it after an hour.

Then I tried again, it maxed out at 42.71%. I canceled it after a few minutes. Then again, down to 42.65%, then down to 42.48%.

The maximum progress is dropping with each attempt.

It might be that some file is triggering a bug in the backup code… could you try shuffling files out of your home directory temporarily to see if any of them are causing the problem? Not sure how many files you have in there, but you could move half of them out, try the backup, swap the halves, and see if you get similar behavior both times. The fact that the percentages are different each time is odd, but I’m not sure if the backup order is deterministic (I’m aware that some OS-level operations return directory contents in non-deterministic orders, but I’m assuming that the backup tool is written in Python so it might be getting a sorted list… IDK).

3 Likes

Turned out there was a file that was triggering the failed/stalled backup. It was actually the .xsession-errors file in the home directory, which was growing due to a buggy i3 script. That file was actually increasing regularly.

Figures, for regular VMs, if they are running, the backup takes a snapshot of the state from before it started (if I’m not mistaken). But I believe this is not the case with dom0, which is obviously a different kind of VM. In my case, the state of files in home on dom0 was actually changing even as the backup was running, so the backup was unable to complete.

Thanks everyone for the help!

This also helped me chase down a bug with an i3 script.

4 Likes

that’s an interesting bug, thanks for sharing your findings!