System Crash: No Boot Device - Crash happened after or during "Qubes Backup"

Hi, an hour ago I came back to my office and saw on my laptop’s screen, in big letters: “No Boot Device”. That means the system crashed during last night’s Qubes backup, but I don’t know why. Before I started the backup I shut down all qubes except sys-usb and, of course, dom0. I use sys-usb to mount the external USB disk that I back up to. This has never happened before in the last 3 months.

I think grub got corrupted for some reason. I started Tails, and the partitions are all there. So what I think I have to do is re-install grub. I found this document, http://qubes-os.org/doc/mount-from-other-os/, which tells me how to mount the partition so that I can chroot into it. But since I’m not very familiar with Xen, I don’t know how to re-install grub.

Any help would be highly appreciated.

System in question is described in the below post, which is the HCL + support files I contributed a couple of days ago:

Disk-layout:

root@amnesia:~# lsblk -f
NAME FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
loop0 squashfs 0 100% /lib/live/mount/rootfs/filesystem.squashfs
loop1 squashfs 0 100% /lib/live/mount/rootfs/4.10.squashfs
sda
└─sda1 crypto_LUKS 5f0e319c-69a8-4e05-b37b-02ad1d5e4857
sdb
├─sdb1 vfat TAILS 0DBD-4495 6.6G 18% /lib/live/mount/medium
└─sdb2 crypto_LUKS 3ada973a-c2c0-487a-ab91-4b4afde32a2c
└─TailsData_unlocked ext4 TailsData b086637e-f95f-4a8c-8556-565955d03936 45.5G 1% /live/persistence/TailsData_unlocked
nvme0n1
├─nvme0n1p1 vfat DD8A-3C1A
├─nvme0n1p2 ext4 c48bdfce-3a74-43c8-996f-b3a7cd9021a2
└─nvme0n1p3 crypto_LUKS 66de607a-b7e1-4bde-a3db-a4f6d7100a8c

It must have happened during backup because the latest backup file is corrupt. I can’t untar it.

I could rebuild the fs tree, but rebuilding the EFI partition didn’t help: still no bootable device. So I did a fresh install and am now trying to restore.

I still have no idea what could have caused such a system crash. Only dom0 and sys-usb were running: sys-usb to hold the backup medium, and dom0 ran the backup. So there was only read access on the internal disk. How can that corrupt the boot record?

Is it possible that you ran out of disk space (on the main drive on which Qubes is installed)?

I think that’s what happened. After digging into the source code, I think that’s the most likely cause. I found that

python -m qubes.tarwriter

runs in dom0 and hence writes the collected data to disk there (/tmp). At a certain point qubes-backup splits the stream, and scrypt starts to write the data to the backup medium, which in my case is a 4 TB USB disk. That is quite cool, because only encrypted data flows through sys-usb and hence to the USB controller. But what I couldn’t find out is how qubes-backup calculates the actual limit at which it splits the stream — in other words, how much data is written in dom0 before the split occurs.
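To make the question concrete, here is a toy sketch of what such a chunking step looks like. This is my own illustration, not the real qubes-backup code: the function name, buffer handling, and the 100 MB limit are all assumptions.

```python
# Toy sketch of a backup pipeline's chunking step (NOT the real
# qubes-backup code; CHUNK_LIMIT is an assumed value for illustration).
import io

CHUNK_LIMIT = 100 * 1024 * 1024  # assumed split point (~100 MB)

def split_stream(stream, limit=CHUNK_LIMIT, bufsize=64 * 1024):
    """Yield successive chunks of at most `limit` bytes from `stream`."""
    chunk = bytearray()
    while True:
        buf = stream.read(min(bufsize, limit - len(chunk)))
        if not buf:
            if chunk:
                yield bytes(chunk)
            return
        chunk.extend(buf)
        if len(chunk) == limit:
            yield bytes(chunk)
            chunk = bytearray()
```

If each yielded chunk is handed to scrypt for encryption before leaving dom0, then at most `limit` bytes would need to sit in /tmp at any one time — which is exactly the quantity I’m trying to pin down in the real code.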

I also couldn’t find the source code for qubes.tarwriter. I found where it gets called, though:

https://dev.qubes-os.org/projects/core-admin/en/latest/_modules/qubes/backup.html#Backup

    file_stat = os.stat(path)
    if stat.S_ISBLK(file_stat.st_mode) or \
            file_info.name != os.path.basename(path):
        # tar doesn't handle content of block device, use our
        # writer
        # also use our tar writer when renaming file
        assert not stat.S_ISDIR(file_stat.st_mode), \
            "Renaming directories not supported"
        tar_cmdline = ['python3', '-m', 'qubes.tarwriter',
            '--override-name=%s' % (
                os.path.join(file_info.subdir, os.path.basename(
                    file_info.name))),
            path]

Do you know where the tarwriter source is hiding? :wink:

Found tarwriter:

https://github.com/QubesOS/qubes-core-admin/blob/master/qubes/tarwriter.py


I monitored the tmpfs, and it turns out that qubes-backup writes 101 MB and then sends it off to the backup medium, so there couldn’t have been a space problem on the root fs (/). I’ll keep digging…
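A quick back-of-the-envelope check based on that measurement (the 101 MB figure is my observation above, not a documented constant):

```python
# Estimate how many encrypted pieces a backup is split into, given the
# observed ~101 MB split size (an observation, not a documented constant).
import math

CHUNK_MB = 101  # observed split size in dom0's tmpfs

def backup_pieces(total_mb):
    """Number of chunks a backup of `total_mb` MB would be split into."""
    return math.ceil(total_mb / CHUNK_MB)
```

So even a 40 GB backup (40960 MB, about 406 pieces) would never keep more than ~101 MB resident in dom0’s /tmp at once — which is why I now doubt the out-of-space theory.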