Adding some details:
I have 16 qubes, one 8TB HDD backup disk
If I select all qubes, it fails; if I select all but a few “exotic” qubes (like OpenBSD), it fails (randomly between 42% and 99%).
If I select half the qubes, same result.
If I select only the 6 Debian-based qubes, it works (100% and the option to close).
If I select only the 5 Whonix-based qubes, it works, 100%.
If I select only the 4 Fedora-based qubes (except dom0), it fails.
If I select all 6 Debian + 5 Whonix, it fails … so it’s not about the qubes themselves, but about the quantity or the interaction between them?
I’m reinstalling from scratch and will start with a “fresh” backup.
Fresh install (see HCL: P15 gen2 3NVMe)
All defaults except one: “backup-vm” (based on the Debian template),
8 TB USB HDD attached,
All VMs off except sys-usb, backup-vm and of course dom0
Backup went to 98% …
-=-=-=-=-=-=-=-=-=-=-
Second try, all (standard) VMs except dom0 and backup-vm:
Backup went to 99% … then 100% after 2 min
Backup-restore check = all good
-=-=-=-=-=-=-=-=-=-=-
Another fresh install, backup-12-vm being the first and only VM added to the standard ones; backup taken before any update or any other change (except the display settings and the touchpad)
Backup stuck at 98% (all VMs but backup-12-vm)
What would be the command to check (in dom0 or backup-12-vm) what is going on, what is wrong, etc.?
Anything I can report so someone can help figure it out?
The underlying error for backups that are silently hanging can often (maybe even always?) be revealed by setting the backup destination to dom0 instead of a VM:
Make sure that your destination path in dom0 has enough free space for the set of problematic VMs.
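Roughly like this in a dom0 terminal (the path and VM name are placeholders; with no --dest-vm given, the backup file is written to that dom0 path, so any underlying error should surface right in the terminal):
# run in dom0; pick a VM that reliably makes the backup hang
mkdir -p /home/user/backup-test
qvm-backup /home/user/backup-test <suspect-vm>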
TY !
But that means I have to attach my USB HDD to dom0, isn’t that exactly what we are told not to do?
Backup is still running, still hanging at 98%.
In /var/log/qubes, I see:
nano guid.backup-12-xfce.log.old
icon size: 128x128
X connection to :0.0 broken (explicit kill or server shutdown)
nano guid.backup-12-xfce.log
icon size: 128x128
Window 0x4a00254b is still set as transient_for for a 0x4a002b6 window, but VM tried to destroy it
Window 0x4a002f2 is still set as transient_for for a 0x4a002f3 window, but VM tried to destroy it
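In case it’s useful, this is roughly how I’m watching things while it hangs (just guessing at which logs matter):
# in dom0, follow the journal and the GUI daemon log quoted above
sudo journalctl -f
tail -f /var/log/qubes/guid.backup-12-xfce.log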
Not necessarily: You might be able to narrow it down to one relatively small VM that always hangs, so the backup file can fit in your regular dom0 filesystem. Or plug in a second NVMe/SATA drive. Or allocate lots of RAM to dom0 and use it to create a tmpfs. It’s also not an issue for the Btrfs or XFS installation layouts, where a large dom0 filesystem is created (instead of a small one like for the LVM Thin layout).
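For the tmpfs route, a rough sketch (size and path are made up; dom0 needs that much free RAM, and this is only for a one-off diagnostic run):
# in dom0: create a RAM-backed scratch area and aim the test backup at it
sudo mkdir -p /mnt/backup-tmp
sudo mount -t tmpfs -o size=8G tmpfs /mnt/backup-tmp
qvm-backup /mnt/backup-tmp <small-problematic-vm>
# discard it afterwards
sudo umount /mnt/backup-tmp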
Yes, thank you … I will re-do the entire one-by-one check.
What I’ve learned (from reinstalling from scratch) is that the dvm-backup fails even with totally empty, new OotB VMs, as no files were written to any of them.
Also, I can’t attach a USB to dom0 anyway :-p
And I’ve learned, from the ONE time the backup worked and I tried the restore, that it ADDS VMs to the current ones, it doesn’t replace them in place, so any VM can be backed up and restored individually … // EDIT: I haven’t tried, but I guess that means I should be able to reinstall with the option “Don’t install any predefined VM” and then restore them from the backup? // So I’ll give it another try in an organized way, like template-backup, trusted-VM-backup, sys-vm-backup, … and use grob (@Solene) to automate-ish that.
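Something like this is what I have in mind, assuming the standard qvm-backup options (VM names, paths and groups are just examples, and the directories would need to exist in backup-12-vm first):
# one backup run per group, streamed to the backup qube
qvm-backup --compress -d backup-12-vm /home/user/backups/templates debian-12-xfce whonix-gateway-17 whonix-workstation-17
qvm-backup --compress -d backup-12-vm /home/user/backups/sys-vms sys-net sys-firewall sys-usb
qvm-backup --compress -d backup-12-vm /home/user/backups/trusted vault personal work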
I already have three NVMe drives, occupying all three slots.
And then move it from the tmpfs to the actual backup location?
My partitioning is all ext4 (except /tmp and /var/tmp, which are ext2)
Should I reinstall using Btrfs (or XFS, like my backup partition)?
No, this is all just to diagnose the problem by provoking a visible error during the backup run. It’s not intended as part of a real backup procedure.
If you’re actually storing your VMs in a (legacy ‘file’ driver) pool on ext4, that’s not good. (You can check with qvm-pool list and qvm-volume list.) The legacy ‘file’ driver should not be used anymore, and it can be the cause of backup problems. If you prefer file-based storage to LVM Thin storage, use Btrfs or XFS to get the modern ‘file-reflink’ driver.
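To illustrate what to look for, in dom0:
qvm-pool list      # the driver column: ‘file’ is the legacy driver, ‘file-reflink’ and ‘lvm_thin’ are the current ones
qvm-volume list    # shows which pool each VM volume actually lives in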
Better not use LVM at all if you want file-based storage.
If possible I’d even avoid manual partitioning altogether and do it with the automatic Btrfs installation layout (but with “RAID Level: Single” instead of RAID-0, in order to use their full capacity):
If you’re testing backing up to dom0, ensure that you have the same VMs running as in a normal backup run. (A typical cause for a backup failing/hanging with the legacy ‘file’ driver is when some of the VMs selected for inclusion in the backup file are running.)
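E.g. to double-check which VMs are up before starting the test run (assuming qvm-ls’ filter option):
qvm-ls --running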
I’m confused (for a change)
I’m not looking to have file-based storage or LVM or anything in particular, I just want a robust, stable Qubes running, using RAID1 in case one drive fails.
Each is 2 TB, so I don’t need more room (for now), and RAID0 would defeat the purpose.
Qubes doesn’t see my BIOS RAID1, so I’ve been told to use software RAID1 instead.
Automatic partitioning is a no-go, as it doesn’t ask for user requirements and goes straight to the full disk; therefore I wouldn’t have my sysrun on the “cheap” drive (the 256 GB one) and the system in mirror.
I’ve spent 3 years on that setup (don’t laugh, in a very dilettante way!)
PS: I’m just waiting for Qubes OS 7.0 native ZFS
Oh right, you wrote RAID-1 not RAID-0. Though I think the installer can create that too for the automatic LVM Thin or Btrfs installation layouts?
With manual partitioning, unfortunately it’s easy to end up with a broken Qubes OS system, as you’ve noticed. My recommendation to anyone, really, is: simplify, simplify, simplify your storage needs until they are simple enough to use one of the supported automatic installation layouts (LVM Thin, Btrfs, or XFS).
Yes, the new installer is quite good actually, all things considered. Not as good as SUSE’s (NB: biased judgement, I’ve used SUSE for decades).
I’m trying my best to keep it as simple as possible, while still getting what I want out of it (ext4 was what I thought to be the most basic), and to keep some things for later, once I get used to it and can start tweaking rather than messing up (i.e. moving to ZFS, replacing templates with minimal templates, etc. …).
I’m reinstalling:
80GB RAID1 BTRFS
400GB RAID1 BTRFS
First try didn’t work; giving it another try.
I just ran into this problem after setting some AppVMs to preload disposables. I turned it off and the problem was gone. I didn’t bother to look into exactly where the problem occurred. I’m guessing it’s related to AppVMs that use and preload networking VMs like sys-net and sys-vpn.