Dom0 backup/snapshot?

Background:
I use Qubes4.1 as my daily driver so the cost of any disruption is very high (time=money).
I use the internal backup solution for all the VMs, but since you don’t get a FULL dom0 backup, I dread the day a problem in dom0 forces me to reinstall and restore. As I recall, the OS installation on my rig was very long and complicated because the rig is rather new and has some hardware that is not supported right out of the box.
This also means that I’m extremely reluctant to make ANY change to dom0 for fear of breaking anything, even if it means “swallowing” some ongoing issues just so I don’t take the risk of breaking my dom0.
I am probably not alone in this.

Solution required:
So I am looking for a solution that will allow me to revert changes in dom0.
I’m ok with something that I start manually just before I’m installing an update, or intend to make a change in dom0, with the option to roll-back to that point in time if something goes wrong.
I don’t understand the complexity and/or security implications of it, but I can think of three main cases:
a. User did something in dom0 which didn’t solve the problem, and they want to roll-back and try something else
b. The OS is up but something is broken (bad update, or a user-made change)
c. The OS not coming up after reboot (anything from a driver issue to

Any suggestions for anything NOT requiring a full OS install “from scratch”?

*If you suggest any backup software that runs OUTSIDE Qubes, bear in mind that we’re looking to back up only dom0, to avoid a huge and unnecessary waste of time and storage space.

:waiting for some brilliant ideas: :sunglasses:


I think you can create/restore LVM snapshot of dom0 root and backup EFI and boot partitions as described here:

Thanks.
As far as I can understand, they go way beyond what I needed but I don’t have the skills to filter out the relevant commands :confused:

I also found this: GitHub - tasket/wyng-backup: Fast Time Machine-like backups for logical volumes
but am not sure how to use it for dom0.
Also, I think it would be wise to save the dom0 backup to an external device, but how to get the backup there without attaching anything directly to dom0?

Snapshoting dom0 (root LVM) under QubesOS? · Issue #53 · tasket/wyng-backup · GitHub

URL Form                                 Destination Type
internal:/path                           Local filesystem
ssh://user@example.com/path              SSH server
qubes://vm-name/path                     Qubes virtual machine
qubes-ssh://vm-name:me@example.com/path  SSH server via a Qubes VM

GitHub - tasket/wyng-backup: Fast Time Machine-like backups for logical volumes & disk images

Thanks. From reading what you linked regarding Snapshoting dom0 under Qubes I got the impression that doing so has the potential of de-stabilizing Qubes, so maybe it should be avoided.
I guess I’ll just have to wait for someone to make a simple/stupid user-guide, hopefully something official and approved by Devs, that will allow full backup/rollback of dom0.

I don’t know why the option isn’t provided as part of the OS in the first place, but I remind myself that even though Qubes OS became “my whole world” as far as my workstation goes, it is still maintained by a small number of devs, who do a great job and must prioritize where to put their time.

There are no “stability” issues with doing a live dom0 snapshot. What was discussed under that issue is the fact that a live dom0 snapshot would be “unclean”, since logs and other files opened for writing would be incoherent if restored as a dom0 LVM replacement. It is better, and encouraged, to take the snapshot at shutdown, when it will be clean.

@redmind @tzwcfq
From Snapshoting dom0 (root LVM) under QubesOS? · Issue #53 · tasket/wyng-backup · GitHub

I also have added this to ‘/lib/systemd/system-shutdown’ to generate a root snapshot each shutdown:

#!/bin/sh

/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

Note: Under Q4.1, the proper place to put this shutdown script is under /usr/lib/systemd/system-shutdown/root-autosnap.shutdown.

Creating a snapshot at shutdown causes no instability or incoherence issues at all. I have been using it myself for dom0 for a long while without issue. I use it to take wyng backups of its root volume, which permits comparing dom0 states against the actual dom0 if needed, as well as dom0 restoration. That is also what led to Secure boot support · Issue #4371 · QubesOS/qubes-issues · GitHub, which takes 2 snapshots to be able to compare states between the previous 2 shutdowns, combined with some glue explained under How to mount LVM images | Qubes OS to be able to mount those LVMs and inspect them in a disposable qube.

Restoring a dom0 state with wyng-backup is hackish as of now. It requires creating a RW snapshot first, restoring into that volume (the --sparse-write option is totally amazing, restoring only the changes into the destination RW LVM from the origin snapshot), and then asking for that volume to be merged on volume deactivation. In my tests this works flawlessly, but it should be a completely separate write-up, probably under the testing area, and not directed at inexperienced users as of now. The result of that PoC permits restoring a whole system to a known state (yep, wyng-backup has tags) in under 5 minutes from within Qubes, to be able to boot into a clean Qubes install + updates.

That idea was covered under Coping with OS-level snapshot rotation · Issue #88 · tasket/wyng-backup · GitHub. Note that @tasket doesn’t agree with the merging approach as of now, and explained why there. The best way of doing this would be, of course, to have Qubes implement such rotation itself, not differentiating dom0 from any other qube. Note that the scripts I use actually create the snapshots of dom0’s root from root-pool into vm-pool, to avoid touching the relatively small dom0 pool. That pool would probably fill up if, for some reason, templates were passed to dom0 and their installation did not complete, two reboots in a row or something like that. Let’s remember that snapshots cost no space until they diverge from their origin. But snapshotting a clean dom0, then snapshotting a dom0 that holds templates whose deployment fails, rebooting and failing again, might fill the small root-pool. Having the snapshots created under vm-pool dodges that possible issue.
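
Since the worry above is a small thin pool filling up, it may help to look at pool usage before taking a snapshot. Here is a minimal sketch, assuming the default Qubes 4.1 names (volume group qubes_dom0, thin pools root-pool and vm-pool). The function names and the 80% default limit are illustrative choices of mine, not something from the posts above.

```shell
#!/bin/sh
# Sketch: check how full a thin pool is before snapshotting.
# Assumed names: VG "qubes_dom0", pools "root-pool" and "vm-pool".

# pool_is_safe PERCENT [LIMIT]
# Succeeds if PERCENT (e.g. "42.17", as printed by lvs) is below LIMIT.
pool_is_safe() {
    used=${1%%[.,]*}         # keep the integer part: "42.17" -> "42"
    limit=${2:-80}           # illustrative default threshold
    [ -n "$used" ] && [ "$used" -lt "$limit" ]
}

# pool_usage POOL
# Prints the data usage of the pool in percent (needs root and LVM).
pool_usage() {
    lvs --noheadings -o data_percent "qubes_dom0/$1" | tr -d ' '
}
```

Run as root in dom0, something like `pool_is_safe "$(pool_usage root-pool)" || echo "root-pool is getting full"` would warn before a snapshot is created. An empty or unreadable value is treated as unsafe.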

The real question here is for the devs (@demi @marmarek @fepitre): why is dom0 not treated like all AppVM LVMs, with 2 restore points by default? Is it the additional logic needed to deal properly with merging the changes (sudo lvconvert --merge) on the next LVM deactivation? Implementing for dom0 the same logic that already exists for qubes would permit qvm-volume revert dom0:root, if -back volumes were made available for the dom0 root volume. Live volumes could then leverage and deal properly with sudo lvconvert --merge. dom0 is definitely one of those live volumes, but the same applies to running qubes; that is less interesting, but it would make sure the desired state is not deleted at shutdown, which is what happens now with volume rotation, and it would also be nice to be able to revert a live qube to its -2 snapshot while that snapshot still exists. This is discussed under Coping with OS-level snapshot rotation · Issue #88 · tasket/wyng-backup · GitHub (and should maybe be discussed over there?)
qvm-volume could later on deal with the presence of -revert snapshots, and even permit dom0 to be fully backed up in the current Qubes backup (where currently only ~/home is backed up), and also permit a “live restore” option, even including the /boot partition.

Maybe we should have a distinct thread for wyng-backup to raise attention, testing and funding for @tasket’s amazing work, so that one day it could replace/complement the current Qubes backup/restore mechanism. Note that wyng-backup currently only deals with LVM, which is the default Qubes partitioning scheme.


The short version of the above post, using read-only snapshots, unapproved by the devs and not related to external backups, would be:

Content of /lib/systemd/system-shutdown/root-autosnap:

#!/bin/sh

#This permits wyng-backup to backup root-autosnap and root-autosnap-back, taken at each system shutdowns like any other QubesOS LVMs.

#We delete the backup of last shutdown snapshot (last last shutdown)
/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap-back || true
#We take a snapshot of root-autosnap into root-autosnap-back
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root-autosnap -n root-autosnap-back
#We remove root-autosnap
/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
#We create root-autosnap from root
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

sudo chmod +x /lib/systemd/system-shutdown/root-autosnap

I would not advise doing this on your daily driver, but you could test on a secondary install on another computer first.
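
For staging the script, a small install helper might look like the following. This is only a sketch: it assumes the Qubes 4.1 location and file name mentioned earlier in the thread (/usr/lib/systemd/system-shutdown/root-autosnap.shutdown), and the function name and the optional second argument are hypothetical conveniences of mine so the copy can first be tried in a scratch directory.

```shell
#!/bin/sh
# Sketch: copy a shutdown hook into place with the executable bit set.
# systemd-shutdown runs every executable found in the system-shutdown
# directory and passes it one argument: halt, poweroff, reboot or kexec.

# install_shutdown_hook SCRIPT [HOOK_DIR]
install_shutdown_hook() {
    src=$1
    hook_dir=${2:-/usr/lib/systemd/system-shutdown}   # Q4.1 path from the thread
    # install(1) copies the file and sets its mode in one step,
    # so no separate chmod +x is needed afterwards.
    install -m 0755 "$src" "$hook_dir/root-autosnap.shutdown"
}
```

Calling `install_shutdown_hook ./root-autosnap /tmp/hook-test` first lets you check the result without touching dom0’s real hook directory; the real install needs root.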

So basically, the following would cover cases a and b above, but not c.

Before making changes, you should reboot once, to have a clean dom0 state on halt.
Then, from within Qubes, calling sudo /lib/systemd/system-shutdown/root-autosnap will create an additional, unclean state (root-autosnap-back being your state at halt). Make your change in dom0. Reboot if that needs testing.

You keep two states:

  • root-autosnap: last state.
  • root-autosnap-back: last last state.

At this point in time, calling sudo lvconvert --merge /dev/qubes_dom0/root-autosnap-back would revert dom0 to the state prior to your change, on reboot (when the dom0 root volume is deactivated). This is also why your ‘c’ case above would not work from within Qubes if it is not in a bootable state. But booting from the QubesOS installation media in recovery mode would permit you to apply another state.
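
Because lvconvert --merge folds a snapshot back into whatever LVM records as that snapshot’s origin, it may be worth confirming the origin before asking for the merge. Below is a minimal sketch of such a guard, assuming the volume group qubes_dom0 and the snapshot names used above. The function names are illustrative, and this is meant for a test install, run as root in dom0.

```shell
#!/bin/sh
# Sketch: refuse to merge a snapshot unless LVM reports "root" as its origin.
# Assumed: volume group "qubes_dom0"; run as root in dom0 on a test install.

# origin_is_root ORIGIN
# ORIGIN is the raw, space-padded value printed by "lvs -o origin".
origin_is_root() {
    [ "$(printf '%s' "$1" | tr -d ' \n')" = root ]
}

# revert_dom0_root SNAPSHOT
# Requests the merge; per the post above it takes effect on reboot.
revert_dom0_root() {
    snap=$1
    origin=$(lvs --noheadings -o origin "qubes_dom0/$snap") || return 1
    if ! origin_is_root "$origin"; then
        echo "$snap: origin is not root, not merging" >&2
        return 1
    fi
    lvconvert --merge "qubes_dom0/$snap"
}
```

For example, `revert_dom0_root root-autosnap` would only go ahead if `lvs` lists root as the origin of root-autosnap; otherwise it prints a message and leaves everything untouched.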

The reason we like automatic snapshots on shutdown is that they are safer: the automatic cleanup keeps space consumption from exploding and makes sure we do not keep too many states, which take up more and more space over time. Snapshots need to be removed in a timely manner rather than forgotten, which is why we rotate them in the shutdown script.

Otherwise, you could play manually with that concept, and even create named snapshots of your own to revert to later on:
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n dom0-root-NamedBackupPriorOfChange

But that snapshot has no automated cleanup.
You would have to remove it yourself when it is no longer needed:
sudo lvremove /dev/qubes_dom0/dom0-root-NamedBackupPriorOfChange

If you realize you need to revert to that unclean state upon reboot (logs corrupted once again):
sudo lvconvert --merge /dev/qubes_dom0/dom0-root-NamedBackupPriorOfChange

Once again, please test on a separate installation first, not on your daily driver.
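
The three manual steps above (create, remove, revert) can be wrapped in one small helper. This is a sketch under assumptions: the helper name dom0_snap, the DRY_RUN switch and the name check are mine, while the LVM commands are the ones quoted above. By default it only prints the command it would run, so nothing changes until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: wrap the manual create / remove / revert steps in one helper.
# With DRY_RUN=1 (the default) the LVM command is printed, not executed.

VG=qubes_dom0   # default Qubes volume group, as used in the thread

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# dom0_snap create|remove|revert NAME
dom0_snap() {
    action=$1
    name=$2
    # Reject empty names, names starting with "-", and unusual characters.
    case $name in
        ''|-*|*[!A-Za-z0-9_.+-]*) echo "invalid snapshot name" >&2; return 2 ;;
    esac
    case $action in
        create) run lvcreate --noudevsync --ignoremonitoring -An -pr -s "$VG/root" -n "$name" ;;
        remove) run lvremove "$VG/$name" ;;
        revert) run lvconvert --merge "$VG/$name" ;;
        *) echo "usage: dom0_snap create|remove|revert NAME" >&2; return 2 ;;
    esac
}
```

For example, `dom0_snap create dom0-root-NamedBackupPriorOfChange` prints the lvcreate line for review, and `DRY_RUN=0 dom0_snap create dom0-root-NamedBackupPriorOfChange` (as root) would actually run it.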

Where wyng-backup picks it up here is that root-autosnap is the last snapshot taken.
Since wyng-backup backs up LVMs, if wyng keeps track of root-autosnap, asking it to send a session (--dedup) would send only the incremental backup (the changes) that occurred since the last backup was made. This is why we love wyng, and why having dom0 rotated at shutdown like every other LVM volume would make sense, so that wyng just deals with it without the need for additional logic or wrappers.

I’ll just post a quick answer here for now: it’s on the backlog to be able to revert dom0 to an older snapshot. That’s notably the motivation for separating the dom0 pool and logical volume. If I remember correctly, we plan to have some rescue entry in grub that would make it easier to revert dom0 to a previous snapshot. We may also face some issues related to the qubes.xml file, which stores all the information about settings and qubes. We may need to handle this file (and potentially some others?) differently than by simply reverting the whole system. You may “lose” several VMs, and we would need some recovery tool that allows registering them again. Notably, this part about registering VMs is related to the StorageVM project. I’ll let @marmarek give the details :slight_smile: .


The answer is simple: dom0 is what manages those pools. You can’t have this pool management before starting dom0 system, and you need to mount dom0’s root volume to start dom0 system. And also, you can’t perform various destructive operations on dom0 volume that is mounted - contrary to VMs, where you can shut them down and still have functioning dom0 to manage volumes from there.

As a more general comment: doing a dom0 rollback independently of other parts of the system is a terrible idea, and has great potential to make the system unusable, or just broken in some subtle ways. For example, we keep in dom0 the information about what is in the other LVM volumes. If you roll that back, you may lose access to some of your VMs or have them overridden. This may change at some point, when we separate “data” from “system” (where the “system” part will be read-only), but it isn’t there yet. And even then, rolling back the “data” part is a very bad idea (unless you roll back all the other volumes too). Until then, reverting dom0 at the LVM level should be considered unsupported.


@fepitre: this is what qubes-wyng-util meta-prep takes care of in such a case. It keeps a backup of qubes.xml in a different LVM, which is also included in the backup session, so that one can reinject missing qubes with qubes-wyng-util meta-recover VM_NAMES.

@marmarek I’m interested in those problems, if you could point to other issues/discussions.
As pointed out before, these are the filesystem areas that change between dom0 reboots:

So I guess we are talking about incoherent states that could affect the system, kept under:

  • /etc/libvirt/libxl
  • /etc/lvm
  • /etc/xdg/adjtime
  • /home
  • /root
  • /var

It’s about information what LVM volumes are in the system. If you restore all volumes together with dom0 (and remove extra ones) - basically restore whole disk state to some point in the past - then it should be fine (at least in the current qubes version). But if you restore just dom0, then dom0 may get confused about what VM volumes are there, what backup revisions they have etc. This especially applies to VMs created after the snapshot was taken - they would have LVM present, but no qubes.xml entry (just one example of possible issues).
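
To make the “LVM present, but no qubes.xml entry” case concrete, here is a small read-only check that lists qube names which have a private volume but are not known to Qubes. It is a sketch under assumptions: it relies on the default naming vm-<qube>-private in the volume group qubes_dom0, the function name is mine, and it only reports, it never changes anything.

```shell
#!/bin/sh
# Sketch: report qubes that have a private LV but are not registered.
# Assumes the default LV naming "vm-<qube>-private" in VG qubes_dom0.

# find_orphans LV_LIST QUBE_LIST
# Both arguments are newline-separated lists (LV names may be space-padded).
find_orphans() {
    printf '%s\n' "$1" | tr -d ' ' | while IFS= read -r lv; do
        case $lv in
            vm-*-private) ;;      # only current private volumes
            *) continue ;;        # skip root, -back revisions, other LVs
        esac
        name=${lv#vm-}
        name=${name%-private}
        # -x matches the whole line, -F treats the name as a literal string.
        printf '%s\n' "$2" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}
```

In dom0 this could be fed with real data, for example `find_orphans "$(sudo lvs --noheadings -o lv_name qubes_dom0)" "$(qvm-ls --raw-list)"`. Qubes stored in other pools or volume groups would not be covered by this check.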

qubes-wyng-util meta-prep takes such a qubes.xml backup into a distinct LVM prior to merging, and that backup is present when rebooting into the merged dom0 state. The user can reinject missing qubes with qubes-wyng-util meta-recover VM_NAMES. This could easily be automated by detecting, on reboot, LVMs that are present but match no existing qube, even giving the user the choice between wiping those unlinked LVMs and restoring the qubes’ settings with qubes-wyng-util meta-recover.


But to answer OP @redmind’s original question (let’s forget about wyng’s actual state and the integration of dom0 under Qubes 4.1, shall we…)

I still believe it is completely sane to manually create a snapshot before doing something in dom0 (creating new VMs being out of the question, as explained above)


Do action touching only dom0.

Reboot into that state. Voila.

What has worked for me is to snapshot ‘root’ similar to @Insurgo’s suggestion, but also to keep an up-to-date copy of /boot inside the root volume using rsync -a --delete /boot /boot-bak. I do this because you can’t choose a prior version of Xen from the boot prompt, so a bad Xen upgrade could require restoring the previous version. Note: there are other changes a user might legitimately make to dom0 that could make the root snapshot valuable; I don’t think having multiple Xen versions would obviate the snapshots.

The VM metadata can be copied from the newer root to the target root during the recovery process.

FWIW, Ubuntu now comprehensively manages snapshots of the entire system in the update & boot processes, but it also allows the user to revert /home as a separate choice.
