How to compare dom0 snapshots, to find out possible malware / compromise?

sudo /usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

This creates a snapshot in root-pool, which might not be desirable unless the snapshot is maintained automatically by a systemd shutdown script:

[user@dom0 ~]$ sudo lvs | grep autosnap
  root-autosnap                                            qubes_dom0 Vri---tz-k  20.00g root-pool root

That snapshot is at qubes_dom0/root-autosnap. It is not visible because it is not activated: it was created so that it would be neither visible nor activated automatically:

[user@dom0 ~]$ ls /dev/qubes_dom0/ | grep autosnap
[user@dom0 ~]$ ls /dev/mapper/ | grep autosnap

Both commands gave no output, since the volume is not activated. As the man page explains, the -s (snapshot) option sets the activation skip flag (--setactivationskip) when a thin LVM snapshot is created in the same pool. Basically, this snapshot is “hidden” even though it exists, and it shows up only in the sudo lvs output.
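To read those flags without squinting at the attribute string, here is a small sketch that decodes three positions of the lv_attr column: position 2 (permission), position 5 (state) and position 10 (activation skip). The function name describe_lv_attr is made up for this post, and only the values relevant here are handled.

```shell
# Hypothetical helper: decode the lv_attr bits discussed above.
# $1 is an attribute string as printed by lvs, e.g. "Vri---tz-k".
describe_lv_attr() {
    attr=$1
    # Position 2: volume permission.
    case $(printf '%s' "$attr" | cut -c 2) in
        r|R) perm=read-only ;;
        w)   perm=writable ;;
        *)   perm=unknown ;;
    esac
    # Position 5: activation state.
    case $(printf '%s' "$attr" | cut -c 5) in
        a) state=active ;;
        -) state=inactive ;;
        *) state=other ;;
    esac
    # Position 10: "k" means activation is skipped by default.
    case $(printf '%s' "$attr" | cut -c 10) in
        k) skip=yes ;;
        *) skip=no ;;
    esac
    echo "permission=$perm state=$state activation_skip=$skip"
}

# Possible usage in dom0:
#   describe_lv_attr "$(sudo lvs --noheadings -o lv_attr qubes_dom0/root-autosnap | tr -d ' ')"
```

For the Vri---tz-k string shown above, this reports a read-only, inactive volume with activation skip set.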

You can activate it manually by passing the -K flag (--ignoreactivationskip) to lvchange:
sudo lvchange -ay -K /dev/qubes_dom0/root-autosnap

As a result, the device will be discovered by dom0’s udev and you will also receive a notification from the devices widget. The device can then be seen through qvm-block ls and passed to a dvm with --ro, as in the instructions below (the snapshot is already read-only, enforced through -pr, but we like good habits).

Better to do those snapshots at shutdown.

But as said before, taking a snapshot of a live LVM volume is not really desirable: logs and other open files won’t be closed correctly, and some states might not be consistent (writes are not always atomic). This is why it is better to take the snapshot at shutdown, and why it is recommended to do so from a systemd shutdown script (while making sure that the pool doesn’t fill up because old snapshots are forgotten in place).
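To keep an eye on the pool filling up, something like the following could be run from dom0. This is only a sketch: the helper name pool_usage_exceeds and the 90% limit are my own assumptions, and root-pool is the pool name from the lvs output above.

```shell
# Hypothetical helper: succeed (exit 0) when the usage percentage in $1
# is at or above the limit in $2, fail otherwise.
pool_usage_exceeds() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used + 0 >= limit + 0) }'
}

# Possible usage in dom0 (the 90 is an arbitrary threshold):
#   used=$(LC_ALL=C sudo lvs --noheadings -o data_percent qubes_dom0/root-pool | tr -d ' ')
#   pool_usage_exceeds "$used" 90 && echo "root-pool is ${used}% full, clean up old snapshots"
```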

This takes the snapshot once all files are closed correctly, so that a comparison between two snapshots is meaningful.

Let’s play with the concepts a bit. man lvcreate is of course a friend again.
sudo /usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

  • --noudevsync disables udev synchronisation (the command does not wait for udev), so the new device is not discovered; this is counterproductive here
  • -s creates a snapshot
  • -pr creates this snapshot in read-only mode
  • -An disables autobackup: we do not want the metadata backed up automatically.

My current use case, on 4.1, is to have one single snapshot created at each shutdown.

I use wyng-backup to backup that volume incrementally, as any other lvm volume.
When I want to compare states I create another snapshot of root, in read/write mode, and I restore that wyng incremental backup under that snapshot.

My current systemd shutdown script looks like the following, since I have no use for keeping two snapshots. The goal of keeping two was to show Qubes devs what changed under dom0 between two reboots:

[user@dom0 ~]$ cat /usr/lib/systemd/system-shutdown/root-autosnap.shutdown 
#!/usr/bin/sh
#This permits wyng-backups to backup root-autosnap, taken at each system shutdowns like any other QubesOS LVMs.
/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

An earlier version kept two shutdown snapshots, to expose and help visualize dom0 changes across reboots.
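The two-snapshot script itself is not reproduced here. As a rough idea only, a rotation could look like the sketch below. The name root-autosnap-prev, the run wrapper and the DRY_RUN switch are hypothetical and not taken from my actual script. The final case relies on systemd-shutdown calling these scripts with a single argument (halt, poweroff, reboot or kexec).

```shell
#!/usr/bin/sh
# Hypothetical sketch: keep the previous and the current shutdown snapshot.
VG=qubes_dom0

run() {
    # With DRY_RUN=1 only print the command; otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rotate_root_snapshots() {
    # Drop the older snapshot, demote the current one, then take a fresh one.
    run /usr/sbin/lvremove --noudevsync --force -An "$VG/root-autosnap-prev" || true
    run /usr/sbin/lvrename --noudevsync -An "$VG" root-autosnap root-autosnap-prev || true
    run /usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s "$VG/root" -n root-autosnap
}

# Only act when invoked the way systemd-shutdown invokes its scripts.
case "${1:-}" in
    halt|poweroff|reboot|kexec) rotate_root_snapshots ;;
esac
```

Sourcing the file and running DRY_RUN=1 rotate_root_snapshots prints the three LVM commands without touching the pool, which is a cheap way to review them first.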


But to explore live changes further and in more detail, we will create snapshots on a live system.

So, to test this and complete the example:
We will create two manual snapshots of a running dom0, back to back: one before starting a dispvm, and a second after having passed the first volume to that dispvm. The hope is to see, on dom0’s disk, the changes linked to a qube starting, logs being created and so on during that short period of time.

1- Creating first snapshot of root into qubes_dom0/root-manual-1
(See man lvcreate. Compared to the earlier command, -kn is added, the udev bypass is removed and the volume is activated with -ay, while staying read-only (-pr): we want the volumes exposed to dom0 so that they can easily be passed to dispvms. This is based on wyng-backup’s lvcreate call.)

[user@dom0 ~]$ sudo /usr/sbin/lvcreate --ignoremonitoring -pr -kn -s -ay qubes_dom0/root -n root-manual-1
  WARNING: Sum of all thin volume sizes (3.01 TiB) exceeds the size of thin pools and the size of whole volume group (464.74 GiB).
  WARNING: This metadata update is NOT backed up.
  Logical volume "root-manual-1" created.

2- Start fedora-36-dvm template based dispvm (in my case its name is disp3274)

3- Passing root-manual-1 to disp3274 (based on How to mount LVM images | Qubes OS)


[user@dom0 ~]$ readlink /dev/qubes_dom0/root-manual-1
../dm-79
[user@dom0 ~]$ qvm-block ls
BACKEND:DEVID  DESCRIPTION                 USED BY      
dom0:dm-79     qubes_dom0-root--manual--1

So in this case, we want to pass dom0:dm-79 to our dispvm. A dispvm normally has /dev/xvda through /dev/xvdd attached. To make sure, we can ask the dispvm to list its block devices before and after we pass it a block device from dom0.
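A minimal sketch of that before/after check, to be run inside the dispvm. The helper name new_block_devices is made up for this post. It takes two newline-separated device lists and prints the names that appear only in the second one.

```shell
# Hypothetical helper: print device names present in the "after" list ($2)
# but absent from the "before" list ($1).
new_block_devices() {
    before_file=$(mktemp)
    after_file=$(mktemp)
    printf '%s\n' "$1" | sort > "$before_file"
    printf '%s\n' "$2" | sort > "$after_file"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$before_file" "$after_file"
    rm -f "$before_file" "$after_file"
}

# Possible usage inside the dispvm:
#   before=$(lsblk -dn -o NAME)
#   ... attach the device from dom0 ...
#   after=$(lsblk -dn -o NAME)
#   new_block_devices "$before" "$after"
```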

Passing first volume to dispvm:
[user@dom0 ~]$ qvm-block attach --ro disp3274 dom0:dm-79

4- Creating second snapshot

[user@dom0 ~]$ sudo /usr/sbin/lvcreate --ignoremonitoring -pr -kn -s -ay qubes_dom0/root -n root-manual-2
  WARNING: Sum of all thin volume sizes (3.05 TiB) exceeds the size of thin pools and the size of whole volume group (464.74 GiB).
  Logical volume "root-manual-2" created.

5- Getting its internal name, as in the upstream documentation referred to above

[user@dom0 ~]$ readlink /dev/qubes_dom0/root-manual-2
../dm-130

(note that ls -al /dev/qubes_dom0/root-manual-2 gives us the same information, which is the link to the dm entry that corresponds to our friendly chosen name)
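The two names seen so far can be derived mechanically. The sketch below uses helper names of my own choosing: devid_from_link turns the symlink target into the ID that qvm-block expects, and mapper_name rebuilds the DESCRIPTION column, where device-mapper doubles every hyphen inside the VG and LV names and joins them with a single hyphen.

```shell
# Hypothetical helper: "../dm-79" -> "dom0:dm-79"
devid_from_link() {
    echo "dom0:$(basename "$1")"
}

# Hypothetical helper: VG name in $1, LV name in $2 -> device-mapper name.
mapper_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    echo "$vg-$lv"
}

# Possible usage in dom0:
#   devid_from_link "$(readlink /dev/qubes_dom0/root-manual-2)"
#   mapper_name qubes_dom0 root-manual-2
```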

Passing the second snapshot to dispvm in read only as well
[user@dom0 ~]$ qvm-block attach --ro disp3274 dom0:dm-130

6- Mounting those volumes under dispvm

  • xvdi is first snapshot
  • xvdj is second

mkdir -p /tmp/first /tmp/second
sudo mount /dev/xvdi /tmp/first
sudo mount /dev/xvdj /tmp/second

I have meld installed. If you don’t: sudo dnf install meld

7- Comparing snapshots
Run sudo meld /tmp/first /tmp/second, then click “File filters” and disable the “Same” file status, since we want to see which files are modified or new in this small-scale comparison. This is outside of two clean reboots, and outside of dom0 upgrades, which would otherwise be related to the thread https://forum.qubes-os.org/t/verifying-installation

That will take a while.

We are comparing the content of all the files between those two states, after all, without knowing specifically what to discard. And if we discarded some paths, we would not know what had changed in those paths.
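If you prefer a text list over the meld GUI, a hash manifest of each mount can be compared instead. This is a swapped-in technique, not what I did above, and the function names are made up. It only looks at regular files, so symlinks (dangling or not) are ignored, and it most likely needs to run as root to read everything.

```shell
# Hypothetical helper: one "<sha256>  ./path" line per regular file under $1.
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + )
}

# Hypothetical helper: print paths that are new, removed or modified
# between directory $1 and directory $2.
changed_paths() {
    first=$(mktemp)
    second=$(mktemp)
    manifest "$1" | LC_ALL=C sort > "$first"
    manifest "$2" | LC_ALL=C sort > "$second"
    # comm -3 keeps the lines unique to either manifest. Strip its tab
    # indent, then the 64-character digest and the two separator characters.
    LC_ALL=C comm -3 "$first" "$second" | tr -d '\t' | cut -c 67- | LC_ALL=C sort -u
    rm -f "$first" "$second"
}

# Possible usage inside the dispvm:
#   sudo sh -c '. ./compare.sh; changed_paths /tmp/first /tmp/second'
# (compare.sh being whatever file you saved these functions in)
```

It is just as slow as meld, since every file still gets read, but the output can be saved and filtered afterwards.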

This is the dilemma of any host-based intrusion detection system (HIDS), which is also the subject of a lot of other threads and won’t be covered here. File system differences are one of their weapons for intrusion detection, but they need to be told at least what is expected to change, so that those changes can be discarded. Here we use meld, which is just a file/directory comparator.

This exercise is interesting to show why we need dom0 to externalize its states.


Results:

  • Lots of “dangling” symlinks causing “orange” indicators. We dismiss them and focus on green (new) and blue (modified) meld’s indicators.
  • /etc/libvirt/libxl/disp3274.xml
  • /etc/lvm/archive/qubes_dom0*.vg states
  • /etc/lvm/backup/qubes_dom0 file
  • /home/user/.config/pulse changes
  • /home/user/.local/share/qubes-appmenus/disp3274
  • /home/user/.xsession-errors
  • /var/lib/qubes/appvms/disp3274
  • /var/lib/xen/userdata-*.libvirt-xml
  • /var/log/journal/*/system.journal
  • /var/log/libvirt/libxl/disp3274.log
  • /var/log/lightdm/x-0.log
  • /var/log/qubes
  • /var/log/Xorg.0.log

That’s about it for two dom0 warm snapshots taken minutes apart with a little action in the middle.
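The list above can serve as a first, deliberately coarse ignore list. The sketch below (filter_expected is a made-up name, and the patterns are my own reading of the results above) drops the expected paths from a list of changed paths on stdin and prints what is left. Keep in mind that anything matched here is no longer looked at, so tampering under /var/log, for example, would go unnoticed.

```shell
# Hypothetical helper: read absolute paths (relative to the snapshot root)
# on stdin and print only those not covered by the expected-change patterns.
filter_expected() {
    grep -v -E \
        -e '^/var/log/' \
        -e '^/etc/lvm/(archive|backup)/' \
        -e '^/etc/libvirt/libxl/' \
        -e '^/var/lib/xen/userdata-' \
        -e '^/var/lib/qubes/appvms/' \
        -e '^/home/user/\.(config/pulse|local/share/qubes-appmenus)/' \
        -e '^/home/user/\.xsession-errors$' || true
}

# Possible usage:
#   filter_expected < changed-paths.txt
```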

8 - Cleaning up
Once we close our dispvm which accessed the lvm’s dm directly, we can cleanly remove those created snapshots which should not be used anymore:

[user@dom0 ~]$ qvm-block ls
BACKEND:DEVID  DESCRIPTION                 USED BY
dom0:dm-130    qubes_dom0-root--manual--2  
dom0:dm-79     qubes_dom0-root--manual--1

Doing:

[user@dom0 ~]$ sudo lvremove /dev/qubes_dom0/root-manual-*
Do you really want to remove active logical volume qubes_dom0/root-manual-1? [y/n]: y
  Logical volume "root-manual-1" successfully removed
Do you really want to remove active logical volume qubes_dom0/root-manual-2? [y/n]: y
  Logical volume "root-manual-2" successfully removed
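Before running lvremove, it can be worth checking that nothing still holds the snapshots. A minimal sketch, assuming that qvm-block ls leaves the USED BY column empty for unattached devices (as in the output above) and that the description contains no spaces, which holds for these dm names. The helper name attached_devices is made up.

```shell
# Hypothetical helper: read `qvm-block ls` output on stdin and print the
# dom0 dm device IDs that have something in the USED BY column.
attached_devices() {
    awk 'NR > 1 && $1 ~ /^dom0:dm-/ && NF >= 3 { print $1 }'
}

# Possible usage in dom0:
#   qvm-block ls | attached_devices
```

No output means no dm device is attached to a qube anymore.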

That’s it for the example.


You should have understood by now that searching for a compromise is difficult without knowing what a clean “baseline” to compare against is. What would that baseline look like? It would be an exclusion of all the files known to be clean, as reported by some kind of authority.

Under dom0, the only tool we have that can give us some output about that is the rpm database under Fedora. With it, we can inspect the files that rpm does not report as managed, and/or the files that were modified compared to the signed references (the authenticity + integrity contract), which should be compared externally.
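As a starting point, rpm -Va verifies every installed file against the rpm database and prints a flag string per deviating file, where a 5 in the third position means the content digest differs. The sketch below (digest_mismatches is a made-up name) keeps only those lines. It assumes file names without spaces. Note that it trusts the rpm database it reads, which is exactly why comparing against references held outside dom0 matters.

```shell
# Hypothetical helper: read `rpm -Va` output on stdin and print the files
# whose content digest no longer matches the packaged one.
digest_mismatches() {
    awk '$1 ~ /^..5/ { print $NF }'
}

# Possible usage in dom0:
#   sudo rpm -Va | digest_mismatches
# Checking whether a single file is owned by any package:
#   rpm -qf /path/to/file
```

Running rpm with --root pointed at a mounted snapshot inside the dispvm might also work, but I have not tested that here.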

I understand that this answer might not be the one you expected. Snapshots only expose differences, which can then only be inspected and investigated against something else. Ideally we would have a read-only dom0, with its externalized states (the ones reported above and all the others) kept outside of it, so that dom0 could be validated easily: if dom0 is read-only outside of system upgrades, then between those upgrades the dom0 filesystem could be checked externally for integrity, fast and easily (dm-verity being the best and most well-known option).

As you may understand, this does not exist yet. So the option here is to keep dom0 as clean as possible, to limit oneself from installing stuff inside dom0, to verify integrity contracts, and to inspect files that are not part of a standard installation, as well as the common places from which scripts would be started automatically after dom0 is compromised.

You might want to continue this discussion under