How to compare dom0 snapshots, to find out possible malware / compromise?

newbie · November 12, 2022, 2:40am

hi @Insurgo , this thread is to continue snapshot discussion,
taken from thread firmware backdoor

i tried /usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap
and got warning “sum of all thin volume sizes exceeds the size of thin pools & the size of whole volume group”.

i don’t understand, what is thin volume sizes, thin pools, & volume group ?

i thought a snapshot is similar to a report about dom0, but the warning makes me assume that a snapshot is a kind of dom0 backup, so which one is correct ?

what’s the difference between snapshot & backup ? or they are the same ?

after creating snapshot, would it consume many storage space in dom0 ?

i tried to create root-autosnap twice, but the 2nd time said that,
it has been created in volume group “qubes_dom0”.

but when i tried cat /lib/systemd/system-shutdown/root-autosnap,
i couldn’t find root-autosnap under that directory.

how to compare snapshot / volume snapshot of any VM ?

what is lvm ? i couldn’t find the definition anywhere.
how to find these 2 states Qubes created by default ?

about disk forensic, can we consider comparing snapshot, prior/after compromised,
as disk forensic ? or there is other way

thanks and regards,

Insurgo · November 12, 2022, 3:08pm

Houla.

The warning you get is a friendly reminder that the sum of all thin LV sizes is larger than what the thin pools, and the whole Volume Group, can actually hold. Thin LVs are logical volumes that are not fully provisioned (that is the difference between thin LVs and plain LVs): thin means only consumed space is taken from the Volume Group.

Basically, this is telling you that if you consumed all the space assigned to your Logical Volumes, the Volume Group would be overfilled. This is where Qubes will give you warnings as you go, and where having dom0 in a separate pool will still permit you to boot into Qubes (dom0) to fix errors caused by qubes taking too much space if they fill your pool (vm-pool is separated from dom0 under Q4.1).
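
To keep an eye on how full the pools really are, something like the following could be run from dom0. It is only a sketch: check_thin_pools is a hypothetical helper name, the 80% default threshold is an arbitrary assumption, and it relies on the lv_attr column starting with t for thin pools, as described in man lvs.

```shell
# Reads "name attr data_percent" lines on stdin and warns about thin pools
# (lv_attr starting with "t") whose data usage is at or above the limit.
# check_thin_pools and the default limit of 80 are illustrative choices.
check_thin_pools() {
    threshold="${1:-80}"
    awk -v limit="$threshold" '
        $2 ~ /^t/ && $3 + 0 >= limit {
            printf "WARNING: pool %s is %s%% full\n", $1, $3
        }'
}

# Intended use in dom0 (not executed here):
#   sudo lvs --noheadings -o lv_name,lv_attr,data_percent qubes_dom0 | check_thin_pools 80
```

The function only parses text, so it can be tried on a saved copy of the lvs output before trusting it.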

A snapshot is a zero-cost LVM clone of a volume at a single point in time, which can be passed as a device to a qube. Qubes uses this extensively to create -*back volumes (sudo lvs | grep back / ls /dev/qubes_dom0/*back) that can be reverted through qvm-volume revert. By default, two such snapshots are kept for everything except dom0. Default here means thin LVM provisioning on top of a LUKS encrypted container: Qubes creates 2 partitions on the disk, where the first partition is an unencrypted boot partition and the second is a LUKS2 encrypted container, in which swap and two pools (dom0 and vm-pool) are created.

That path (/lib/systemd/system-shutdown/root-autosnap) is a reference to a script that should contain those commands. A script is a text file that is not executable until chmod +x is called on it.

So as of now, you are calling parts of the script individually, whereas the script, once made executable under the systemd directory, would rotate the snapshots upon shutdown.

To mount those snapshots, you would have to use qvm-block to pass those volumes as drives to other qubes. It is not advisable to do more under dom0 than this, which doesn’t interact with the content of the block devices, as per the official instructions: How to mount LVM images | Qubes OS

Note: qvm-block attach --help is your friend here, and will list --ro as an option. That is, attaching a block device read-only, so that forensics can be done on those block devices in the disposable qube they are passed to, preventing any modification of the content of that block device.

On the mechanics of LVM, I would suggest reading a bit deeper from RedHat or some other places. The main criticism found everywhere about LVM is that it adds a lot of complexity, and that it requires knowing what a volume group is and what logical volumes are in LVM terms.

But the take-home message here, under Qubes usage of LVMs, is that LVMs are Copy on Write (CoW) volumes. That is, creating a snapshot costs nothing, but whatever changes happen on top of a snapshot, or on the original volume that was snapshotted, will make them diverge later on. When those volumes start to change, this is where the storage cost begins to show: the space used grows as fast as the delta between those snapshots at block level. So here, by using the script under systemd, you guarantee yourself that the sum of those block-level changes won’t be too high, since only so much can happen between reboots. But if you keep a dom0 snapshot from soon after installation (let’s call it dom0-clean), then the divergence between that state and the current dom0 state will grow big. Not so much a problem, because in the script above, the snapshot is not kept under dom0 pool but under vm-pool.

So you need to keep an eye on vm-pool there. One day or another, you might need to remove those snapshots. This is also why I vouch for wyng-backup there. Since wyng cares about the delta between LVM blocks, it is pretty clean to keep those two snapshots, root-autosnap and root-autosnap-back, and have wyng-backup restore the LVM volume to a certain state when needed, instead of keeping states around that are not cleaned up automatically. With the systemd script, those snapshots are rotated to make sure that LVM sizes do not explode because they were left alone.

Last note here: you can remove those snapshots at any point in time through lvremove if needed. After all, this is exactly what Qubes is doing when rotating LVMs for its qubes (-snap, -*back and the original volume name, as you can see by checking sudo lvs and the device mapper files through sudo ls -al /dev/qubes_dom0/ before and after starting/shutting down qubes).
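
To make that rotation easier to watch, the LV names could be labelled by their suffix. This is a rough sketch: classify_volumes is a hypothetical helper, the labels are my own wording, and it only assumes the -snap and -back suffixes mentioned above.

```shell
# Reads LV names on stdin (as printed by lvs) and labels each one by suffix.
# The labels "revision", "running-snapshot" and "origin" are illustrative.
classify_volumes() {
    while read -r name _; do
        [ -n "$name" ] || continue
        case "$name" in
            *-back) kind="revision" ;;
            *-snap) kind="running-snapshot" ;;
            *)      kind="origin" ;;
        esac
        printf '%s\t%s\n' "$name" "$kind"
    done
}

# Intended use in dom0 (not executed here):
#   sudo lvs --noheadings -o lv_name qubes_dom0 | classify_volumes
```

Running it before and after starting a qube should show the -snap volume appearing, which is the rotation described above.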

Insurgo · November 12, 2022, 3:22pm

This thread is actually a duplicate of Dom0 backup/snapshot?

newbie · November 23, 2022, 5:25pm

hi @Insurgo ,

thanks a lot for your explanations,
now i understand those terms better.

i think those explanations will be useful for other users too.
i see, you’ve made a lot of contributions to the forum.

anyway, i have tried to create dom0 snapshot, by using:
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap
and the creation was successful, but i couldn’t find the snapshot inside qubes_dom0.
could you tell me how to find the created snapshot ?
so we can copy to other VM for forensic.

yes, after reading it, i figure out, the thread is also about snapshot creation.
also about using snapshot to revert dom0 state.

so, maybe i will change the title for this thread to:
comparing dom0 snapshots to find suspicious malware / compromise,
so to contribute different things.

i guess if we compare 2 dom0 snapshots,
for sure it will have different volume size,
but i think, it is not enough to indicate malware / compromise,

do you have any tips / idea ?
how to compare the snapshots to find out compromise,
maybe i need to focus on specific part in the snapshot ?

Insurgo · November 23, 2022, 7:22pm

sudo /usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

This creates a snapshot under root-pool, which might not actually be desirable if it is not automatically maintained by a systemd shutdown script:

[user@dom0 ~]$ sudo lvs | grep autosnap
  root-autosnap                                            qubes_dom0 Vri---tz-k  20.00g root-pool root

That snapshot is under qubes_dom0/root-autosnap. It is not visible because it is not activated, and because it was created to be neither visible nor activated automatically:

[user@dom0 ~]$ ls /dev/qubes_dom0/ | grep autosnap
[user@dom0 ~]$ ls /dev/mapper/ | grep autosnap

Those gave no output since the volume is not activated. As man lvcreate explains, the -s (snapshot) option sets the --setactivationskip flag when a thin LVM snapshot is created in the same pool. Basically, this snapshot LVM is “hidden” even if available, and shows only in sudo lvs output.
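
The Vri---tz-k attribute string from the lvs output above already tells that story. A small sketch to decode the three characters that matter here, assuming the positions documented in man lvs (2nd: permissions, 5th: state, 10th: activation skip); describe_lv_attr is a hypothetical helper name:

```shell
# Decodes part of an lv_attr string such as "Vri---tz-k".
# Only positions 2, 5 and 10 are looked at; everything else is ignored.
describe_lv_attr() {
    attr="$1"
    case "$(printf '%s' "$attr" | cut -c2)" in
        r) echo "permissions: read-only" ;;
        w) echo "permissions: writable" ;;
        *) echo "permissions: other" ;;
    esac
    case "$(printf '%s' "$attr" | cut -c5)" in
        a) echo "state: active" ;;
        -) echo "state: inactive" ;;
        *) echo "state: other" ;;
    esac
    case "$(printf '%s' "$attr" | cut -c10)" in
        k) echo "activation skip: set" ;;
        *) echo "activation skip: not set" ;;
    esac
}

# Example: describe_lv_attr Vri---tz-k
```

For root-autosnap this reports a read-only, inactive volume with the activation skip flag set, which matches the empty ls output.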

You can activate it manually by passing the -K flag to lvchange:
sudo lvchange -ay -K /dev/qubes_dom0/root-autosnap

As a result, it will be discovered by dom0’s udev, and you will also receive a notification from the device widget. This device can then be seen through qvm-block ls and can be passed to a dvm with --ro, as in the other instructions below (snapshots are normally read-only, and this one is also enforced as such through -pr, but we like good habits).

Better to do those snapshots at shutdown.

But as said before, taking a snapshot of a live LVM is not really desired, since logs and other open files won’t be closed correctly and some states might not be consistent (writes are not always atomic). This is why it is better to take a snapshot at shutdown, and why it is recommended to put such a script under the systemd shutdown directory (and to make sure that the pool doesn’t explode because old snapshots are forgotten in place).

This will take a snapshot when all the files are closed correctly, so that comparison would be meaningful between two snapshots.

Let’s play with the concepts a bit. man lvcreate is of course a friend again.
sudo /usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

--noudevsync is preventing new device discovery and is counterproductive here
-s creates a snapshot
-pr creates this snapshot in read only mode
-An specifies that we do not want autobackup: the LVM metadata will not be backed up automatically.

Dom0 backup/snapshot?

#!/bin/sh

#This permits wyng-backup to backup root-autosnap and root-autosnap-back, taken at each system shutdowns like any other QubesOS LVMs.

#We delete the backup of last shutdown snapshot (last last shutdown)
/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap-back || true
#We take a snapshot of root-autosnap into root-autosnap-back
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root-autosnap -n root-autosnap-back
#We remove root-autosnap
/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
#We create root-autosnap from root
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

My current use case, on 4.1, is to have one single snapshot created at each shutdown.

I use wyng-backup to backup that volume incrementally, as any other lvm volume.
When I want to compare states I create another snapshot of root, in read/write mode, and I restore that wyng incremental backup under that snapshot.

My current systemd shutdown script looks like the following, since I have no use to keep two snapshots. The goal of that was to show to Qubes devs what changed under dom0 between two reboots:

[user@dom0 ~]$ cat /usr/lib/systemd/system-shutdown/root-autosnap.shutdown 
#!/usr/bin/sh
#This permits wyng-backups to backup root-autosnap, taken at each system shutdowns like any other QubesOS LVMs.
/usr/sbin/lvremove --noudevsync --force -An qubes_dom0/root-autosnap || true
/usr/sbin/lvcreate --noudevsync --ignoremonitoring -An -pr -s qubes_dom0/root -n root-autosnap

Two shutdown snapshots were used before to expose and help visualize dom0 changes across reboots.

But with the goal of exploring live changes in more detail, we will create snapshots on a live system.

So to test this and complete this example:
We will create two manual snapshots of a running dom0, back to back: one prior to starting a dispvm, and a second after having passed the first volume to that dispvm, hoping to see the dom0 changes linked to qubes starting, logs being created and so on during this small period of time on dom0’s disk.

1- Creating first snapshot of root into qubes_dom0/root-manual-1
You can man lvcreate. Here -kn is added, the udev bypass is removed and the volume is activated, while still read-only (-pr), since we want the volumes to be exposed to dom0 so that they can easily be passed to dispvms. This is based on wyng-backup’s lvcreate call:

https://github.com/tasket/wyng-backup/blob/f2c7548e5c5a4796c141c06a5771f095547bb03d/wyng#L1283

              if not exists(mapfile) or not lv_exists(vgname, snap1vol):
                  if not monitor_only:
                      print("  Pairing snapshot for", datavol)
                      complete_vols.add(datavol)
                  else:
                      print("  Skipping %s; No paired snapshot." % datavol)    ; continue
          
              # Make fresh snap2vol
              lv_remove(vgname, snap2vol)
              tags =["--addtag=delta"] if monitor_only else []
              do_exec([[CP.lvm, "lvcreate", "-pr", "-kn", "-ay", "--addtag=wyng"] + tags
                              + ["--addtag=arch-"+aset.uuid, "-s", vgname+"/"+datavol, "-n",snap2vol]])
          
              # Volume is OK, add to list of vols.
              if datavol not in complete_vols and vol.sessions:
                  incr_vols.add(datavol)
              else:
                  complete_vols.add(datavol)
          
          return incr_vols, complete_vols

[user@dom0 ~]$ sudo /usr/sbin/lvcreate --ignoremonitoring -pr -kn -s -ay qubes_dom0/root -n root-manual-1
  WARNING: Sum of all thin volume sizes (3.01 TiB) exceeds the size of thin pools and the size of whole volume group (464.74 GiB).
  WARNING: This metadata update is NOT backed up.
  Logical volume "root-manual-1" created.

2- Starting a dispvm based on the fedora-36-dvm template (in my case its name is disp3274)

3- Passing root-manual-1 to disp3274 (based on How to mount LVM images | Qubes OS)


[user@dom0 ~]$ readlink /dev/qubes_dom0/root-manual-1
../dm-79
[user@dom0 ~]$ qvm-block ls
BACKEND:DEVID  DESCRIPTION                 USED BY      
dom0:dm-79     qubes_dom0-root--manual--1

So in this case, we want to pass dom0:dm-79 to our dispvm. We can assume that a dispvm normally has /dev/xvda through /dev/xvdd attached. We can ask that dispvm to report its block devices before and after having passed a block device from dom0 if we want to make sure.

Passing first volume to dispvm:
[user@dom0 ~]$ qvm-block attach --ro disp3274 dom0:dm-79
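
The readlink lookup and the attach can be chained. A minimal sketch, where dom0_devid is a hypothetical helper that only turns the symlink target into the dom0:dm-N form shown by qvm-block ls:

```shell
# Prints the qvm-block device id ("dom0:dm-N") for an LVM symlink such as
# /dev/qubes_dom0/root-manual-1, based on where the symlink points.
dom0_devid() {
    target="$(readlink "$1")" || return 1
    printf 'dom0:%s\n' "${target##*/}"
}

# Intended use in dom0 (not executed here):
#   qvm-block attach --ro disp3274 "$(dom0_devid /dev/qubes_dom0/root-manual-1)"
```

Checking the result against qvm-block ls before attaching is still a good habit.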

4- Creating second snapshot

[user@dom0 ~]$ sudo /usr/sbin/lvcreate --ignoremonitoring -pr -kn -s -ay qubes_dom0/root -n root-manual-2
  WARNING: Sum of all thin volume sizes (3.05 TiB) exceeds the size of thin pools and the size of whole volume group (464.74 GiB).
  Logical volume "root-manual-2" created.

5- Getting its internal name, as per the referred upstream documentation

[user@dom0 ~]$ readlink /dev/qubes_dom0/root-manual-2
../dm-130

(note that ls -al /dev/qubes_dom0/root-manual-2 tells us the same information, which is the link to the dm entry that corresponds to our friendly chosen name)

Passing the second snapshot to dispvm in read only as well
[user@dom0 ~]$ qvm-block attach --ro disp3274 dom0:dm-130

6- Mounting those volumes under dispvm

xvdi is first snapshot
xvdj is second

mkdir -p /tmp/first /tmp/second
sudo mount /dev/xvdi /tmp/first
sudo mount /dev/xvdj /tmp/second

I have meld installed. If you don’t: sudo dnf install meld

7- Comparing snapshots
sudo meld /tmp/first /tmp/second, then click “File filters” and disable the “Same” file status, since we want to see which files were modified or new in this small-scale comparison, outside of two clean reboots and outside of dom0 upgrades, which would otherwise be related to the thread https://forum.qubes-os.org/t/verifying-installation

That will take a while.

We are comparing the content of all the files between those two states after all, without knowing specifically what to discard, and if we discarded some paths, we would not know what had changed in those paths.

This is the dilemma of any host-based intrusion detection system (HIDS), which is also the subject of a lot of other threads and won’t be covered here. File system differences are one of their weapons for intrusion detection, but they need to be instructed on what is at least expected to change, so that those changes can be discarded. Here we use meld, which is just a file/directory comparator.
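
For those who prefer a terminal, a command-line counterpart to the meld comparison could look like the sketch below. It is not what was used for the results that follow: manifest and changed_paths are hypothetical helpers that hash every regular file with sha256sum and print the paths that are new, removed or modified. Symlinks are skipped, and file names containing newlines or backslashes are not handled.

```shell
# Prints "hash  ./path" for every regular file under directory $1.
manifest() {
    ( cd "$1" && find . -xdev -type f -exec sha256sum {} + | LC_ALL=C sort )
}

# Prints the paths whose presence or content differs between $1 and $2.
changed_paths() {
    first="$(mktemp)"
    second="$(mktemp)"
    manifest "$1" > "$first"
    manifest "$2" > "$second"
    # comm -3 keeps lines unique to either manifest; sed strips the
    # tab added by comm and the 64-character hash, leaving the path.
    LC_ALL=C comm -3 "$first" "$second" \
        | sed 's/^[[:space:]]*[0-9a-f]\{64\}  //' \
        | LC_ALL=C sort -u
    rm -f "$first" "$second"
}

# Intended use in the dispvm (not executed here, needs root to read everything):
#   sudo sh -c '. ./compare.sh; changed_paths /tmp/first /tmp/second'
```

The compare.sh file name in the usage comment is only a placeholder for wherever these functions are saved. Like meld, this does not know which changes are expected, so the same dilemma applies.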

This exercise is interesting to show why we need dom0 to externalize its states.

Results:

Lots of “dangling” symlinks causing “orange” indicators. We dismiss them and focus on green (new) and blue (modified) meld’s indicators.
/etc/libvirt/libxl/disp3274.xml
/etc/lvm/archive/qubes_dom0*.vg states
/etc/lvm/backup/qubes_dom0 file
/home/user/.config/pulse changes
/home/user/.local/share/qubes-appmenus/disp3274
/home/user/.xsession-errors
/var/lib/qubes/appvms/disp3274
/var/lib/xen/userdata-*.libvirt-xml
/var/log/journal/*/system.journal
/var/log/libvirt/libxl/disp3274.log
/var/log/lightdm/x-0.log
/var/log/qubes
/var/log/Xorg.0.log

That’s about it for two dom0 warm snapshots taken minutes apart with a little action in the middle.

8 - Cleaning up
Once we close our dispvm which accessed the lvm’s dm directly, we can cleanly remove those created snapshots which should not be used anymore:

[user@dom0 ~]$ qvm-block ls
BACKEND:DEVID  DESCRIPTION                 USED BY
dom0:dm-130    qubes_dom0-root--manual--2  
dom0:dm-79     qubes_dom0-root--manual--1

Doing:

[user@dom0 ~]$ sudo lvremove /dev/qubes_dom0/root-manual-*
Do you really want to remove active logical volume qubes_dom0/root-manual-1? [y/n]: y
  Logical volume "root-manual-1" successfully removed
Do you really want to remove active logical volume qubes_dom0/root-manual-2? [y/n]: y
  Logical volume "root-manual-2" successfully removed

That’s it for the example.

You should have understood by now that searching for a compromise is difficult without knowing what a clean “baseline” to compare against is. What would that baseline look like? It would be an exclusion of all the files known to be clean, as reported by some kind of authority.

Under dom0, the only tool we have that can provide some output about that is the rpm database under Fedora. With it, one can inspect the files that rpm does not report as being managed, and/or the files that were modified compared to the signed references (the authenticity + integrity contract), which should be compared externally.
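
As a starting point, rpm -Va prints one line per file that differs from the rpm database, with a 5 in the third column of the flags when the digest changed and a c marker for config files. The sketch below filters that output; modified_non_config is a hypothetical helper, it breaks on paths containing spaces, and it trusts an rpm database that lives inside the very dom0 being inspected.

```shell
# Reads `rpm -Va` output on stdin and prints the files whose digest
# changed (flag "5" in position 3), skipping files marked as config ("c").
modified_non_config() {
    awk 'substr($1, 3, 1) == "5" && $2 != "c" { print $NF }'
}

# Intended use in dom0 (not executed here):
#   sudo rpm -Va | modified_non_config
# A single path can be checked for ownership with: rpm -qf /path/to/file
```

Anything this prints still has to be investigated against an external reference, as said above.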

I understand that this answer might not be the one you expected. Snapshots only permit exposing differences, which can then only be inspected and investigated against something else. This is why we want a read-only dom0, where externalized states, such as the ones reported above and all the others, live outside of dom0 and dom0 itself can be validated easily: if dom0 is read-only (outside of system upgrades), then between those upgrades the dom0 filesystem could be checked externally, fast and easily, for integrity (dm-verity being the best and most well-known option). As you may understand, this does not exist yet, so the option here is to keep dom0 as clean as possible, limit oneself from installing stuff inside of dom0, verify integrity contracts, and inspect files that are not part of a standard installation, as well as the common places where scripts would be started automatically after dom0 is compromised.

You might want to continue this discussion under

newbie · November 25, 2022, 9:11am

hi @Insurgo

it’s okay, thanks a lot for all information.

i will keep dom0 as clean as possible, so that find little difference only between snapshots. so if i can ensure Heads & Qubes are clean, but if Laptop is still compromised, then I can narrow down & conclude that the possible backdoor is inside the firmware in any chipset.

actually, my current dom0, in my opinion, is clean enough, because i am sure, so far i have done 2 things only, related to dom0, which are creating sys-usb & running Qubes update via update launcher.

how’s your opinion ? if we are sure that Heads & Qubes are clean,
then can i conclude that the possible backdoor is inside the firmware ?
which temporary at the moment, we can do nothing about it.
or maybe you have other attack vector possibility ?

if not mistaken, previously you explained, that Qubes creates 2 partitions:

1st partition is an un-encrypted /boot (guess heads signs /boot)
2nd partition is an encrypted container, which consist of 2 pools: dom0 & VM-pool.

so now, after i “sudo lvs”, i see 3 LVs: root-pool, swap, & vm-pool.
i guess root-pool = dom0, part of 2nd partition encrypted container, is it correct ?
but what’s the purpose of LV “swap” ?

also, if /dev/qubes_dom0 is being used to store snapshot,
then what’s the purpose of /dev/mapper ?

why it has to be 2 types of snapshot ? activated visible & deactivated not visible.
what’s the difference between both ?
i guess it is not related to whether the VM is started or not ?

i activated the snapshot: sudo lvchange -ay -K /dev/qubes_dom0/root-autosnap
then all these readlink, qvm-block ls, and ls -al /dev/qubes_dom0/root-autosnap,
can display the device successfully.

i guess, dom0 udev refers to these dm-*, is it correct ?
but what does dom0 udev represent ?
also which one is device widget ?

i couldn’t find wyng-backup in dom0, or maybe i need to install first ?
is it secure to install wyng-backup in dom0 ?

can we store snapshot, from root-pool or vm-pool,
into vault vm, or into USB thumb drive ?

until this step, how can we sure that xvdi & xvdj are 1st & 2nd snapshot ?

thanks and regards,

Insurgo · November 25, 2022, 2:14pm

My mistake, trying to oversimplify.
dom0’s LVM pool (LV) is root-pool. The swap LV is dom0’s swap: the “disk” backup plan for memory if you use too much memory under dom0, so that the system goes slower instead of killing programs following obscure rules. Swapping moves RAM content to swap so that intense RAM operations can continue, lighting up the disk LED while doing so: what is not currently needed is swapped out of RAM and what is needed is swapped back in, for tasks that do not fit in memory (this applies to swappable RAM regions, i.e. the virtual memory of applications, and is also an oversimplification).

root-pool, vm-pool and swap are under 2nd partition, the LUKS encrypted container.

/dev/qubes_dom0/ is where LVM exposes its abstraction for pool-related volumes, on top of dm (device mapper), whose mapped LVMs are listed under /dev/mapper. The latter is where dmsetup keeps its volumes (dm); LVM abstracts those and deals with the complexity of dmsetup.

As I tried to simplify from the docs (man is your friend), snapshots are normally hidden when created, unless they are intentionally activated. It is not related to whether a VM is started or not, but rather to whether the LVM is activated and monitored in the pool. As referred to in the previously linked Fedora article, and more generally under Qubes, this complexity is abstracted by tools (dmsetup is abstracted by the LVM tools), and it gets as complex as this when you dig into it, as you can see. One can dig in and understand the complexity as they go, but this is the reason people vouch for other pools and file systems.

Btrfs is a current and supported alternative partitioning scheme that Qubes OS makes available at installation, but that is a completely different subject which I’m not currently interested in, since it is not the default installation method and does not permit wyng-backup to operate as an incremental backup/restore solution. wyng-backup is a separate project for the time being, and I rely on it to keep states in incremental backups, including dom0 as a root-autosnap snapshot maintained by the systemd script at shutdown, as previously discussed.

If an LVM volume exists, wyng-backup can be instructed to back up only the blocks that changed since the last incremental backup. Even more powerful, it can save only the blocks that differ from all other LVMs (deduplication is sooooooo magical for backups, even more so under Qubes OS, which suggests cloning and specializing Templates for compartmentalization purposes: Templates sharing the same blocks will have those blocks saved only once). This magic, deduplication in incremental backups within and across all backups, is really precious to me.

When Qubes OS creates its -back snapshot volumes, it makes them visible under /dev/qubes_dom0 because they are not deactivated when created. But those are not made visible through udev up to the Qubes device widget, nor exposed through qvm-block to be passed to another qube out of the box. The referred article from the Qubes docs explains how to expose those LVMs to qvm-block so that they can be passed to qubes for comparison if needed.

LVMs are logical volumes, which abstract block devices that can all be used the same way. Remember, a snapshot is a complete and valid block device abstraction, just like a partition. Logical here means that it is abstracted. The meaning of all of this is that an LVM can be used as what it represented when created. A snapshot creates a static version of its origin at a point in time, and that abstraction permits you to deal with that copy in time, even if its origin has changed since then. In our use case here, it permits comparing its content with another snapshot, or with its origin, to inspect what changed, either at the block level or in its content.

udev is the device manager that acts upon device detection. The udev bypass from LVM skips the “new block device being discovered” magic, so that created LVMs do not trigger what would happen when a new device is hotplugged into the system. That udev bypass is simply a way to say: don’t mind that new “block device” that was inserted in the system. Under sys-usb, udev is the first step (after kernel drivers being present and loaded) responsible for popping a new USB drive up in your favorite file manager. It pops up because there are no udev rules defined saying not to deal with the device. udev is another subject I will not dig into in detail here, but one can dig into it if they want.

The Qubes device widget, just beside the clock widget in dom0, is what permits you to pass udev-discovered devices, as an example, from sys-usb to other qubes. This widget is basically an abstraction of the qvm-device/qvm-block dom0 command line.

They appear as xvdX in the order they are passed to the qube. There are many ways to list drives/partitions prior to passing them to a qube: fdisk -l, gnome-disks, lsblk… Again, I am trying to simplify things, not go in all directions.
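
To be sure which xvdX is which, the device list can be saved inside the dispvm before and after each attach and then compared. A minimal sketch, where new_block_devices and the file names in the usage comment are hypothetical:

```shell
# Prints the device names present in the "after" list ($2) but not in
# the "before" list ($1). Both files hold one device name per line.
new_block_devices() {
    grep -Fxv -f "$1" "$2"
}

# Intended use inside the dispvm (not executed here):
#   lsblk -d -n -o NAME > /tmp/before     # prior to qvm-block attach in dom0
#   lsblk -d -n -o NAME > /tmp/after      # once the volume is attached
#   new_block_devices /tmp/before /tmp/after
```

Attaching one snapshot at a time and repeating this gives an unambiguous mapping between each snapshot and its xvdX name.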

I dismissed, intentionally, other questions that were irrelevant to this thread.