Callback driver and lvm - how to start?

This is likely answerable only by @tripleh or the dev team, as web searches turn up no mention of the Qubes storage callback driver other than in the GitHub repo.

I’m considering using the callback driver to support ephemerally or permanently encrypted AppVMs, where the template lives unencrypted in the primary pool but the AppVM may have snapshot components that aren’t in the same VG.

What’s the best way to start experimenting with the callback driver? Put a logging call in the cmd config parameter to see what calls are being made, and judge from that whether it’s feasible? E.g. I’m not sure whether the callback has visibility into volume paths, or the ability to change the volume path used by the backend driver.

[Plan is to do dm-snapshot from the template. Unlike lvm-thin external-origin snapshots which must share a VG with the source, dm-snapshot origin can exist anywhere on the system.]
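Roughly what I have in mind, with all device names made up and the chunk size arbitrary:

```bash
# origin: the unencrypted template root LV in the primary pool
ORIGIN=/dev/qubes_dom0/vm-fedora-36-root
# COW: any block device outside that VG, e.g. an LV on another (encrypted) disk
COWDEV=/dev/vg_other/ephem-cow

# throwaway-key dm-crypt layer over the COW device, so everything the
# snapshot absorbs hits disk as ciphertext
cryptsetup open --type plain --key-file /dev/urandom \
    --cipher aes-xts-plain64 --key-size 512 "$COWDEV" ephem-cow-crypt

# non-persistent dm-snapshot; table: <start> <len> snapshot <origin> <cow> <N|P> <chunksize>
dmsetup create ephem-root --table \
    "0 $(blockdev --getsz "$ORIGIN") snapshot $ORIGIN /dev/mapper/ephem-cow-crypt N 128"

# /dev/mapper/ephem-root is what the VM would get instead of the usual thin snapshot
```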

B

Hi Brendan,

the callback pool can be configured via /etc/qubes_callback.json (doesn’t exist by default).
There’s an example file at /usr/share/doc/qubes/qubes_callback.json.example. The examples include some for encryption, though only with the file pool. The callbacks are single lines, as JSON doesn’t allow literal newlines in strings. Usually one would call some external script to keep things more readable.

Back then I did it with the file pool as my understanding of the lvm pool driver was and still is much worse than that of the file pool driver.

There’s also some command-line code at the top of the source file [1]. That should explain how the test examples can be instantiated.

With the file pool driver it’s mostly a matter of providing Qubes OS a path to a directory (dir_path) that it can work with as usual. The pre_setup & pre_sinit callbacks make sure it’s there just in time; post_destroy removes it as needed. The others are just safeguards (qubesd crashes on a non-zero exit code).
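If it helps, a minimal sketch of such an external script, assuming the callback name arrives as the first argument (check the .example file and the header of callback.py for how the commands actually get invoked; paths are made up):

```bash
#!/bin/sh
# hypothetical /usr/local/bin/enc-pool-cb, wired in as the callback command
DIR=/mnt/enc_pool    # the dir_path handed to the file bdriver

case "$1" in
    pre_setup|pre_sinit) mkdir -p "$DIR" ;;   # make sure dir_path exists in time
    post_destroy)        rm -rf "$DIR" ;;     # clean up on qvm-pool remove
    *)                   exit 0 ;;            # everything else: no-op
esac
```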

The major drawback of using the callback pool driver is that it is specific to the underlying pool driver. Also, it is probably slightly slower than a native implementation. On the plus side, you can pretty much do whatever you want.

[1] qubes-core-admin/callback.py at master · QubesOS/qubes-core-admin · GitHub

Good luck,
David

Hi David!

Thank you for taking the time to reply.

Yes, I’ve been looking at the sample JSON and the bdriver parameters (cross-checking against the file.py and lvm.py code as well as the lvm callback unit tests).

I just wanted to verify that a first step of “create a new lvm entry that points to the main lvm pool and put a logger in the cmd property, but none of the ‘on’ properties” is the right way to go to understand whether the passed data can be used for what I want to do with lvm.
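I.e. something as dumb as this as the cmd, with bdriver pointed at the primary pool’s volume_group/thin_pool (script path and log location are just what I’d pick, and I’m assuming the callback name shows up in the arguments):

```bash
#!/bin/sh
# hypothetical /usr/local/bin/cb-trace: log every invocation plus its
# arguments, then get out of the way, so VM start/stop/backup activity can
# be mapped onto the callback sequence
echo "$(date -Is) $0 $*" >> /var/log/qubes/callback-trace.log
exit 0
```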

Anyway…

I was hoping for a way to intervene right before the lvm devices are handed to Xen (libvirt?) and swap the device paths for non-lvm snapshot devices, and then again right when Xen/libvirt relinquishes them, to do some fixup and potentially migrate data back to lvm thin pool devices before lvm.py/storage.py performs its own post-shutdown volume cleanup.

The bdriver parameters available for lvm make me think I can’t do that using the callback driver, as the devices still need to be in an lvm pool (and, if I want snapshots, in the same VG).

It’s possible I could use the file driver and then derive the paths to the VM’s lvm thin volumes from there in order to set up the device-mapper snapshots? But then I’d have to replicate a lot of what lvm.py does… I think?

B

PS - the crux of the issue with per-VM encryption of thin LVM LVs is that while an LVM thin snapshot can be given a read-only external origin outside of the pool, that origin must still exist as a standard LV or thin LV within the same VG, which prevents inserting a device-mapper crypt layer. E.g. I want the template root volume unencrypted, but the changes written to the AppVM’s root snapshot written as ciphertext (and then discarded after shutdown). This would be useful for anti-forensics for disposable VMs.
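To make the limitation concrete (VG/LV/pool names are just for illustration):

```bash
# allowed: a thin snapshot with an external origin, but the origin must be a
# (read-only) LV in the same VG as the thin pool
lvcreate -s qubes_dom0/vm-fedora-36-root \
    --thinpool qubes_dom0/vm-pool --name work-root-snap

# not possible: the same thing with the thin pool (where the AppVM's changes
# land) sitting in a different, encrypted VG, or with a dm-crypt mapping in
# between; lvcreate won't take an origin from another VG or a bare
# /dev/mapper device
```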

Marek is working on an alternate approach where the root volumes are always read-only and the VMs (Linux ones, anyway) know to create an overlay that writes to a partition on the volatile volume, which will support ephemeral encryption.

This is great for AppVMs based on Qubes-maintained templates, but it won’t work in the general case. I’m trying to find a general solution (for disposable/ephemeral VMs, whether they are based on a Qubes template or not).

B

The bdriver parameters are just the ones from qvm-pool drivers (bdriver = backend driver).

And yes, you can use a logger, but it won’t help too much for understanding, as stuff happens quite quickly…

But yes, various callbacks work before Xen takes over.

I’d recommend starting with how you’d do it manually without a callback pool, then applying the logger callbacks to see what happens when, and then attempting to transfer your manual steps to callbacks.

So, for example, for encryption with lvm you might want to create an encrypted loop file for the pool, create a volume group and a thin pool on it, and then tell Qubes to use those as the volume_group & thin_pool parameters of the lvm driver. This would have to go into the pre_setup callback.
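Done by hand, that would be roughly (file location, sizes and names are just placeholders):

```bash
# what pre_setup would have to achieve
truncate -s 20G /var/lib/qubes/enc-pool.img
LOOP=$(losetup -f --show /var/lib/qubes/enc-pool.img)
cryptsetup luksFormat "$LOOP"
cryptsetup open "$LOOP" enc-pool
vgcreate vg_enc /dev/mapper/enc-pool      # creates the PV implicitly
lvcreate -T -L 18G vg_enc/poolhd0         # the thin pool Qubes will use
# -> volume_group=vg_enc, thin_pool=poolhd0 for the lvm bdriver
```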

Assuming the pool already exists, you’d have to open the encrypted loop file and prepare it on reboot. That would have to go into the pre_sinit (sinit = storage init) callback.
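Again by hand, roughly (how you feed in the passphrase, key file or prompt, is up to you):

```bash
# what pre_sinit would have to achieve after a reboot: the loop device and
# crypt mapping are gone, so re-create them before Qubes initializes the pool
LOOP=$(losetup -f --show /var/lib/qubes/enc-pool.img)
cryptsetup open "$LOOP" enc-pool
vgchange -ay vg_enc
```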

And assuming you’d like to remove the loop file on pool destruction with qvm-pool remove, you’d have to do that on post_destroy.
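And the teardown counterpart, roughly:

```bash
# what post_destroy would have to achieve on qvm-pool remove
vgchange -an vg_enc
cryptsetup close enc-pool
losetup -d "$(losetup -j /var/lib/qubes/enc-pool.img | cut -d: -f1)"
rm -f /var/lib/qubes/enc-pool.img
```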

Sounds like a good experiment, I’ll try it out this weekend.

That’s the kind of feedback I was looking for.

Thanks again.

[As an aside, I don’t think it will solve the core issue of lvm forbidding snapshots across two different VGs, so it won’t support my long-term use case: preventing information leakage into the primary pool when the template lives there but the AppVM’s private, volatile and root-snap volumes (derived from the template’s root) are held on separately encrypted/ephemeral storage (lvm pool or other). LVM could have allowed this (as device-mapper snapshot does), at least for non-persistent volumes, but it doesn’t, probably to reduce foot-gun potential.]

I wonder what happens if at pre_volume_start you do some housekeeping, rename the symbolic link in /dev/qubes_dom0 that points to the root-thin snapshot of the template root volume, and substitute a link to the dmsetup snapshot at the original path before VM startup… and then after VM shutdown, at pre_volume_stop, do some housekeeping, remove that link and rename the original root-snap back to its old path.
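Purely illustrative (ephem-root being the dm-snapshot device from my first post, and the link name guessed from how /dev/qubes_dom0 usually looks):

```bash
LINK=/dev/qubes_dom0/vm-work-root-snap    # what ends up being handed to the VM

# at pre_volume_start: park the real thin-snapshot link, point the path at
# the dm-snapshot instead
mv "$LINK" "$LINK.real"
ln -s /dev/mapper/ephem-root "$LINK"

# at pre_volume_stop (after VM shutdown): undo it before lvm.py/storage.py
# does its own cleanup
rm "$LINK"
mv "$LINK.real" "$LINK"
```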

That might do it, if it doesn’t cause havoc elsewhere.

All of my previous comments were about per-pool encryption.

If you want per-volume encryption, you indeed need to look into the on_volume callbacks. However, that is much harder to test, as it’s impossible to replicate manually, and having multiple pools (in the worst case one per VM) doesn’t really hurt anyway.

Your particular use case (root snap in a non-template pool), however, goes against the Qubes OS docs in general, cf. man qvm-create on the -P parameter: “All volumes besides snapshots volumes are imported in to the specified POOL.”
Why does one need that anyway? Changes to the root volume go to the volatile volume, and that one belongs to the AppVM’s pool? So root snapshots are just there for template revisions anyway?
You should be able to have an unencrypted template in a different pool (cf. [1]).

“I wonder what happens if at pre_volume_start you do some housekeeping, rename the symbolic link in /dev/qubes_dom0 that points to the root-thin snapshot of the template root volume, and substitute a link to the dmsetup snapshot at the original path before VM startup… and then after VM shutdown, at pre_volume_stop, do some housekeeping, remove that link and rename the original root-snap back to its old path.”

I guess you meant post_volume_stop, but in theory such stunts are possible. In practice, however, you’ll probably forget about some weird edge case and suddenly your backups (which use volume_export without a start) no longer contain the data you expect, etc. You also become really dependent on whatever changes happen in the upstream lvm driver.

[1] volatile volume of DispVM may land in a different pool than its template · Issue #5933 · QubesOS/qubes-issues · GitHub

Sensitive data can end up in logs, and for standard AppVMs those logs are written to the template’s root snapshot in the main pool (if the AppVM references that template), even if the AppVM itself lives in a different pool.

Yes, the snapshot is removed after shutdown, but remnants of what was written to it may live on in the main pool’s slack space instead of in the VM’s pool (e.g. a VM-specific encrypted pool).

This is mostly about data leakage.

In the future, IIRC, Marek’s plan for the templates is to have all AppVM/DispVM root snapshot I/O and all DispVM private volume I/O layered through a (larger) volatile device/overlay, with the appropriate source snapshots/LVs set to read-only in dom0, plus the option of an ephemeral key for the volatile LV on each VM startup.

This will be great… but it will only apply to Qubes template-based VMs, and not to things like Windows templates.

B

Argh, looks like you’re right: currently the volatile volume only contains the swap, and the CoW for root volumes is indeed done via snapshots [1].
Apparently my memory wasn’t good enough for that one and/or I hadn’t read the doc carefully enough.

Anyway, I still don’t understand why the straightforward solution of overwriting the root snapshot before its deletion was never considered?
This should be fairly easy to implement even upstream in the API, as an additional property similar to ephemeral.
It should also work with callback pools via pre_volume_stop & pre_volume_remove. Hrmpf, ok, I just realized that I hadn’t implemented those two as I hadn’t considered them interesting… that could be remedied fairly easily though.
Fair enough, with that approach there’s also still a chance of unwiped leftovers after a Qubes crash… I guess that’s what checking Volume.is_outdated() on boot is for.
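E.g. a pre_volume_remove hook along these lines (volume name hypothetical; whether zeroing a thin snapshot really overwrites the old chunks rather than just remapping them is exactly the kind of thing one would have to verify first):

```bash
# zero the snapshot before Qubes lvremoves it; blkdiscard -z issues
# BLKZEROOUT, dd is the crude fallback (it exits non-zero when it hits the
# end of the device, hence the || true)
DEV=/dev/qubes_dom0/vm-work-root-snap
blkdiscard -z "$DEV" || { dd if=/dev/zero of="$DEV" bs=4M oflag=direct status=none || true; }
```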

[1] qubes-core-admin/qubes-storage.rst at master · QubesOS/qubes-core-admin · GitHub

The problem here isn’t actually lvm or even dm-thin. The problem is that the data will be written unencrypted to the pool, which isn’t what you want here.

One solution is to use a separate device-mapper thin pool that is created at system startup. Another is to use dm-snapshot the way the file pool does. I am not sure which is best.

It wasn’t considered because it doesn’t actually work on SSDs, and I suspect most users use SSDs.

Ah yes, so hardware-wise one could only use an ATA device. :smiley:

Or hope for the best with blkdiscard -s / -z or whatever the hardware supports.

Or apparently stick with the file pool after all (maybe that’s why I had never noticed that issue on file pools before).

Admittedly not optimal for everyone.


The file pool is deprecated and will eventually be going away.

Yeah. I’ve had some ideas about creating ephemeral per-VM file pools and then using dm-* operations to put split-encryption snapshots (reads of the clear remote origin, changes read and written encrypted) in that directory (which would be a mount point for an ephemeral dm-crypt volume), then cleaning up when done.

But since that pool type is going away in the next couple of releases, I’m not really motivated to get that to work.

So sadly, no forward progress.

B

The reason the file pool is going away is that it does not handle persistent volumes well. This is a known weakness of dm-snapshot. The file pool handles ephemeral volumes fantastically, and I at least would be fine with keeping a stripped-down version that only supported ephemeral volumes. There are other options as well, such as external-origin thin snapshots.