Is it possible to have a root template from which other templates branch out and specialize?

Recently I switched my sys-net, sys-firewall and sys-usb qubes from debian-11 templates to debian-11-minimal templates, and I plan to use minimal templates for my other use cases as well (email, instant chat, web browsing, journaling, etc.).

Now, my sys-net, sys-firewall, and sys-usb qubes all have their own, distinct, minimal templates, deb11min-net, deb11min-fw and deb11min-usb, respectively.

And I notice that updating these minimal templates essentially means downloading the debian minimal template updates 3 times (once each for the -net, -fw and -usb templates).

This is not smart, and I would like to alleviate this effort.

So, the question I am wondering is this: Can I have a root minimal template, from which the -net, -fw and -usb minimal templates branch out and specialize? Specialization in this context means installing specific packages to the branch-templates.

This way the debian-11-minimal apt updates would be downloaded once and take effect in the deb11min-net, deb11min-fw and deb11min-usb templates at the same time. Updating the -net, -fw and -usb templates would then only involve downloading and updating their specialization packages.

                             ,---> deb11min-net (Template) ---> deb11min-net-dvm (Disposable Template) ---> sys-net (Disposable AppVM)
debian-11-minimal (Template)  ---> deb11min-fw (Template)  ---> deb11min-fw-dvm (Disposable Template)  ---> sys-firewall (Disposable AppVM)
                             \---> deb11min-usb (Template) ---> deb11min-usb-dvm (Disposable Template) ---> sys-usb (Disposable AppVM)

In this scheme, updating the debian-11-minimal template would at the same time update the deb11min-net, deb11min-fw and deb11min-usb templates, without downloading the same update packages 3 times over.

Am I making sense? Or do I have a flaw in my understanding of templates?

Sounds like a great idea, especially if the inherited templates use shallow filesystem allocation to only store changes (perhaps an overlay fs or just COW). I don’t believe this is directly possible today, even though you could manually achieve something like this with enough ingenuity. Perhaps a good feature request for 5.0 :slight_smile:

When you update the software, you need to merge all changes from the lesser template into the new files from the master template. You can’t just overwrite them with the changes from the lesser template; that could break the system if the versions are out of sync.

If I understand you correctly, then of course there would be an order in which the templates are updated.
The root template(s) would be updated first, and that would, in effect, update the branch templates too (without downloading the packages from apt sources).
After the root templates are updated, the branch templates get their turn, and so on.

Perhaps you could use this instead: Updates cache options?

Agreed. The most practical solution is a caching proxy for updates as @fsflover indicated above.

B

If I understand the underlying need here, the question is whether there is a way to deduplicate snapshots at the block level. Clones are actually snapshots of a parent volume (no cost at moment 0) until copy-on-write (CoW) makes them differ from that origin.

On a default installation, we are talking about thin-provisioned LVM volumes at the storage level. In this case, the clones diverge from the origin, and the origin itself also changes from that point in time; changes made to the origin after cloning never reach the clones. Each of those volumes starts diverging from the origin and consumes space. The result right now is that those volumes grow, separately, forever. LVM, as I understand it now, is not a fit for that use case. The origin should never change, otherwise space consumption grows with every diverging volume. This is what we all do using Qubes, by the way. I’m realizing that thin provisioning without deduplication is not a fit for the Qubes use case. But it seems that deduplication between volumes doesn’t exist.
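To make the divergence concrete, here is a toy model of copy-on-write clones in a thin pool. It is only a sketch: the `ThinPool` class, the block counts and the "update rewrites the first 50 blocks" assumption are all made up for illustration and have nothing to do with real LVM internals.

```python
# Toy model of thin-provisioned copy-on-write clones (not real LVM code).
BLOCKS_PER_TEMPLATE = 1000  # hypothetical size of the origin, in blocks
UPDATE_BLOCKS = 50          # hypothetical number of blocks one update rewrites


class ThinPool:
    """Tracks which physical block each volume's logical block maps to."""

    def __init__(self):
        self.next_physical = 0
        self.volumes = {}  # volume name -> {logical block -> physical block}

    def _alloc(self):
        block = self.next_physical
        self.next_physical += 1
        return block

    def create(self, name, size):
        self.volumes[name] = {i: self._alloc() for i in range(size)}

    def clone(self, src, dst):
        # A clone copies only the mapping, so it costs no data blocks at first.
        self.volumes[dst] = dict(self.volumes[src])

    def write(self, name, logical):
        current = self.volumes[name][logical]
        shared = any(
            other != name and current in mapping.values()
            for other, mapping in self.volumes.items()
        )
        # Copy-on-write: a shared block is never overwritten in place, so the
        # writer gets a new block even if the data is identical elsewhere.
        if shared:
            self.volumes[name][logical] = self._alloc()

    def used_blocks(self):
        # Physical blocks still referenced by at least one volume.
        return len({p for m in self.volumes.values() for p in m.values()})


clones = ("deb11min-net", "deb11min-fw", "deb11min-usb")
pool = ThinPool()
pool.create("debian-11-minimal", BLOCKS_PER_TEMPLATE)
for clone in clones:
    pool.clone("debian-11-minimal", clone)
used_after_clone = pool.used_blocks()

# Apply the "same" update to every clone separately.
for clone in clones:
    for logical in range(UPDATE_BLOCKS):
        pool.write(clone, logical)
used_after_update = pool.used_blocks()

print(used_after_clone, used_after_update)  # 1000 1150
```

In this model the three clones cost nothing right after cloning, but the identical update then allocates its 50 blocks three times over, because the pool has no idea the written data is the same.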

An additional, unfortunate effect of this is that if someone backs up, deletes and restores those templates, they will consume 3 times the space, which is more than what the original templates occupied before the backup. Also, backing up those 3 templates consumes 3 times the space in the backup itself with the Qubes backup tool since, as opposed to the wyng-backup tool, there is no dedup between volumes when creating the backups.

One can dodge the bandwidth consumption by using a caching proxy, which downloads each package once, but the packages will still consume 3 times the space when installed.
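A rough sketch of that trade-off, with a made-up `CachingProxy` class standing in for a real caching proxy and an illustrative package list (none of this is an actual proxy API):

```python
# Toy model: a caching proxy saves downloads, not installed disk space.
class CachingProxy:
    def __init__(self):
        self.cache = {}
        self.upstream_downloads = 0

    def fetch(self, package):
        # Only the first request for a package goes upstream.
        if package not in self.cache:
            self.upstream_downloads += 1
            self.cache[package] = f"contents of {package}"
        return self.cache[package]


def update_templates(templates, packages, proxy):
    """Each template installs its own copy of every package."""
    return {t: [proxy.fetch(p) for p in packages] for t in templates}


templates = ["deb11min-net", "deb11min-fw", "deb11min-usb"]
packages = ["libc6", "openssl", "apt"]  # illustrative package list

proxy = CachingProxy()
installed = update_templates(templates, packages, proxy)
installed_copies = sum(len(pkgs) for pkgs in installed.values())

print(proxy.upstream_downloads, installed_copies)  # 3 9
```

Three packages are downloaded once each, yet nine copies end up installed across the three templates.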

It would be awesome to have dedup between thin-provisioned LVM volumes. I searched around and didn’t find anything existing.

@demi asked the question here, without a positive answer: https://github.com/jthornber/thin-provisioning-tools/issues/211

My understanding is that thin-provisioned LVM would need something that does some kind of clean sweep by keeping volume maps and pointing identical blocks to existing blocks instead of duplicating them. I also understand that this mapping would need to be kept in memory to efficiently write only once what is already present on another volume, and/or be run on a schedule to mark the blocks as unused and point to other volumes' blocks. I understand that this would be extremely memory hungry if done on each write, so only sweeping at intervals could be done, and most probably only on offline volumes as well.
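The kind of interval sweep I have in mind could be sketched like this. It is purely hypothetical: `dedup_sweep`, the `volumes` mapping and the `store` dictionary are invented for the example, and no such tool exists for thin LVM as far as I know.

```python
import hashlib


def dedup_sweep(volumes, store):
    """Offline sweep: remap blocks with identical content to one physical block.

    volumes: volume name -> list of physical block ids (index = logical block)
    store:   physical block id -> block contents (bytes)
    Returns the number of physical blocks freed.
    """
    # This index is the memory-hungry part: one entry per distinct block.
    seen = {}  # content digest -> canonical physical block id
    for name in sorted(volumes):
        mapping = volumes[name]
        for logical, physical in enumerate(mapping):
            digest = hashlib.sha256(store[physical]).digest()
            # A real tool would also compare the bytes before trusting a hash.
            mapping[logical] = seen.setdefault(digest, physical)

    # Blocks no longer referenced by any volume can be released.
    live = {p for mapping in volumes.values() for p in mapping}
    freed = [p for p in store if p not in live]
    for p in freed:
        del store[p]
    return len(freed)


# Three templates that each stored the same update in a separate block.
store = {0: b"base", 1: b"update", 2: b"update", 3: b"update", 4: b"net-only"}
volumes = {
    "deb11min-net": [0, 1, 4],
    "deb11min-fw": [0, 2],
    "deb11min-usb": [0, 3],
}
freed_blocks = dedup_sweep(volumes, store)
print(freed_blocks, len(store))  # 2 3
```

Even in this tiny form, the `seen` index has to cover every distinct block in the pool, which is why doing it on every write looks impractical compared to an occasional sweep of offline volumes.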

@brendanhoar @tasket @Demi: are there other filesystems and/or filesystem managers better fit for that deduplication between volumes in a pool? Is this kind of deduplication between volumes of a pool even a thing? Does something exist?

I think we now have two questions:

  1. Q: Can updates to a parent template automatically apply to child templates that might also have other software installed and be taking updates that the parent doesn’t have? A: Not feasible as a supportable feature with currently available tech. Best simulation is a general purpose template with appvms using snap or similar for “local” software installs.

  2. Q: Updating multiple similar templates with the same exact updates seems wasteful of storage space. Already discussed is reducing downloads using a caching proxy, but I see a question of “Can we merge the storage used by the updates after the fact using dedupe tech?” A1: somewhat feasible and I think @Demi has been talking to the thin-tools people about multiple issues including this area, but remember it may not give the return on investment one may assume, based on several variables (allocation layout etc). A2: my solution is to semi-manually rebuild a branch of forked templates from time to time, discarding the current more bloated set, using notes and/or bash scripts; others (e.g. @unman) are more disciplined and use salt, etc.
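A minimal sketch of what the scripted rebuild in A2 could look like, assuming the dom0 tools `qvm-clone`, `qvm-run` and `qvm-shutdown` with the options shown (check their man pages before relying on this). The `rebuild_plan` helper and the package names are placeholders made up for the example, and the function only builds the command list without running anything:

```python
import shlex


def rebuild_plan(root, branches):
    """Return dom0 commands (as argument lists) that would recreate each
    branch template from a freshly updated root template. Nothing is run."""
    plan = []
    for name, packages in branches.items():
        install = "apt-get install -y " + " ".join(
            shlex.quote(p) for p in packages
        )
        plan.append(["qvm-clone", root, name])
        plan.append(["qvm-run", "--user", "root", "--pass-io", name, install])
        plan.append(["qvm-shutdown", "--wait", name])
    return plan


# Placeholder package names; the real specialization packages go here.
branches = {
    "deb11min-net": ["example-net-package"],
    "deb11min-fw": ["example-fw-package"],
    "deb11min-usb": ["example-usb-package"],
}
plan = rebuild_plan("debian-11-minimal", branches)
for command in plan:
    print(shlex.join(command))
```

Removing the old, bloated templates and switching the qubes over to the new ones is left out on purpose, since that part depends on what is currently based on each template.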

B
