Is it possible to have a root template from which other templates branch out and specialize?

If I understand the underlying needs implied here, the question is whether there is a way to deduplicate snapshots at the block level. Clones are actually snapshots of a parent volume (no cost at moment 0) until copy-on-write (CoW) occurs, i.e. until they accumulate differences compared to that origin.

On a default installation, we are talking about thin-provisioned LVMs at the storage level. In practice, the clones will diverge from the origin, the origin itself will keep changing from that point in time, and those changes are not shared after cloning. Each of those volumes starts diverging from the origin and consumes space. The result right now is that those volumes will grow, separately, forever. Thin LVMs, as I understand it now, are not fit for that use case: the origin should never change, otherwise space consumption grows without bound. This is what we all do using Qubes, by the way. I'm realizing that thin provisioning without deduplication is not a good fit for the Qubes use case, but it seems that deduplication between volumes doesn't exist.
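To make the divergence problem concrete, here is a toy model (plain Python, not LVM code) of how thin snapshots behave: the clone's block map is free at creation, but every write after cloning allocates a private block, even when both volumes write identical data (e.g. the same package update installed in each template):

```python
# Toy model of thin-provisioned CoW volumes. A snapshot only copies the
# logical->physical block map; each write afterwards allocates a new
# physical block, with no content-based sharing between volumes.

class ThinPool:
    def __init__(self):
        self.blocks = {}   # physical block id -> data
        self.next_id = 0

    def alloc(self, data):
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        return bid

class Volume:
    def __init__(self, pool, mapping=None):
        self.pool = pool
        self.map = dict(mapping or {})   # logical block -> physical block

    def snapshot(self):
        # Zero cost at moment 0: the map is duplicated, the data is not.
        return Volume(self.pool, self.map)

    def write(self, lblock, data):
        # CoW: every write goes to a freshly allocated physical block.
        self.map[lblock] = self.pool.alloc(data)

pool = ThinPool()
origin = Volume(pool)
origin.write(0, b"base")            # 1 physical block, shared after snapshot
clone = origin.snapshot()           # still 1 block: only the map is copied
clone.write(1, b"pkg-update")       # 2 blocks
origin.write(1, b"pkg-update")      # 3 blocks: identical data, stored twice
print(len(pool.blocks))             # -> 3, the duplicate is never re-shared
```

The last two writes store byte-identical data in two separate blocks, which is exactly the missing between-volume deduplication described above.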

An additional, and unfortunate, effect of this is that if someone backs up, deletes, and restores those templates, they will consume three times the space that the origin templates occupied before the backup. Backing up those three templates will also consume three times the space in the backup itself with the Qubes backup tool, since, as opposed to the wyng-backup tool, there is no deduplication between volumes when creating the backups.

One can dodge the bandwidth cost by using a caching proxy, which downloads each package only once, but the packages will still consume three times the space once installed.

It would be awesome to have deduplication between thin-provisioned LVMs. I searched around and didn't find anything existing.

@demi asked the question here, without a positive answer: https://github.com/jthornber/thin-provisioning-tools/issues/211

My understanding is that thin-provisioned LVMs would need something that does some kind of clean sweeping by keeping volume maps and pointing identical blocks at existing blocks instead of duplicating them. That mapping would either need to be kept in memory, so a block already present on another volume is written only once (inline dedup), or be run on a schedule to mark duplicate blocks as unused and point them at other LVM blocks (offline dedup). Doing this on every write would be extremely memory hungry, so only clean sweeping at intervals seems realistic, and most probably only on offline volumes as well.
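The offline "clean sweep" idea above can be sketched in a few lines. This is a hypothetical pass over plain Python dicts standing in for volume maps, not anything LVM provides: hash every referenced block, remap duplicates onto one canonical copy, then reclaim the blocks nothing points at anymore.

```python
# Hypothetical offline dedup sweep over volume block maps.
# blocks: {physical block id: data bytes}
# volume_maps: one {logical block: physical block id} dict per volume
import hashlib

def dedup_pass(blocks, volume_maps):
    seen = {}   # content hash -> canonical physical block id
    for vmap in volume_maps:
        for lblock, pid in vmap.items():
            digest = hashlib.sha256(blocks[pid]).digest()
            canonical = seen.setdefault(digest, pid)
            if canonical != pid:
                vmap[lblock] = canonical   # point at the existing copy
    # Reclaim physical blocks no volume references anymore.
    live = {pid for vmap in volume_maps for pid in vmap.values()}
    freed = [pid for pid in list(blocks) if pid not in live]
    for pid in freed:
        del blocks[pid]
    return len(freed)

# Two volumes that each stored their own copy of the same "kernel" block:
blocks = {0: b"kernel", 1: b"kernel", 2: b"apps"}
vols = [{0: 0}, {0: 1, 1: 2}]
n = dedup_pass(blocks, vols)
print(n)   # -> 1 duplicate block freed; both volumes now share block 0
```

The memory cost the post worries about shows up here as the `seen` table: one hash entry per unique block, which is why doing this inline on every write would be so expensive compared to a scheduled sweep over offline volumes.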

@brendanhoar @tasket @Demi: are there other filesystems and/or filesystem managers better suited to this deduplication between volumes in a pool? Is this kind of deduplication between the volumes of a pool even a thing? Does something exist?
