Pool level deduplication?

Insurgo · July 22, 2022, 1:49am

From the source I quoted myself:

WHAT IS THE DEDUP TABLE (DDT)?
AND… IS IT STORED IN RAM, OR ON DISK?

This is a common point of confusion.

When deduplication is used, the dedup table is part of the way that data is stored in the pool. ZFS uses a hashed list of blocks, to allow easy identification of duplicate blocks. In simple terms, to find an actual block of data on disk, ZFS uses the DDT as an extra step.

The DDT is a fundamental pool structure used by ZFS to track what blocks make up what files, when dedup is used. It’s as much a part of the pool as the dataset layout, the snapshot info, pointers to files, or the file date/time metadata. If you lose it, your pool is dead. If ZFS needs it, it reads it from the pool on demand, and uses the data contained to identify not just duplicate blocks, but also to find data on disk. So the dedup table is not a cache or an extra (like ZIL or L2ARC) that gets stored in RAM, where if we lose it, too bad. If DDT data is in RAM or L2ARC, it’s only there temporarily, like any other in-use pool data.

In other words, for all practical purposes you can think about ZFS handling dedup metadata identically to any of that sort of stuff, if that helps. It’s integral to the pool. And the pool won’t work well if it can’t access the DDT fast, when needed.
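
To make the quoted description concrete, here is a toy sketch in Python. It is not how ZFS is implemented and does not reflect its on-disk format; the class `ToyDedupStore`, its fields, and the 4-byte block size are all made up for illustration. It only mirrors the idea described above: blocks are indexed by hash, duplicates are stored once, and the table is needed to find the data again.

```python
import hashlib


class ToyDedupStore:
    """Toy model of a dedup table (hypothetical, not ZFS's real structures)."""

    def __init__(self):
        self.ddt = {}     # block hash -> [block address, reference count]
        self.blocks = {}  # block address -> raw bytes (stand-in for the disk)
        self.files = {}   # file name -> ordered list of block hashes
        self._next_addr = 0

    def write(self, name, data, block_size=4):
        hashes = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            key = hashlib.sha256(block).hexdigest()
            entry = self.ddt.get(key)
            if entry is None:
                # First time this content is seen: store it and index it.
                self.blocks[self._next_addr] = block
                self.ddt[key] = [self._next_addr, 1]
                self._next_addr += 1
            else:
                # Duplicate content: only bump the reference count.
                entry[1] += 1
            hashes.append(key)
        self.files[name] = hashes

    def read(self, name):
        # Every block lookup goes through the table (the "extra step").
        return b"".join(self.blocks[self.ddt[k][0]] for k in self.files[name])


store = ToyDedupStore()
store.write("a", b"AAAABBBBAAAA")
store.write("b", b"AAAACCCC")
print(len(store.blocks))  # unique blocks actually stored
print(store.read("a"))
```

In this toy model, clearing `store.ddt` makes every later `read` fail with a `KeyError` even though the raw blocks are still there, which is the sense in which the table is part of the pool rather than a disposable cache.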

I am modifying the OP accordingly, along with the posts saying that 1 TB of deduplicated storage needs 1 GB of RAM. Basically, as of now, @Rudd-O knows best, being a user of ZFS.

Would you mind sharing your experience and use cases? How does it behave for the intended Qubes use case (deduped templates, or specialized clones)?

Also, @Rudd-O, any comment on the performance hit for combined reads and writes described in the referenced article, where the writer says that special SSDs are required (and not to go for Samsung EVO pro devices)?

The author recommends Optane SSDs:

Optane and pure battery-backed RAM cards only.
I should clarify: That’s nothing to do with SSDs having too-small DRAM or SLC cache. It’s inherent in the SSD NVRAM chips themselves. Because it’s nothing to do with the device cache type or size, a “better” SSD or one with “better” or no cache, won’t help much.