encrypted garbage collection
Message-Id: https://www.5snb.club/w/encrypted-garbage-collection/
Linked-From: wiki.
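the server has a list of blobs, but because they are encrypted it cannot see the relations between them (which blobs reference which), so it can’t tell by itself which blobs are still reachable.
how do we collect the unreachable ones?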
the server keeps a bloom filter that is enabled only while a garbage collection is active.
the client sends a “start garbage collection” request and gets back a random ID.
the client sends updates to the bloom filter as needed, either incrementally or all at once (an update can only ever set bits, never unset them).
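A minimal sketch of the set-only update rule, assuming a hypothetical wire format where an update is just a set of bit positions to turn on (`ServerFilter` and `apply_update` are illustrative names, not from the note). Because setting a bit is idempotent and order-independent, one big update and many small overlapping ones leave the server with the same filter.

```python
class ServerFilter:
    """Server-side bit array: updates can set bits but never clear them.

    Hypothetical wire format: an update is a set of bit positions.
    """

    def __init__(self, num_bits: int) -> None:
        self.num_bits = num_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def apply_update(self, positions: set) -> None:
        # Validate first so a bad update is rejected without partial effects.
        if any(not 0 <= pos < self.num_bits for pos in positions):
            raise ValueError("bit position out of range")
        for pos in positions:
            self.bits[pos // 8] |= 1 << (pos % 8)

    def is_set(self, pos: int) -> bool:
        return bool(self.bits[pos // 8] & (1 << (pos % 8)))


# Incremental updates (with overlap) end in the same state as one big update.
incremental = ServerFilter(64)
incremental.apply_update({1, 5})
incremental.apply_update({5, 40})
all_at_once = ServerFilter(64)
all_at_once.apply_update({1, 5, 40})
assert incremental.bits == all_at_once.bits
```

Since there is no "clear" operation, a buggy or malicious update can at worst keep extra objects alive; it can never cause one to be deleted.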
The bloom filter tracks which objects are live, keyed by their hash.
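One way the filter could map object hashes to bits, as a sketch: derive `k` positions from SHA-256 over a per-GC salt, an index and the object hash. Using the GC ID as the salt follows the salting idea at the end of this note; the exact derivation, the class name and the parameters in the example are assumptions. Both sides would have to agree on it: the client computes `positions` while walking roots and sends them as an update, and the server calls `might_contain` on each stored blob’s hash at execute time.

```python
import hashlib


class BloomFilter:
    """Bloom filter over object hashes, salted per GC (salt scheme assumed)."""

    def __init__(self, num_bits: int, num_hashes: int, salt: bytes) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt
        self.bits = bytearray((num_bits + 7) // 8)

    def positions(self, object_hash: bytes) -> list:
        # Derive k bit positions from SHA-256(salt || i || object_hash).
        out = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(
                self.salt + i.to_bytes(4, "big") + object_hash
            ).digest()
            out.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return out

    def add(self, object_hash: bytes) -> None:
        for pos in self.positions(object_hash):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, object_hash: bytes) -> bool:
        # False: definitely never marked live. True: live, or a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self.positions(object_hash))


live = [hashlib.sha256(b"blob-%d" % i).digest() for i in range(100)]
gc_filter = BloomFilter(num_bits=2048, num_hashes=7, salt=b"example-gc-id")
for h in live:
    gc_filter.add(h)
assert all(gc_filter.might_contain(h) for h in live)  # never a false negative
```

The important property for GC is the one-sided error: every object the client marked tests positive, so live data is never deleted; only unmarked objects can be misjudged, and only in the direction of being kept.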
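the server doesn’t need to forbid uploading new objects while a GC is running, but it does need to add each newly uploaded object to the bloom filter, so that the object survives even if the client’s walk never reaches it.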
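A sketch of why uploads during a GC must be marked, assuming a hypothetical blob store keyed by SHA-256. To keep it short, a plain set stands in for the bloom filter (so there are no false positives here), and `BlobServer`, `mark` and `execute_gc` are illustrative names.

```python
import hashlib
from typing import Dict, Optional, Set


class BlobServer:
    """Hypothetical blob store; a set stands in for the bloom filter."""

    def __init__(self) -> None:
        self.blobs: Dict[bytes, bytes] = {}
        self.live: Optional[Set[bytes]] = None  # None means no GC is active

    def start_gc(self) -> None:
        self.live = set()

    def mark(self, object_hash: bytes) -> None:
        # Sent by the client for each object it reaches while walking roots.
        if self.live is None:
            raise RuntimeError("no GC in progress")
        self.live.add(object_hash)

    def upload(self, data: bytes) -> bytes:
        object_hash = hashlib.sha256(data).digest()
        self.blobs[object_hash] = data
        if self.live is not None:
            # Uploads are not blocked during a GC, but the new blob is marked
            # live so it survives even if the client's walk never reaches it.
            self.live.add(object_hash)
        return object_hash

    def execute_gc(self) -> None:
        if self.live is None:
            raise RuntimeError("no GC in progress")
        self.blobs = {h: d for h, d in self.blobs.items() if h in self.live}
        self.live = None


server = BlobServer()
root = server.upload(b"old root")
garbage = server.upload(b"unreferenced")
server.start_gc()
server.mark(root)                          # the client's walk reaches this
fresh = server.upload(b"uploaded mid-GC")  # the client never marks this
server.execute_gc()
assert set(server.blobs) == {root, fresh}
```

Without the marking in `upload`, `fresh` would be deleted, even though some client presumably uploaded it in order to reference it.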
once the client has walked every root and added every reachable object to the server’s bloom filter, it sends “execute garbage collection id”, and the server deletes every blob whose hash is not in the filter. The ID ensures that an in-progress GC started by another client can’t conflict with this one. The ID is also sent with every update, so that a conflicted GC can halt.
There could also be a way to query for active GCs, but forcing a GC to run while another client is already GCing must never cause data loss.
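A sketch of one possible session policy, under an assumption the note does not fix: starting a GC supersedes any GC already in progress, so the older client’s next update or execute is rejected with its stale ID and it halts before deleting anything. A set again stands in for the bloom filter; `GcServer`, `GcConflict` and `active` are hypothetical names.

```python
import secrets
from typing import Optional, Set


class GcConflict(Exception):
    """The GC ID is stale: another client started a newer GC."""


class GcServer:
    """Hypothetical session handling: a new start supersedes the old GC."""

    def __init__(self, blob_hashes: Set[bytes]) -> None:
        self.blobs = set(blob_hashes)
        self.gc_id: Optional[str] = None
        self.live: Set[bytes] = set()

    def start(self) -> str:
        self.gc_id = secrets.token_hex(16)  # random, unguessable GC ID
        self.live = set()                   # fresh filter for the new GC
        return self.gc_id

    def active(self) -> bool:
        return self.gc_id is not None

    def _check(self, gc_id: str) -> None:
        if self.gc_id is None or gc_id != self.gc_id:
            raise GcConflict(gc_id)

    def update(self, gc_id: str, hashes: Set[bytes]) -> None:
        self._check(gc_id)  # a superseded client halts here...
        self.live |= hashes

    def upload(self, object_hash: bytes) -> None:
        self.blobs.add(object_hash)
        if self.active():
            self.live.add(object_hash)

    def execute(self, gc_id: str) -> Set[bytes]:
        self._check(gc_id)  # ...or here, before anything is deleted
        deleted = self.blobs - self.live
        self.blobs -= deleted
        self.gc_id = None
        return deleted


server = GcServer({b"a", b"b", b"c"})
first = server.start()
server.update(first, {b"a"})
second = server.start()           # another client forces a new GC
try:
    server.update(first, {b"b"})  # the first client's next update...
except GcConflict:
    pass                          # ...is rejected, so it halts
server.update(second, {b"a", b"b"})
assert server.execute(second) == {b"c"}
```

Under this policy forcing a GC only wastes the superseded client’s walk: its marks are discarded and it never reaches `execute`, so it cannot delete anything. Whether that is enough to rule out data loss when roots change mid-walk is a separate question this sketch does not answer.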
Bloom filter parameters are set on GC start (or could they be derived from how many objects the server knows exist?). False positives are perfectly fine here: a bloom filter has no false negatives, so a false positive only keeps a dead object alive, and it can be cleaned up by the next GC. Maybe salt the hash for every new bloom filter (with the GC ID?) so that different GCs have different false positives, and repeated GCs tend towards cleaning up every dead object.
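The standard bloom filter sizing formulas make the parameter question concrete: for n inserted items and a target false-positive rate p, use m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. Taking n from the server’s object count is one reading of the question above; that count is roughly an upper bound on the number of live objects, and over-sizing only lowers the false-positive rate. The second function is a rough model of the salting idea, assuming false positives are independent between differently salted GCs.

```python
import math
from typing import Tuple


def bloom_parameters(expected_items: int,
                     false_positive_rate: float) -> Tuple[int, int]:
    """Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2."""
    if expected_items <= 0 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("need expected_items > 0 and 0 < rate < 1")
    num_bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / math.log(2) ** 2
    )
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes


def surviving_garbage_fraction(false_positive_rate: float,
                               gc_runs: int) -> float:
    # With a fresh salt per GC, false positives are roughly independent
    # between runs, so a dead object survives r GCs with probability ~p**r.
    return false_positive_rate ** gc_runs


bits, hashes = bloom_parameters(1_000_000, 0.01)
print(bits, hashes, f"{bits / 8 / 1e6:.1f} MB")
print(surviving_garbage_fraction(0.01, 3))
```

For a million objects at a 1% false-positive rate this comes to about 9.6 million bits (roughly 1.2 MB) and 7 hash functions, and with per-GC salts only about one dead object in a million would survive three consecutive GCs. Without a salt, the same dead objects would hit the same bits every time and could survive indefinitely.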