First off, does this matter to you? No. No it doesn’t. Unless you’re backing up gigabytes of completely attacker controlled data, to an attacker controlled service, and need to ensure they don’t know you’re backing up said data, it Doesn’t Fucking Matter.
With that said, it’s a somewhat neat attack!
A watermarking attack is when an attacker who can get you to store an attacker-controlled piece of data can then detect the presence of that attacker controlled data. It’s not a huge deal, but is a concern if someone is able to inject a watermark into, say, copywritten or leaked content, and then automatically terminate the cloud storage/backup accounts of users that can be shown to have that data on their drive.
Restic uses Content-Defined-Chunking with Rabin Fingerprints with a secret, random polynomial in order to make watermarking attacks harder. (And, of course, the repo is encrypted, otherwise this attack doesn’t make sense to try to do, just read off the data directly).
This attack does not go after that. Instead, I simply repeat a given chunk of data many times in a row. This means that Restic will tend to create chunks that will have that size, or a multiple of it. Which means the attacker can use the size of the chunks they’re forcing the user to create to encode data.
Anyways, this does work… just not very well :P
A transcript of running it is below. The
check.rs tool only accesses what
metadata would be visible to an attacker without the password to an encrypted
repository, that being object sizes.
running attacker file with tags 1000 1000 2000 2000 -rw-r--r-- 1 jess jess 801M May 19 23:10 attacker.bin open repository repository e794b424 opened (version 2, compression level off) created new cache in /home/jess/.cache/restic lock repository no parent snapshot found, will read all files load index files start scan on [attacker.bin] start backup on [attacker.bin] scan finished in 0.224s: 1 files, 800.732 MiB Files: 1 new, 0 changed, 0 unmodified Dirs: 0 new, 0 changed, 0 unmodified Data Blobs: 528 new Tree Blobs: 1 new Added to the repository: 795.915 MiB (795.935 MiB stored) processed 1 files, 800.732 MiB in 0:04 snapshot b3f64a9e saved [src/bin/check.rs:22] &sz = [ ( 16777390, 8, ), ( 18045195, 7, ), ( 18034195, 6, ), ] ./check.sh 15.01s user 3.68s system 137% cpu 13.543 total
And for reference, this was ran with
restic 0.15.2 compiled with go1.20.3 on linux/amd64
At the end, the first number is the file size, and the second number is how many chunks of that size appeared.
Here, the attacker generated 800MB of data, and we can see 3 interesting sizes.
The two bottom ones are the real tags, the top one is noise. The attacker
generated chunks with the tags
2000, which are bytes added to the
base, so we would expect to see a multiple of that many bytes difference in
size between them. And we do, they’re 18MB large, but the difference is
Compression doesn’t really affect the attack since the data being inserted is random data, so Restic won’t try and compress it.
Future research might include trying to automatically recover the fingerprint polynomial so a watermark attack can be pulled off much more directly. That would require interactivity, where you backup attacker controlled data repeatedly.
What if you have webserver logs that you’re backing up, is that a viable attack
to confirm that
iykpqm7jiradoeezzkhj7c4b33g4hbgfwelht2evxxeicbpjy44c7ead.onion is some
specific account on a online backup service? (Not that I keep logs anyways :P)
Mitigations might be to pad chunks as they’re created, though that comes with a size cost. See Optimally Hiding Object Sizes with Constrained Padding, perhaps. That wouldn’t be a hard solution to the problem, but would make this harder to pull off.