This project allows to run a deduplication proxy in front of an S3 compatible object store.
Heavily inspired by https://github.com/jortage/poolmgr and use https://github.com/gaul/s3proxy for the proxy part.
When a client send a file, it will be downloaded locally, hashed (sha512) and then stored in the object store only if not already present.
The client view/buckets are virtual and stored in Postgres.
You will need a Java JDK installed and sbt.
> sbt stage
...
> ls -l target/universal/stage
drwxr-xr-x 2 4096 Feb 25 18:37 bin
drwxr-xr-x 2 20480 Feb 25 18:37 lib
You first will need to create a config file, you can take inspiration from application.conf.
> ls -l
-rw-r--r-- 1 641 Feb 25 18:41 application.conf
drwxr-xr-x 2 4096 Feb 25 18:37 bin
drwxr-xr-x 2 20480 Feb 25 18:37 lib
> ./bin/s3-dedup-proxy -Dconfig.file=application.conf
Log backend use slf4j simple logger and can be configured using parameters such as -Dorg.slf4j.simpleLogger.log.timshel.s3dedupproxy=info
.
In addition to the object store proxy an http server is started (default port 23279
).
It allows to obtain the backing store resource url with an client resource, Ex:
> curl -v http://127.0.0.1:23279/proxy/tenant1/s3dedup/main/resources/application.conf
...
>
< HTTP/1.1 308 Permanent Redirect
...
<
* Connection #0 to host 127.0.0.1 left intact
http://127.0.0.1/mastodon/blobs/f/b83/fb83d051f2a1da53b43edee3986c21ca7c31f78bb956eb5b6bfad92ea741def9051d09e1d287aca74d22bd03db73fd49b350a4d5e1df8fe494dd01904996ec04
When a user delete a file only the database mapping is removed, the file stored in the object store is not removed even if it was the last reference.
Periodically a purge job is run to remove the dangling files (config proxy.purge
key).
It's possible to trigger the purge by and HTTP call too:
curl --request DELETE http://127.0.0.1:23279/api/purge
0 deleted
It deletes up to 1000 files (the maximum for a single call of RemoveObjects
).
The --build
ensure that the image is up to date (In case of Could not connect to 127.0.0.1:3306
error just retry, the DB was just not completely up).
> docker compose up --build S3DedupProxy
This will run three services:
S3DedupProxy
: the proxy itself bind 23278Postgres
: the database to store the file metadata (binded on 3306)S3Proxy
: used as local S3 store (binded on 8080)
You can then interact with the proxy and store using MinIO client.
{
"version": "10",
"aliases": {
"store": {
"url": "http://127.0.0.1:80",
"api": "S3v4",
"path": "auto"
},
"tenant1": {
"url": "http://127.0.0.1:23278",
"accessKey": "tenant1",
"secretKey": "ff54fae017ebe20356223ea016d1e2e867b7f73aaba775fe48ed65d1f1fecb87",
"api": "S3v4",
"path": "auto"
},
"tenant2": {
"url": "http://127.0.0.1:23278",
"accessKey": "tenant2",
"secretKey": "25ca6d3f04906bfc05a3f2c7ce6b5569ec841ee0ef241a886f34687bbafee7cc",
"api": "S3v4",
"path": "auto"
},
}
}
By default the test
user must write in the test
bucket, ex:
> mc ls store/mastodon/
> mc cp file1 tenant1/bucket1/
> mc ls -r store/mastodon/
[2025-02-13 22:53:04 CET] 151B STANDARD blobs/a/c52/ac52fd97d34cef83527d3b5022775db50b961127ab01b8f646b5040e6f42db02f7d1a6f46bdcd69527d0dbd7d7ee1d92c9681e60b604e2603986516b68541471
> mc cp file1 tenant2/bucket1/
> mc ls -r store/mastodon/
[2025-02-13 22:53:04 CET] 151B STANDARD blobs/a/c52/ac52fd97d34cef83527d3b5022775db50b961127ab01b8f646b5040e6f42db02f7d1a6f46bdcd69527d0dbd7d7ee1d92c9681e60b604e2603986516b68541471
> mc cp file2 tenant1/bucket2/
> mc ls -r store/mastodon/
[2025-02-13 22:55:20 CET] 34KiB STANDARD blobs/8/bb7/8bb7bc575a4a5c18fe537e913f9869bcc016925fdf7c6fbedd3602915cb8341bd609c059f0397f6a42c89bc17baa294f432c3d7983d524d84a8749fd40d1d917
[2025-02-13 22:53:04 CET] 151B STANDARD blobs/a/c52/ac52fd97d34cef83527d3b5022775db50b961127ab01b8f646b5040e6f42db02f7d1a6f46bdcd69527d0dbd7d7ee1d92c9681e60b604e2603986516b68541471
Have fun