Ask fedi: replicating large file collections over slow links
I have a server "primary-na" with 50TB of arbitrary content in /srv, mostly in millions of small files, many of them identical hardlinks. I have 3 other servers across the world (copy-na, copy-eu, copy-ap) where I want an exact replica of primary-na's /srv. These replicas may occasionally be unavailable for hours on end, or slow, or under high load. The content on them may also occasionally bitrot, which must be detected and healed.

I've researched this multiple times over the last few years, but I've still not found a solution that would beat "just run rsync over it when something changes on replica-na." It's simple and effective, but obviously super inefficient and IO-heavy on both ends.

Any suggestions on how you would do it?
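For reference, the baseline being compared against is something like the following. This is a sketch, not the exact invocation from the post (the flags aren't given there); copy-eu stands in for any of the replicas named above:

    # Baseline full-tree sync from primary-na to one replica (illustrative).
    # -a         preserve perms/times/owners/symlinks
    # -H         preserve hardlinks (many files here are identical hardlinks)
    # --delete   drop files on the replica that are gone on the source
    # Every run walks the metadata of the full ~50TB tree on both ends,
    # which is the IO cost in question even when almost nothing changed.
    rsync -aH --delete /srv/ copy-eu:/srv/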
re: Ask fedi: replicating large file collections over slow links
@monsieuricon In theory I manage my backups using BTRFS snapshot/send/recv. In practice I don't really manage them at all, so not sure how strong of a recommendation that is.
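For context, the btrfs approach replicates at the filesystem level: it ships only the delta between two read-only snapshots, so unchanged files are never re-scanned and renames and hardlinks come along for free. A rough sketch, assuming /srv is a btrfs subvolume; the snapshot paths and names are made up for illustration:

    # On primary-na: freeze the current state as a read-only snapshot.
    btrfs subvolume snapshot -r /srv /snapshots/srv-new
    # Send only the delta against the previous snapshot (-p parent),
    # piped to a replica that already holds /snapshots/srv-prev.
    btrfs send -p /snapshots/srv-prev /snapshots/srv-new \
        | ssh copy-eu btrfs receive /snapshots
    # On each replica: a periodic scrub verifies data checksums; with a
    # redundant profile (DUP/RAID1) it can also heal bitrot in place.
    btrfs scrub start /srv

This would also cover the bitrot requirement from the original question, since btrfs checksums all data and scrub reports (and, given redundancy, repairs) corrupted blocks.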
@asdil12 I'm aware of syncthing, but I'd need to see some evidence that it can scale up to 50TB and millions of files, properly recognize things like moves and hardlinks, etc. Unfortunately, I'm not in a position to easily experiment with it.
@amonakov Yes, I find it intriguing for some of its concepts, but it also has the major downside of needing an extra 50+ TB for storing the repository. Also, it is really written to solve a different problem -- backing up data as opposed to replicating it to multiple nodes efficiently.
re: Ask fedi: replicating large file collections over slow links
@mss Correct, where it says "replica-na" it should say "primary-na".

The question of temporary files is actually an important consideration. The content of primary-na is distro data that is copied onto the system via rsync with --delay-updates, so everything is written into ~tmp~ dirs and then moved into place at the end of a successful run. In theory, fs-based replication should handle this correctly.
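To make that pattern concrete (the actual upstream sync command isn't shown, so this invocation is hypothetical): with --delay-updates, rsync stages each updated file in a ".~tmp~" directory next to its destination and renames the whole batch into place only at the end of a successful run, so a filesystem snapshot taken mid-transfer sees either old complete files or staging dirs, never half-written content.

    # Hypothetical mirror pull onto primary-na; the upstream URL is made up.
    # --delay-updates stages files in per-directory ".~tmp~" dirs and
    # moves them all into place together at the end of the transfer.
    rsync -aH --delete --delay-updates rsync://mirror.example.org/distro/ /srv/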