two way sync of data using lsyncd #303
I've been pondering doing that for a while, and I think you need some kind of HA/clustering to do what you describe here. Let me explain.
To achieve this, I've been considering the following setup: b) server #2 has an lsyncd conf file with server #1 as the destination host. lsyncd only runs on -one- host at any given time and uses the local lsyncd conf file. You didn't say -how- upstream knew where to send the incoming file.. |
I'm trying to achieve the same thing. Rsync creates a temporary file in the target directory while transferring data. That should be moved to another directory by using the temp_dir directive. Also, with the update = true flag, rsync should ignore those files which have the same modification timestamp. I guess this should work, but I have yet to test it comprehensively.
|
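A minimal sketch of the decision that update = true (rsync's --update) adds, as a simplified model rather than rsync's actual code. The `FileState` type and the function name are hypothetical. The rule follows the rsync man page: files that are newer on the receiver are skipped, and files with equal modification times are transferred only if their sizes differ. Checksums, --modify-window and other options are ignored here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileState:
    mtime: int  # modification time in whole seconds
    size: int   # size in bytes


def update_would_transfer(src: FileState, dst: Optional[FileState]) -> bool:
    """Simplified model of rsync --update for a single file."""
    if dst is None:
        return True                   # nothing on the receiver yet
    if dst.mtime > src.mtime:
        return False                  # receiver is newer: --update skips it
    if dst.mtime == src.mtime:
        return dst.size != src.size   # equal mtime: transfer only if sizes differ
    return True                       # receiver is older: transfer


# A file that was just synced has the same mtime and size on both sides,
# so it does not bounce back in the opposite direction.
synced = FileState(mtime=1000, size=4096)
print(update_would_transfer(synced, synced))  # False
```

The equal-mtime branch is the weak spot for two-way use: if both sides write different content within the same timestamp, each side considers the other's copy eligible, and whichever transfer runs last wins.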
FWIW I wanted to set up two-way lsyncd for syncing files across my desktop/laptop while developing; after futzing with a few different options, I ended up writing a tool to support this specific use case: https://github.com/stephenh/mirror Apologies for dropping the link here in the lsyncd project, but I'd come across this bug report ~9 months ago, while researching various options before deciding to build mirror, so hopefully you don't mind the cross-link to my project for others that are trying to do the same thing. |
The use case you're describing, as I understand it: a slow machine to work on, and a fast machine somewhere nearby with a reasonably fast, standing network connection. In that case I'd personally just use X forwarding to run the IDE on the desktop while forwarding its display to the laptop, so you develop on the laptop while everything is computed on the desktop: "ssh -XC desktopmachine", then start the IDE there with X forwarding. Or, if both are on your own (W)LAN, simply use NFSv3 to access the desktop's filesystem.
BTW: Lsyncd handles the initial sync just fine; in case you don't want to overwrite newer files, just set update=true. There are people running Lsyncd as a two-way sync; besides setting update=true, you have to set an out-of-place temp_dir to prevent cycles. I just never really tested two-way sync, nor have I fully thought through all cases to see whether it is safe from race conditions.
Back in the day, in the early design stages of Lsyncd, I thought about writing a receiver side as well, with a custom transfer protocol, similar to what I gather you are doing, but that seemed too much work for what I wanted to do. I wanted to use rsync as a library, and when I found I couldn't, I simply decided to execute the rsync binary, which is what Lsyncd is today. A decision I didn't regret, since I could respond to a lot of the esoteric issues reported with "well, that's rsync, ask them".
Just skimming through your code: are you single-threaded or multi-threaded? In the second case, with a new directory, it might be an issue if you first add the sync command and then add the new watch; you might miss something there. If it's single-threaded, it's not an issue. I remember doing a lot of debugging on these cases; since the daemon always lags behind the file system, it can be a little tricky.
I don't mind the link drop though.
If your project stays active for the foreseeable future, I'll gladly put it in the docs as a possible alternative, with the pros & cons compared with Lsyncd. I was surprised when I first coded Lsyncd that there wasn't any open source thing available (and alive); over the years I've seen some alternatives come, but sadly also go again (last change 5 years ago, etc.). |
And to conclude the issue (sorry for having left it open): yes, what pravincar said works in most sane cases. If the same file is written on both sides, in practice the last write wins in most cases, but it is actually a race condition which side wins. If both sides make a new directory with the same name and both put different data in there before Lsyncd can react, the result is random. And a lot of these special, albeit rare, cases have potential race conditions. That's why it isn't supported nor mentioned in the docs. |
I tried that, and the lag, even when the machines are on the same LAN, is just noticeable enough to be annoying. (Several other people I work with tried the same thing at different times, e.g. surely X forwarding would work, and I agree in theory it should, but it's just not the same.)
Neither NFS nor any other networked file system that I looked at (e.g. some of the fringe/research ones) supports inotify, which is the file system pushing events to the IDE saying "files X/Y/Z changed". Without inotify, both Eclipse and IntelliJ have to fall back to polling to notice when a file changed, which is slow to pick up changes, or else the user has to explicitly hit "refresh from file system", and that is not quick either.
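To illustrate why the polling fallback is slow, here is a minimal sketch of polling-based change detection. It is not how Eclipse or IntelliJ implement it; the `snapshot` and `diff` helpers are hypothetical names. Every poll has to stat every file in the tree, so the cost grows with the size of the project, whereas inotify only reports the files that actually changed.

```python
import os
import tempfile


def snapshot(root):
    """Map each file under root to its (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return (created, modified, deleted) paths between two snapshots."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, modified, deleted


# One polling cycle over a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "Main.java")
    with open(path, "w") as f:
        f.write("class Main {}")
    before = snapshot(root)
    with open(path, "w") as f:
        f.write("class Main { int x; }")  # size changes, so the edit is visible
    created, modified, deleted = diff(before, snapshot(root))
    print(len(created), len(modified), len(deleted))
```

A real poller would repeat this on a timer, which is where the delay between an edit and the IDE noticing it comes from.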
Ah okay, that is my mistake. I admit that when I played with lsyncd, it was also in combination with some rather elaborate setups that also used unison (there are a few blog posts that describe various approaches) that I think probably confused me vs. what raw lsyncd can do directly.
That is also interesting; somehow I completely missed this when first investigating lsyncd + other various options. ...I wonder if I read this issue before pravincar's comment in Dec 2015. I started mirror in March 2016, but I had been investigating alternatives off-and-on for awhile before that, so I'm not entirely sure when I decided I should try to build something myself. Granted, it doesn't matter now, I'm just curious why I didn't try the temp_dir/update=true setup with lsyncd first.
Yeah. Admittedly, once I started thinking about writing something, I'd already been looking for a good excuse to use the GRPC library. And when I saw it had two-way streaming built in, it seemed like a perfect fit, because it made the infrastructure I'd have had to build (client always calls server? E.g. client polls for server changes? Eh, sounds slow... Somehow roll my own server-push back to the client? Yuck...) basically become free. I do like mirror's approach in terms of not having to pay a connection/rsync-invocation cost each time a file changes, but admittedly I haven't actually timed how much of a difference there is.
Hehe.
Well, there are multiple threads, but I run them in an actor-like manner, so each thread can interact safely with its own internal data structure, and then publishes events/messages to other threads via thread-safe queues. So, for watching file system events, there is a single thread for that, which pumps them off as soon as possible and queues them up for another thread to drop them into the UpdateTree data structure + execute the diff (which itself then puts the diff results onto a queue for another thread to put on the wire).
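The actor-like arrangement described above can be sketched roughly as follows. This is not mirror's code (mirror is a separate project with its own implementation); the stage names, the `STOP` sentinel and the dict standing in for UpdateTree are all hypothetical. The point it shows is that each stage owns its data exclusively and only talks to the next stage through a thread-safe queue, so no locks are needed on the tree itself.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def watcher(events, out_q):
    """Stage 1: hand raw file events off as fast as possible."""
    for event in events:
        out_q.put(event)
    out_q.put(STOP)


def differ(in_q, out_q):
    """Stage 2: sole owner of the tree, so it needs no locking."""
    tree = {}  # path -> last seen mtime (stand-in for an UpdateTree)
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)
            return
        path, mtime = item
        if tree.get(path) != mtime:  # forward only entries that changed
            tree[path] = mtime
            out_q.put((path, mtime))


def sender(in_q, sent):
    """Stage 3: would put updates on the wire; here it just records them."""
    while True:
        item = in_q.get()
        if item is STOP:
            return
        sent.append(item)


def run_pipeline(events):
    q1, q2, sent = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=watcher, args=(events, q1)),
        threading.Thread(target=differ, args=(q1, q2)),
        threading.Thread(target=sender, args=(q2, sent)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent


# The duplicate event for a.txt is dropped by the diff stage.
events = [("a.txt", 1), ("a.txt", 1), ("b.txt", 2), ("a.txt", 3)]
print(run_pipeline(events))  # [('a.txt', 1), ('b.txt', 2), ('a.txt', 3)]
```

Because each queue is FIFO and each stage is a single thread, events reach the wire in the order the watcher saw them, even though three threads are involved.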
Yeah, agreed. My mental model with mirror is that, in theory, since the UpdateTree data structure is diff'd on its own actor-ish thread, it should always, at least for its own processing, have a basically-consistent view of the world. Granted, the file system can always subtly shift underneath it, but in theory those time slices will be small enough (the UpdateTree diff logic is very quick) that the user shouldn't notice any wonkiness (I have not noticed any so far (once the basic test cases/logic was working of course)).
Cool, thanks! Agreed I was similarly surprised there were not a handful of tools to pick from, vs. the "...basically just lsyncd and nothing else..." situation I found when researching things ~a year ago. Also understood you want to see if the mirror sticks around :-); it is currently core to my daily workflow, but that's driven by my work environment, and things can change. |
Sorry, you obviously know what inotify events are, I was just overly reciting my "mirror does ... by using ..." speech. :-) |
I added a link to your project both on the README and the Manual homepage. |
Thanks! |
Hi guys, I intend to install lsyncd as a tool to synchronize my web server folders across multiple data centers: Asia, Europe and North America. It should be a multi-master setup, and lsyncd with rsync and SSH seems more promising, given that Mirror's design seems to be aimed at sharing code between two PCs. However, I want to be sure that, as of 2018, lsyncd is able to handle multi-master with update=true. And will it handle three or even more servers? We may have additional servers in Africa and Australia. Cheers |
@mason-chase IMO neither mirror nor lsyncd would be the best choice for production source code deployment. AFAIK that should never be two-way, e.g. it's not like you'll have someone manually editing the files on the Asia production server at the same time someone is manually editing files on the Europe production server, and you need to two-way sync their changes in real-time. Seems like you should have a single source of truth for your code (git, SCM, etc.) and do a one-way push from that to your production boxes, e.g. as a tar/gzip/etc./etc. |
Lsyncd has always been designed for a Master->(multiple) Slave setup. That a master-master setup kind of works with an out-of-place /tmp file and update=true doesn't mean it has ever been thought through whether all race conditions are safe. Especially in the case where the same file gets edited on both ends, there is always a race condition, so that would have to be prevented by the surrounding setup. Having 3 or more servers as a web (everyone tangled with everyone) would require all system times to be synced exactly (NTP), and again, it was never designed to be used that way. Yes, git might be a suitable tool for this, and no, you don't need a single source of truth for git. But then you'd be pushing changes around, occasionally having to resolve conflicts manually. Or you go more serious with DRBD or GlusterFS.
Thanks a lot for your replies. As for editing a single file, we would only allow users to edit and FTP to a single master server, but there might be files uploaded (added) on other production servers by the user web application. Given the Lsyncd and Mirror designs, I may have to try DRBD. I did try GlusterFS, and it only worked synchronously, making a multi-data-center setup almost impossible. I saw DRBD has an async option, and hopefully that helps. Again, thanks a lot. |
What I understood from your use case:
If that is true, IMO you need a synchronous syncing solution, since otherwise file conflicts can occur at any time. Or, as I think most cloud services that work similarly do, restrict the "scratch areas" for user applications that are to be globally searched to very specific systems/structures. However, since I'm not in that business and haven't used that sort of thing recently, I don't really know what I'm talking about here. |
@axkibe You are correct, and that's a very important point you made about synchronous syncing. I think synchronous with multiple sessions for changes is a good idea, but if they all end up in the same queue this can degrade speed. Our service does geocasting, which is different from load balancing; in a way it is load balancing, but we refer the user to the nearest server, which has all the web service files and the database to serve the customer. The database is synced with MySQL semi-synchronous replication. But parallel file editing via the web application can occur, in which case I intend to resolve the conflict in favor of the most recent request; not sure if DRBD offers this feature. Thanks a lot for your thoughts on this. |
I am getting data over an external connection into one file every 15 minutes on server one, e.g. /tmp/manas.log.09-dec-14
A new file gets created the next day, e.g. /tmp/manas.log.10.dec-14
I am using lsyncd and have enabled sync of all files in this directory from server one to server two.
If server one is down, this external connection will send the data to server two first - means
I am using lsyncd and have enabled sync of this directory from server two to server one with delete = false.
Overall, I have enabled a bidirectional sync between these two servers.
Here is my concern:
Data is coming to server one, and this file has 100 rows from the first 15 minutes.
The data will sync to server two, so the same file on server two will have 100 rows.
In the next 15 minutes, new data comes to server one: another 100 rows.
That means server one will now have 200 rows. Will server two overwrite these 200 rows with its 100 rows, because the sync is bidirectional, before server one sends the updated data to server two?
Please let me know who will win.
Let me know if you need any further information.
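A rough model of the 100-row / 200-row scenario, under assumptions that are not stated in the question: both directions run rsync with update = true, modification times are preserved on transfer, and the clocks of the two servers agree. The `sync` helper and the dict layout are hypothetical, and `rows` stands in for file size. Under those assumptions server one's newer 200-row file is not overwritten, because rsync skips files that are newer on the receiver.

```python
def sync(src, dst):
    """Copy src's file state to dst under a simplified rsync --update rule.

    Returns True if the file was transferred. Assumes mtimes are preserved.
    """
    if dst:  # the file already exists on the receiver
        if dst["mtime"] > src["mtime"]:
            return False  # receiver is newer: skipped
        if dst["mtime"] == src["mtime"] and dst["rows"] == src["rows"]:
            return False  # same mtime and size: nothing to do
    dst.update(src)
    return True


one = {"mtime": 0, "rows": 100}  # first 15-minute batch lands on server one
two = {}                         # server two has no copy yet

sync(one, two)                   # one -> two: server two now has 100 rows
sync(two, one)                   # two -> one: identical, nothing happens

one.update(mtime=900, rows=200)  # second batch appended on server one

# Worst-case ordering from the question: two -> one runs first.
print(sync(two, one))            # False: server one's copy is newer
print(sync(one, two))            # True: server two catches up
print(one["rows"], two["rows"])  # 200 200
```

The outcome changes if server two's copy ever gets a newer modification time than server one's, for example after a failover write on server two or with skewed clocks; in that case server two's version would win.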