There are not many data synchronization tools for MongoDB available online; a while ago I used monstache to implement real-time data synchronization from MongoDB to Elasticsearch.
Monstache performs the synchronization by tailing MongoDB's oplog, and the oplog is only available when MongoDB is configured as a replica set.
For how to enable a replica set, see: https://blog.csdn.net/jack_brandy/article/details/88887795
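If you just need a single-node replica set for local testing, a minimal sketch looks like the following (the replica set name rs0 and the data path are assumptions; adjust them to your environment):

# start mongod with a replica set name so the oplog is enabled
mongod --replSet rs0 --dbpath /data/db --port 27017

# in another terminal, initialize the replica set once
mongo --eval "rs.initiate()"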
Next, download the monstache build that matches your Elasticsearch and MongoDB versions: https://rwynn.github.io/monstache-site/start/
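As a rough sketch, fetching and unpacking a release could look like this (the version number and file name below are hypothetical; pick the actual release and the binary for your platform from the monstache releases page):

wget https://github.com/rwynn/monstache/releases/download/v6.7.11/monstache-6.7.11.zip   # hypothetical version/file name
unzip monstache-6.7.11.zip
# copy the binary for your platform (e.g. linux-amd64) into a working directory
chmod +x monstache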
Create a config.toml configuration file in the monstache directory with the following content:
# connection settings
# connect to MongoDB using the following URL
mongo-url = "mongodb://localhost:27017"
# connect to the Elasticsearch REST API at the following node URLs
elasticsearch-urls = ["http://localhost:9200"]

# frequently required settings
# if you don't want to listen for changes to all collections in MongoDB but only a few
# e.g. only listen for inserts, updates, deletes, and drops from mydb.mycollection
# this setting does not initiate a copy, it is a filter on the oplog change listener only
namespace-regex = '^aaa\.bbb$'   # aaa is the MongoDB database and bbb is the collection; this is the namespace to match

# additionally, if you need to seed an index from a collection and not just listen for changes from the oplog
# you can copy entire collections or views from MongoDB to Elasticsearch
# direct-read-namespaces = ["mydb.mycollection", "db.collection", "test.test"]

# if you want to use MongoDB change streams instead of legacy oplog tailing add the following
# in this case you don't need regexes to filter collections.
# change streams require MongoDB version 3.6+
# change streams can only be combined with resume, replay, or cluster-name options on MongoDB 4+
# if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
# to listen to an entire db use only the database name. For a deployment use an empty string.
# change-stream-namespaces = ["mydb.mycollection", "db.collection", "test.test"]

# additional settings
# compress requests to Elasticsearch
# gzip = true
# generate indexing statistics
# stats = true
# index statistics into Elasticsearch
# index-stats = true
# use the following PEM file for connections to MongoDB
# mongo-pem-file = "/path/to/mongoCert.pem"
# disable PEM validation
# mongo-validate-pem-file = false
# use the following user name for Elasticsearch basic auth
# elasticsearch-user = "someuser"
# use the following password for Elasticsearch basic auth
# elasticsearch-password = "somepassword"
# use 4 go routines concurrently pushing documents to Elasticsearch
# elasticsearch-max-conns = 4
# use the following PEM file for connections to Elasticsearch
# elasticsearch-pem-file = "/path/to/elasticCert.pem"
# validate connections to Elasticsearch
# elastic-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set the replay to true you may see version conflict messages
# in the log if you had synced previously. This just means that you are replaying old docs which are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
# replay = false
# resume processing from a timestamp saved in a previous run
resume = true   # resume syncing from the point reached in the previous run
# do not validate that progress timestamps have been saved
# resume-write-unsafe = false
# override the name under which resume state is saved
# resume-name = "default"
# exclude documents whose namespace matches the following pattern
# namespace-exclude-regex = '^mydb\.ignorecollection$'
# turn on indexing of GridFS file content
# index-files = true
# turn on search result highlighting of GridFS content
# file-highlighting = true
# index GridFS files inserted into the following collections
# file-namespaces = ["users.fs.files"]
# print detailed information including request traces
# verbose = true
# enable clustering mode
cluster-name = 'tzg'   # cluster name
# do not exit after full-sync, rather continue tailing the oplog
# exit-after-direct-reads = false
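For reference, if you strip out all of the commented-out options, the settings that are actually in effect in the file above boil down to the following (aaa.bbb and tzg are just the placeholder namespace and cluster name used in this post):

mongo-url = "mongodb://localhost:27017"
elasticsearch-urls = ["http://localhost:9200"]
namespace-regex = '^aaa\.bbb$'
dropped-collections = true
dropped-databases = true
resume = true
cluster-name = 'tzg'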
Run the command ./monstache -f config.toml. If it starts successfully you will see the startup output.
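On a server you will usually want monstache to keep running after the terminal is closed; one simple option is to start it in the background (a sketch, the log file name is arbitrary):

nohup ./monstache -f config.toml > monstache.log 2>&1 &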
If Elasticsearch complains that the queue size is insufficient, add the following to the Elasticsearch configuration (elasticsearch.yml):
thread_pool:
  bulk:
    queue_size: 200
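After restarting Elasticsearch you can check whether the new queue size is in effect with the _cat API, roughly as follows (this assumes Elasticsearch is on localhost:9200; depending on your Elasticsearch version the pool may be named bulk or write):

# show the configured queue size of the bulk thread pool
curl -s 'http://localhost:9200/_cat/thread_pool/bulk?v&h=name,size,queue_size'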