Practical guide: syncing MongoDB data to Elasticsearch in real time with monstache

There are few MongoDB data synchronization tools available online. A while ago I used monstache to sync data from MongoDB to Elasticsearch in real time.

monstache syncs by reading MongoDB's oplog, and the oplog is only available when MongoDB is configured as a replica set, so set up a replica set first.

For how to enable a replica set, see: https://blog.csdn.net/jack_brandy/article/details/88887795

Next, download the monstache release that matches your Elasticsearch and MongoDB versions: https://rwynn.github.io/monstache-site/start/

In the monstache directory, create a config.toml configuration file with the following contents:

# connection settings
# connect to MongoDB using the following URL
mongo-url = "mongodb://localhost:27017"

# connect to the Elasticsearch REST API at the following node URLs
elasticsearch-urls = ["http://localhost:9200"]

# frequently required settings
# if you don't want to listen for changes to all collections in MongoDB but only a few
# e.g. only listen for inserts, updates, deletes, and drops from mydb.mycollection
# this setting does not initiate a copy, it is a filter on the oplog change listener only
namespace-regex = '^aaa\.bbb$'      # aaa is the MongoDB database and bbb the collection; together they form the namespace to match

# additionally, if you need to seed an index from a collection and not just listen for changes from the oplog
# you can copy entire collections or views from MongoDB to Elasticsearch
# direct-read-namespaces = ["mydb.mycollection", "db.collection", "test.test"]

# if you want to use MongoDB change streams instead of legacy oplog tailing add the following
# in this case you don't need regexes to filter collections.
# change streams require MongoDB version 3.6+
# change streams can only be combined with resume, replay, or cluster-name options on MongoDB 4+
# if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
# to listen to an entire db use only the database name.  For a deployment use an empty string.
# change-stream-namespaces = ["mydb.mycollection", "db.collection", "test.test"]

# additional settings
# compress requests to Elasticsearch
# gzip = true
# generate indexing statistics
# stats = true
# index statistics into Elasticsearch
# index-stats = true
# use the following PEM file for connections to MongoDB
# mongo-pem-file = "/path/to/mongoCert.pem"
# disable PEM validation
# mongo-validate-pem-file = false
# use the following user name for Elasticsearch basic auth
# elasticsearch-user = "someuser"
# use the following password for Elasticsearch basic auth
# elasticsearch-password = "somepassword"
# use 4 go routines concurrently pushing documents to Elasticsearch
# elasticsearch-max-conns = 4 
# use the following PEM file to connections to Elasticsearch
# elasticsearch-pem-file = "/path/to/elasticCert.pem"
# validate connections to Elasticsearch
# elasticsearch-validate-pem-file = true
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = true
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = true
# do not start processing at the beginning of the MongoDB oplog
# if you set the replay to true you may see version conflict messages
# in the log if you had synced previously. This just means that you are replaying old docs which are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
# replay = false
# resume processing from a timestamp saved in a previous run
resume = true # continue syncing from the timestamp of the last sync
# do not validate that progress timestamps have been saved
# resume-write-unsafe = false
# override the name under which resume state is saved
# resume-name = "default"
# exclude documents whose namespace matches the following pattern
# namespace-exclude-regex = '^mydb\.ignorecollection$'
# turn on indexing of GridFS file content
# index-files = true
# turn on search result highlighting of GridFS content
# file-highlighting = true
# index GridFS files inserted into the following collections
# file-namespaces = ["users.fs.files"]
# print detailed information including request traces
# verbose = true
# enable clustering mode
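cluster-name = 'tzg'  # name of the monstache cluster (processes sharing this name coordinate with each other); this is not the Elasticsearch cluster name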
# do not exit after full-sync, rather continue tailing the oplog
# exit-after-direct-reads = false
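
The namespace-regex filter above is easy to get wrong, so it can help to try the pattern against a few namespaces before starting monstache. The following is a minimal sketch, not part of monstache: it uses Python's re module as a stand-in for the Go regexp engine that monstache itself uses (for a simple pattern like this one the two should agree), and the sample namespaces and the helper name is_watched are made up for illustration.

```python
import re

# Same pattern as namespace-regex in config.toml.
NAMESPACE_REGEX = re.compile(r"^aaa\.bbb$")


def is_watched(namespace: str) -> bool:
    """Return True if a 'database.collection' namespace matches the filter."""
    return NAMESPACE_REGEX.search(namespace) is not None


if __name__ == "__main__":
    # Hypothetical namespaces, used only to show what the anchors and the
    # escaped dot accept and reject.
    for ns in ["aaa.bbb", "aaa.bbbb", "aaa.bbb_archive", "aaaXbbb", "other.bbb"]:
        print(f"{ns:16} -> {is_watched(ns)}")
```

The ^ and $ anchors keep the filter from also picking up collections whose names merely start with bbb, and the escaped dot keeps it from matching any character between the database and collection names.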

Run ./monstache -f config.toml. If it starts successfully, you will see monstache's startup output in the terminal.
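
One way to confirm that documents are actually arriving is to ask Elasticsearch for the document count of the target index. The sketch below makes some assumptions: Elasticsearch is reachable at the URL from elasticsearch-urls, and monstache writes to an index named after the lowercased MongoDB namespace (aaa.bbb here), which is its default naming as far as I know; adjust NAMESPACE if you have configured index mapping differently. The helper names are illustrative only.

```python
import json
import urllib.parse
import urllib.request
from urllib.error import URLError

ES_URL = "http://localhost:9200"   # same node as elasticsearch-urls
NAMESPACE = "aaa.bbb"              # same namespace as namespace-regex


def count_url(es_url: str, namespace: str) -> str:
    """Build the _count URL, assuming the index is named after the namespace."""
    index = urllib.parse.quote(namespace.lower(), safe="")
    return f"{es_url.rstrip('/')}/{index}/_count"


def parse_count(body: bytes) -> int:
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def fetch_count(es_url: str, namespace: str, timeout: float = 5.0) -> int:
    """Query Elasticsearch and return the number of indexed documents."""
    with urllib.request.urlopen(count_url(es_url, namespace), timeout=timeout) as resp:
        return parse_count(resp.read())


if __name__ == "__main__":
    try:
        print(f"{NAMESPACE}: {fetch_count(ES_URL, NAMESPACE)} documents indexed")
    except (URLError, OSError, KeyError, ValueError) as exc:
        print(f"could not read the count from Elasticsearch: {exc}")
```

Running it once, inserting a document into the MongoDB collection, and running it again should show the count increase if the sync is working.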

If Elasticsearch complains that the queue size is too small, add the following to the Elasticsearch configuration:

thread_pool:
  bulk:    # this pool is named "write" on Elasticsearch 6.3 and later
    queue_size: 200