I've recently been working with Python, crawling large amounts of data. I had been using MySQL 5.6, but with roughly 200 million rows in total everything ran very slowly, so under our teacher's guidance we decided to rebuild the system on a distributed MongoDB (sharded cluster) architecture. Demo: https://www.68xi.cn/python
When we talk about a MongoDB cluster, we are really talking about its standard architecture:

It has four components: mongos, config server, shard, and replica set.

mongos is the entry point for every request to the cluster. All requests are coordinated through mongos, so the application does not need its own routing layer; mongos acts as a request dispatcher and forwards each request to the appropriate shard server. In production there are usually several mongos instances serving as entry points, so that if one goes down the cluster can still accept requests.

config server: as the name suggests, the configuration servers store all of the cluster's metadata (routing and shard configuration). mongos has no persistent storage of its own for shard and routing information; it only caches it in memory, while the config servers hold the authoritative copy. When mongos starts, or is shut down and restarted, it loads its configuration from the config servers, and whenever that configuration changes the config servers notify every mongos so routing stays accurate. In production there are usually multiple config servers, because they hold the sharding metadata and must not lose it.

shard: sharding means splitting the database and spreading it across different machines. By distributing the data over several machines, you can store more data and handle a larger load without needing one extremely powerful server. The basic idea is to cut a collection into small chunks, spread those chunks across the shards so that each shard is responsible for only part of the data, and let a balancer keep the shards even by migrating chunks between them.

replica set: a replica set is essentially the backup of a shard, so that data is not lost when a shard member goes down. Replication keeps redundant copies of the data on multiple servers, which improves availability and protects the data.
Arbiter: an arbiter is a MongoDB instance in a replica set that holds no data. It uses minimal resources and needs no dedicated hardware, but it should not run on the same node as a data-bearing member; put it on another application server, a monitoring host, or a separate virtual machine. Arbiters are added so that the replica set has an odd number of voting members (the primary included); without that, a new primary cannot be elected automatically when the current primary fails.
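In the setup below the arbiters are declared directly in each rs.initiate() configuration, but an arbiter can also be added to an already-running replica set afterwards. A minimal sketch (host and port are only illustrative):

```
// run from the mongo shell connected to the replica set's primary
rs.addArb("192.168.1.3:10001")   // adds a voting, non-data-bearing member
```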
To sum up: the application sends all of its reads and writes to mongos; the config servers store the cluster metadata and keep mongos in sync; the data itself ends up on the shards, with each shard's replica set holding extra copies so nothing is lost when a node fails; and the arbiter, which stores no data, only votes on which member of a replica set becomes the primary.
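In other words, once the cluster below is up, a client only ever talks to mongos. A minimal mongo shell session against mongos might look like this (the `pages` collection is just a placeholder for whatever the crawler writes):

```
// connect to any mongos, e.g. /usr/local/mongodb/bin/mongo 192.168.1.1:27017
use linuxeye                                       // the application database used later in this post
db.pages.insert({ url: "http://example.com", fetched_at: new Date() })  // mongos forwards the write to the shard that owns the data
db.pages.find({ url: "http://example.com" })                            // reads are routed the same way
```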
OS: CentOS 7
Three servers: 192.168.1.1/2/3
MongoDB version: 3.2.8
Server layout

| Server 1 (192.168.1.1) | Server 2 (192.168.1.2) | Server 3 (192.168.1.3) |
|---|---|---|
| mongos | mongos | mongos |
| config server | config server | config server |
| shard server1 primary | shard server1 arbiter | shard server1 secondary |
| shard server2 secondary | shard server2 primary | shard server2 arbiter |
| shard server3 arbiter | shard server3 secondary | shard server3 primary |
Port allocation: mongos 27017, config server 10004, shard1 10001, shard2 10002, shard3 10003.
Install MongoDB with OneinStack; make sure to pick version 3.2.8.
```
# disable transparent huge pages on boot (recommended for MongoDB)
cat >> /etc/rc.local << EOF
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
EOF
```
Run the following on all three nodes (example shown for 192.168.1.1):
```
# data directories for the config server, mongos and the three shards
mkdir -p /usr/local/mongodb/data/{configsvr,mongos,shard1,shard2,shard3}
# key file directory and an (empty) key file; auth stays commented out in the configs below
mkdir -p /usr/local/mongodb/etc/keyfile
> /usr/local/mongodb/etc/keyfile/linuxeye
```
Edit /usr/local/mongodb/etc/shard1.conf:
```
systemLog:
  destination: file
  path: /usr/local/mongodb/log/shard1.log
  logAppend: true
processManagement:
  fork: true
  pidFilePath: "/usr/local/mongodb/data/shard1/shard1.pid"
net:
  port: 10001
storage:
  dbPath: "/usr/local/mongodb/data/shard1"
  engine: wiredTiger
  journal:
    enabled: true
  directoryPerDB: true
operationProfiling:
  slowOpThresholdMs: 10
  mode: "slowOp"
#security:
#  keyFile: "/usr/local/mongodb/etc/keyfile/linuxeye"
#  clusterAuthMode: "keyFile"
replication:
  oplogSizeMB: 50
  replSetName: "shard1_linuxeye"
  secondaryIndexPrefetch: "all"
```
/usr/local/mongodb/etc/shard2.conf
```
systemLog:
  destination: file
  path: /usr/local/mongodb/log/shard2.log
  logAppend: true
processManagement:
  fork: true
  pidFilePath: "/usr/local/mongodb/data/shard2/shard2.pid"
net:
  port: 10002
storage:
  dbPath: "/usr/local/mongodb/data/shard2"
  engine: wiredTiger
  journal:
    enabled: true
  directoryPerDB: true
operationProfiling:
  slowOpThresholdMs: 10
  mode: "slowOp"
#security:
#  keyFile: "/usr/local/mongodb/etc/keyfile/linuxeye"
#  clusterAuthMode: "keyFile"
replication:
  oplogSizeMB: 50
  replSetName: "shard2_linuxeye"
  secondaryIndexPrefetch: "all"
```
/usr/local/mongodb/etc/shard3.conf
```
systemLog:
  destination: file
  path: /usr/local/mongodb/log/shard3.log
  logAppend: true
processManagement:
  fork: true
  pidFilePath: "/usr/local/mongodb/data/shard3/shard3.pid"
net:
  port: 10003
storage:
  dbPath: "/usr/local/mongodb/data/shard3"
  engine: wiredTiger
  journal:
    enabled: true
  directoryPerDB: true
operationProfiling:
  slowOpThresholdMs: 10
  mode: "slowOp"
#security:
#  keyFile: "/usr/local/mongodb/etc/keyfile/linuxeye"
#  clusterAuthMode: "keyFile"
replication:
  oplogSizeMB: 50
  replSetName: "shard3_linuxeye"
  secondaryIndexPrefetch: "all"
```
/usr/local/mongodb/etc/configsvr.conf
```
systemLog:
  destination: file
  path: /usr/local/mongodb/log/configsvr.log
  logAppend: true
processManagement:
  fork: true
  pidFilePath: "/usr/local/mongodb/data/configsvr/configsvr.pid"
net:
  port: 10004
storage:
  dbPath: "/usr/local/mongodb/data/configsvr"
  engine: wiredTiger
  journal:
    enabled: true
#security:
#  keyFile: "/usr/local/mongodb/etc/keyfile/linuxeye"
#  clusterAuthMode: "keyFile"
sharding:
  clusterRole: configsvr
```
/usr/local/mongodb/etc/mongos.conf
```
systemLog:
  destination: file
  path: /usr/local/mongodb/log/mongos.log
  logAppend: true
processManagement:
  fork: true
  pidFilePath: /usr/local/mongodb/data/mongos/mongos.pid
net:
  port: 27017
sharding:
  configDB: 192.168.1.1:10004,192.168.1.2:10004,192.168.1.3:10004
#security:
#  keyFile: "/usr/local/mongodb/etc/keyfile/linuxeye"
#  clusterAuthMode: "keyFile"
```
Start the three shard mongod instances (on every node):
```
/usr/local/mongodb/bin/mongod -f /usr/local/mongodb/etc/shard1.conf
/usr/local/mongodb/bin/mongod -f /usr/local/mongodb/etc/shard2.conf
/usr/local/mongodb/bin/mongod -f /usr/local/mongodb/etc/shard3.conf
```
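Before building the replica sets it is worth a quick sanity check that each mongod came up cleanly; for example against the shard1 instance (port 10001), using the mongo shell path from this post:

```
// /usr/local/mongodb/bin/mongo --port 10001
db.serverStatus().host                           // confirms the instance is up and what it reports as host:port
db.adminCommand({ getLog: "startupWarnings" })   // the transparent_hugepage warnings should no longer appear
```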
Configure the replica sets
```
/usr/local/mongodb/bin/mongo --port 10001
use admin
config = { _id:"shard1_linuxeye", members:[
  {_id:0, host:"192.168.1.1:10001"},
  {_id:1, host:"192.168.1.2:10001", arbiterOnly:true},
  {_id:2, host:"192.168.1.3:10001"}
] }
rs.initiate(config)
```
```
/usr/local/mongodb/bin/mongo --port 10002
use admin
config = { _id:"shard2_linuxeye", members:[
  {_id:0, host:"192.168.1.1:10002"},
  {_id:1, host:"192.168.1.2:10002"},
  {_id:2, host:"192.168.1.3:10002", arbiterOnly:true}
] }
rs.initiate(config)
```
```
/usr/local/mongodb/bin/mongo --port 10003
use admin
config = { _id:"shard3_linuxeye", members:[
  {_id:0, host:"192.168.1.1:10003", arbiterOnly:true},
  {_id:1, host:"192.168.1.2:10003"},
  {_id:2, host:"192.168.1.3:10003"}
] }
rs.initiate(config)
```
Note: the commands above build the replica sets; you can inspect each one with commands such as rs.status().
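For a quick overview of which member is PRIMARY, SECONDARY and ARBITER in each shard, something like this (run inside any member's shell) is handy:

```
// expect one PRIMARY, one SECONDARY and one ARBITER per shard
rs.status().members.forEach(function (m) {
  print(m.name + " -> " + m.stateStr)
})
```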
Start the configsvr and mongos nodes on all three machines. First the config servers:
```
/usr/local/mongodb/bin/mongod -f /usr/local/mongodb/etc/configsvr.conf
```
then, again on each machine, start mongos:
```
/usr/local/mongodb/bin/mongos -f /usr/local/mongodb/etc/mongos.conf
```
Configure sharding

Add the three shards from 192.168.1.1:
```
/usr/local/mongodb/bin/mongo --port 27017
use admin
db.runCommand({addshard:"shard1_linuxeye/192.168.1.1:10001,192.168.1.2:10001,192.168.1.3:10001"});
db.runCommand({addshard:"shard2_linuxeye/192.168.1.1:10002,192.168.1.2:10002,192.168.1.3:10002"});
db.runCommand({addshard:"shard3_linuxeye/192.168.1.1:10003,192.168.1.2:10003,192.168.1.3:10003"});
```
Check the shard information:
```
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("5a55af962f787566bce05b78")
  }
  shards:
    {  "_id" : "shard1_linuxeye",  "host" : "shard1_linuxeye/192.168.1.1:10001,192.168.1.2:10001" }
    {  "_id" : "shard2_linuxeye",  "host" : "shard2_linuxeye/192.168.1.1:10002,192.168.1.2:10002" }
    {  "_id" : "shard3_linuxeye",  "host" : "shard3_linuxeye/192.168.1.1:10003,192.168.1.2:10003" }
  active mongoses:
    "3.2.8" : 3
  balancer:
    Currently enabled:  yes
    Currently running:  no
    Failed balancer rounds in last 5 attempts:  0
    Migration Results for the last 24 hours:
      No recent migrations
  databases:
```
List the shards:
```
mongos> db.runCommand( { listshards : 1 } )
{
  "shards" : [
    { "_id" : "shard1_linuxeye", "host" : "shard1_linuxeye/192.168.1.1:10001,192.168.1.2:10001" },
    { "_id" : "shard2_linuxeye", "host" : "shard2_linuxeye/192.168.1.1:10002,192.168.1.2:10002" },
    { "_id" : "shard3_linuxeye", "host" : "shard3_linuxeye/192.168.1.1:10003,192.168.1.2:10003" }
  ],
  "ok" : 1
}
```
Enable sharding for the database named 'linuxeye':
```
mongos> use admin
mongos> sh.enableSharding("linuxeye")
{ "ok" : 1 }
// db.runCommand({ "enablesharding": "linuxeye" }) is equivalent
```
Shard a collection:
```
db.runCommand({shardcollection:'linuxeye.LiveAppMesssage',"key":{"_id":1}})
```
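Note that the default ObjectId `_id` increases monotonically, so with a ranged `_id` shard key new inserts all land on the shard holding the highest chunk. If write distribution matters more than range queries, a hashed shard key is one common alternative (a sketch only; a collection can be sharded just once, so this would replace the command above rather than follow it):

```
// a hashed _id spreads inserts roughly evenly across the three shards
sh.shardCollection("linuxeye.LiveAppMesssage", { _id: "hashed" })
```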
Check the collection's stats:
```
db.LiveAppMesssage.stats()
```
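Once data is flowing in, `getShardDistribution()` gives a per-shard breakdown of documents and chunks for the collection (run from mongos):

```
use linuxeye
db.LiveAppMesssage.getShardDistribution()   // per-shard document/chunk counts and sizes
sh.status()                                 // also lists the chunk ranges per shard once chunks exist
```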