Replication ensures redundancy for your data, and enables you to send an update request to any node in the shard. If that node is a replica, it will forward the request to the leader, which then forwards it to all existing replicas, using versioning to make sure every replica has the most up-to-date version. This architecture means your data can be recovered in the event of a disaster, even if you are using Near Real Time searching.
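For illustration, here is a minimal sketch of sending an update to an arbitrary node; the host name, collection name, and field names are placeholders rather than anything from the original text:

    # Post a document to any live node. If that node is not the shard leader,
    # it forwards the update to the leader, which then distributes it (with a
    # version number) to every replica.
    curl 'http://any-solr-node:8983/solr/collection1/update' \
         -H 'Content-Type: application/json' \
         -d '[{"id": "doc-1", "title_t": "hello solrcloud"}]'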
If you want to use the NearRealtimeSearch support, enable auto soft commits in your solrconfig.xml file before storing it in ZooKeeper. Alternatively, you can send explicit soft commits to the cluster as needed.
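As an illustration, the relevant solrconfig.xml section might look like the sketch below; the one-second and one-minute intervals are arbitrary example values, not recommendations:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Soft commit: opens a new searcher so recent updates become visible,
           without flushing index segments to stable storage. -->
      <autoSoftCommit>
        <maxTime>1000</maxTime>
      </autoSoftCommit>
      <!-- Hard commit: flushes to disk less often; openSearcher=false keeps
           visibility governed by the soft commits above. -->
      <autoCommit>
        <maxTime>60000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>

If you would rather commit explicitly, a request such as http://any-solr-node:8983/solr/collection1/update?softCommit=true (host and collection name are again placeholders) triggers a soft commit on demand.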
SolrCloud doesn't work very well with separate data clusters connected by an expensive pipe. The root problem is that SolrCloud's architecture sends documents to all the nodes in the cluster (on a per-shard basis), and that architecture is really dictated by the NRT functionality.
Imagine that you have a set of servers in China and another in the US that are aware of each other. Assuming 5 replicas per shard, a single update may make multiple trips over the expensive pipe before it's all done, because the leader has to forward the update to every replica no matter which side of the pipe that replica lives on, probably slowing indexing speed unacceptably.
So the SolrCloud recommendation for this situation is to maintain these clusters separately; nodes in China don't even know that nodes exist in the US, and vice versa. When indexing, you send the update request to one node in the US and one in China, and all the node routing after that is local to the separate clusters. Requests can go to any node in either country and still get a consistent view of the data.
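A minimal sketch of that dual send, assuming one SolrCloud cluster per data center and placeholder host and collection names:

    # Index the same document once into each data center's cluster; all further
    # routing (leader forwarding, replica updates) stays local to that cluster.
    for node in http://us-node1:8983 http://cn-node1:8983; do
      curl "$node/solr/collection1/update" \
           -H 'Content-Type: application/json' \
           -d '[{"id": "doc-1", "title_t": "hello solrcloud"}]'
    done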
However, if your US cluster goes down, you have to re-synchronize the downed cluster with up-to-date information from China. The process requires you to replicate the index from China to the repaired US installation and then get everything back up and working.
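The fuller procedure below lets SolrCloud trigger that copy automatically, but the same mechanism can also be invoked by hand through the replication handler's fetchindex command, run once per shard on the repaired US nodes. The host names and the collection1_shard1_replica1 core name are assumptions for the sketch:

    # Pull shard1's index from a healthy node in China onto the repaired US
    # node, using the (non-SolrCloud) replication handler.
    curl 'http://us-node1:8983/solr/collection1_shard1_replica1/replication?command=fetchindex&masterUrl=http://cn-node1:8983/solr/collection1_shard1_replica1/replication'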
Use of Near Real Time (NRT) searching affects the way that systems using SolrCloud behave during disaster recovery.
The procedure outlined below assumes that you are maintaining separate clusters, as described above. Consider, for example, an event in which the US cluster goes down (say, because of a hurricane), but the China cluster is intact. Disaster recovery consists of creating the new system and letting the intact cluster create a replica for each shard on it, then promoting those replicas to be leaders of the newly created US cluster.
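On Solr versions that include the Collections API, the "let the intact cluster create a replica on the new hardware" step could be sketched with the ADDREPLICA action, assuming the rebuilt US nodes have temporarily joined the intact cluster's ZooKeeper ensemble; the collection, shard, and node names are placeholders:

    # Ask the surviving cluster to build a replica of shard1 on a rebuilt US
    # node; repeat for each shard, then split the clusters apart again and
    # promote the new replicas to leaders of the US cluster.
    curl 'http://cn-node1:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1&node=us-node1:8983_solr'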
Here are the steps to take:
SolrCloud will automatically use old-style replication for the bulk load. By temporarily having only one replica, you'll minimize data transfer across a slow connection.
(End of article)