SolrCloud is designed to provide a highly available, fault-tolerant environment for indexing and searching your data. In SolrCloud, data is organized into multiple pieces, or "shards", that can be hosted on multiple machines, with replicas providing the redundancy needed for both scalability and fault tolerance. A ZooKeeper service helps manage the overall cluster structure so that both indexing and search requests can be routed properly to the different nodes.
This section explains SolrCloud and its inner workings in detail, but before you dive in, it's best to have an idea of what it is you're trying to accomplish. This page provides a simple tutorial that explains how SolrCloud works on a practical level, and how to take advantage of its capabilities. We'll use simple examples of configuring SolrCloud on a single machine, which is obviously not a real production environment; a real production environment would include several servers or virtual machines. In a real production environment, you'll also use the real machine names instead of "localhost", which we've used here.
In this section you will learn:
Creating a cluster with multiple shards involves two steps: first, start the "overseer" node, which includes an embedded ZooKeeper server to keep track of your cluster; second, start any remaining shard nodes and point them at the running ZooKeeper.
Make sure to run Solr from the example directory in non-SolrCloud mode at least once before beginning; this process unpacks the jar files necessary to run SolrCloud. However, do not load documents yet; just start it once and shut it down.
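If you haven't done this yet, a minimal sketch of that step (assuming the example directory layout used throughout this tutorial, with Jetty's start.jar) looks like this:

cd <SOLR_DIST_HOME>/example
java -jar start.jar
# wait for Solr to finish starting, then stop it with Ctrl+C

Nothing else is needed here; the point is only to let the example webapp unpack itself once.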
In this example, you'll create two separate Solr instances on the same machine. This is not a production-ready installation, but just a quick exercise to get you familiar with SolrCloud.
For this exercise, we'll start by creating two copies of the example directory that is part of the Solr distribution:
cd <SOLR_DIST_HOME>
cp -r example node1
cp -r example node2
These copies of the example directory can really be called anything. All we're trying to do is copy Solr's example app to the side so we can play with it and still have a stand-alone Solr example to work with later if we want.
Next, start the first Solr instance, including the -DzkRun parameter, which also starts a local ZooKeeper instance:
cd node1
java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
Let's look at each of these parameters:
-DzkRun Starts up a ZooKeeper server embedded within Solr. This server will manage the cluster configuration. Note that we're doing this example all on one machine; when you start working with a production system, you'll likely use multiple ZooKeepers in an ensemble (or at least a stand-alone ZooKeeper instance). In that case, you'll replace this parameter with zkHost=<ZooKeeper Host:Port>, which is the hostname:port of the stand-alone ZooKeeper.
-DnumShards Determines how many pieces you're going to break your index into. In this case we're going to break the index into two pieces, or shards, so we're setting this value to 2. Note that once you start up a cluster, you cannot change this value. So if you expect to need more shards later on, build them into your configuration now (you can do this by starting all of your shards on the same server, then migrating them to different servers later).
-Dbootstrap_confdir ZooKeeper needs to get a copy of the cluster configuration, so this parameter tells it where to find that information.
-Dcollection.configName This parameter determines the name under which that configuration information is stored by ZooKeeper. We've used "myconf" as an example; it can be anything you'd like.
The -DnumShards, -Dbootstrap_confdir, and -Dcollection.configName parameters need only be specified once, the first time you start Solr in SolrCloud mode. They load your configurations into ZooKeeper; if you run them again at a later time, they will re-load your configurations and may wipe out changes you have made.
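For example, a later restart of this first node could look something like the sketch below (assuming the same node1 directory as above); the configuration is already in ZooKeeper, so the bootstrap parameters are simply omitted:

cd node1
java -DzkRun -jar start.jar

If you want to inspect what was uploaded, the zkcli.sh helper shipped under example/cloud-scripts (if your distribution includes it) can list the ZooKeeper tree, for example with sh zkcli.sh -zkhost localhost:9983 -cmd list.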
At this point you have one server running, but it represents only half the shards, so you will need to start the second one before you have a fully functional cluster. To do that, start the second instance in another window as follows:
cd node2
java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
Because this node is not the overseer, its parameters are a little less complex:
-Djetty.port The only reason we even have to set this parameter is because we're running both servers on the same machine, so they can't both use Jetty's default port. In this case we're choosing an arbitrary number that's different from the default. When you start on different machines, you can use the same Jetty ports if you'd like.
-DzkHost This parameter tells Solr where to find the ZooKeeper server so that it can "report for duty". By default, the ZooKeeper server operates on the Solr port plus 1000. (Note that if you were running an external ZooKeeper server, you'd simply point to that.)
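To make that port convention concrete:

Solr on port 8983 with -DzkRun -> embedded ZooKeeper on 9983
Solr on port 7574 with -DzkRun -> embedded ZooKeeper on 8574

The ensemble example later in this section relies on exactly this port-plus-1000 rule.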
At this point you should have two Solr windows running, both being managed by ZooKeeper. To verify that, open the Solr Admin UI in your browser and go to the Cloud screen:
http://localhost:8983/solr/#/~cloud
Use the port of the first Solr instance you started; this is your overseer. You should see both node1 and node2 in the cloud graph.
Now it's time to see the cluster in action. Start by indexing some data to one or both shards. You can do this any way you like, but the easiest way is to use the exampledocs, along with curl, so that you can control which port (and thereby which server) gets the updates:
curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d "@mem.xml"
curl http://localhost:7574/solr/update?commit=true -H "Content-Type: text/xml" -d "@monitor2.xml"
At this point each shard contains a subset of the full document set, and any search request that reaches either Solr instance will span both shards. For example, both of the following searches should return the same result set:
http://localhost:8983/solr/collection1/select?q=*:*
http://localhost:7574/solr/collection1/select?q=*:*
The reason that this works is that each shard knows about the other shards, so the search is carried out on all cores, then the results are combined and returned by the called server.
In this way you can have two cores or two hundred, with each containing a separate portion of the data.
If you want to check the number of documents on each shard, you could add distrib=false to each query and your search would not span all shards.
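For example, reusing the ports from above, these two queries report only the documents held locally by each shard:

http://localhost:8983/solr/collection1/select?q=*:*&distrib=false
http://localhost:7574/solr/collection1/select?q=*:*&distrib=false

The numFound values of the two responses should add up to the total reported by the distributed queries above.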
But what about providing high availability, even if one of these servers goes down? To do that, you'll need to look at replicas.
In order to provide high availability, you can create replicas, or copies of each shard that run in parallel with the main core for that shard. The architecture consists of the original shards, which are called the leaders, and their replicas, which contain the same data but let the leader handle all of the administrative tasks such as making sure data goes to all of the places it should go. This way, if one copy of the shard goes down, the data is still available and the cluster can continue to function.
Start by creating two more fresh copies of the example directory:
cd <SOLR_DIST_HOME>
cp -r example node3
cp -r example node4
As when we created the first two shards, you can name these copied directories anything you like.
If you don't already have the two instances you created in the previous section up and running, go ahead and restart them. From there, it's simply a matter of adding additional instances. Start by adding node3:
cd node3
java -Djetty.port=8900 -DzkHost=localhost:9983 -jar start.jar
Notice that these parameters are exactly the same as the ones we used to start the second node; you simply point your new instance at the existing ZooKeeper. If you look at the SolrCloud admin page now, you will see that the new node was added not as a third shard, but as a replica of the first shard.
This is because the cluster already knew that there were only two shards and they were already accounted for, so new nodes are added as replicas. Similarly, when you add the fourth instance, it's added as a replica for the second shard:
cd node4
java -Djetty.port=7500 -DzkHost=localhost:9983 -jar start.jar
If you were to add additional instances, the cluster would continue this round-robin, adding replicas as necessary. Replicas are attached to leaders in the order in which they are started, unless they are assigned to a specific shard with an additional parameter of shardId (as a system property, as in -DshardId=1, the value of which is the ID number of the shard the new node should be attached to). Upon restarts, the node will still be attached to the same leader even if the shardId is not defined again (it will always be attached to that machine).
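As a hypothetical sketch (node5 and port 7600 are made up for illustration, and the shardId form follows the description above), a fifth instance pinned explicitly to the first shard would be started like this:

cd node5
java -Djetty.port=7600 -DshardId=1 -DzkHost=localhost:9983 -jar start.jar

Without the shardId property, the same instance would simply have been attached to whichever leader came next in the round-robin.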
So where are we now? You now have four servers to handle your data. If you were to send data to a replica, as in:
curl http://localhost:7500/solr/update?commit=true -H "Content-Type: text/xml" -d "@money.xml"
What happens is that the replica forwards the request to its leader, the update is routed to the proper shard leader, and that leader indexes the document and distributes it to its replicas, so the result is the same as if you had sent it to a leader directly.
In this way, the data is available via a request to any of the running instances, as you can see by requests to:
http://localhost:8983/solr/collection1/select?q=*:*
http://localhost:7574/solr/collection1/select?q=*:*
http://localhost:8900/solr/collection1/select?q=*:*
http://localhost:7500/solr/collection1/select?q=*:*
But how does this help provide high availability? Simply put, a cluster must have at least one server running for each shard in order to function. To test this, shut down the server on port 7574, and then check the other servers:
http://localhost:8983/solr/collection1/select?q=*:*
http://localhost:8900/solr/collection1/select?q=*:*
http://localhost:7500/solr/collection1/select?q=*:*
You should continue to see the full set of data, even though one of the servers is missing. In fact, you can have multiple servers down, and as long as at least one instance for each shard is running, the cluster will continue to function. If the leader goes down – as in this example – a new leader will be "elected" from among the remaining replicas.
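One way to convince yourself of this (ports as in the earlier examples; 7500 holds the replica of the shard whose leader you just stopped) is to compare a distributed and a non-distributed query against that surviving replica:

http://localhost:7500/solr/collection1/select?q=*:*
http://localhost:7500/solr/collection1/select?q=*:*&distrib=false

The first should still return the full result set, while the second returns only that shard's share of the documents, showing that the replica has taken over serving them.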
Note that when we talk about servers going down, in this example it's crucial that one particular server stays up, and that's the one running on port 8983. That's because it's our overseer – the instance running ZooKeeper. If that goes down, the cluster can continue to function under some circumstances, but it won't be able to adapt to any servers that come up or go down.
That kind of single point of failure is obviously unacceptable. Fortunately, there is a solution for this problem: multiple ZooKeepers.
To simplify setup for this example we're using the internal ZooKeeper server that comes with Solr, but in a production environment, you will likely be using an external ZooKeeper. The concepts are the same, however. You can find instructions on setting up an external ZooKeeper server here: http://zookeeper.apache.org/doc/r3.3.4/zookeeperStarted.html
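As a rough sketch only (the hostnames and paths below are placeholders, not part of this tutorial), a zoo.cfg for a three-node external ensemble usually looks something like this; see the ZooKeeper documentation linked above for the authoritative details:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888

Each server also needs a myid file in its dataDir containing its own server number, and Solr would then be started with -DzkHost=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181 instead of -DzkRun.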
To truly provide high availability, we need to make sure not only that at least one server for each shard is running at all times, but also that the cluster always has a ZooKeeper running to manage it. To do that, you can set up the cluster to use multiple ZooKeepers. This is called using a ZooKeeper ensemble.
A ZooKeeper ensemble can keep running as long as more than half of its servers are up, so at least two servers in a three-server ensemble, three servers in a five-server ensemble, and so on, must be running at any given time. This required set of servers is called a quorum.
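Put another way, the quorum size for an ensemble of N servers is floor(N/2) + 1:

3 ZooKeeper servers -> quorum of 2 -> tolerates 1 failure
5 ZooKeeper servers -> quorum of 3 -> tolerates 2 failures
7 ZooKeeper servers -> quorum of 4 -> tolerates 3 failures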
In this example, you're going to set up the same two-shard cluster you were using before, but instead of a single ZooKeeper, you'll run a ZooKeeper server on three of the instances. Start by cleaning up any ZooKeeper data from the previous example:
cd <SOLR_DIST_DIR>
rm -r node*/solr/zoo_data
Next, restart the Solr instances. This time, rather than pointing them all at a single ZooKeeper instance, the nodes will run ZooKeeper themselves and listen to the rest of the ensemble for instructions.
You're using the same ports as before – 8983, 7574, 8900 and 7500 – so any ZooKeeper instances would run on ports 9983, 8574, 9900 and 8500. You don't actually need to run ZooKeeper on every single instance, however, so assuming you run ZooKeeper on 9983, 8574, and 9900, the ensemble would have an address of:
localhost:9983,localhost:8574,localhost:9900
This means that when you start the first instance, you'll do it like this:
cd node1
java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
Note that the order of the parameters matters. Make sure to specify the -DzkHost parameter after the other ZooKeeper-related parameters.
You'll notice a lot of error messages scrolling past; this is because the ensemble doesn't yet have a quorum of ZooKeepers running.
Notice also, that this step takes care of uploading the cluster's configuration information to ZooKeeper, so starting the next server is more straightforward:
cd node2
java -Djetty.port=7574 -DzkRun -DnumShards=2 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
Once you start this instance, you should see the errors begin to disappear on both instances, as the ZooKeepers begin to update each other, even though you only have two of the three ZooKeepers in the ensemble running.
Next start the last ZooKeeper:
cd node3
java -Djetty.port=8900 -DzkRun -DnumShards=2 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
Finally, start the last replica, which doesn't itself run ZooKeeper, but references the ensemble:
cd node4
java -Djetty.port=7500 -DzkHost=localhost:9983,localhost:8574,localhost:9900 -jar start.jar
To make sure that everything is working correctly, run a query:
http://localhost:8983/solr/collection1/select?q=*:*
and check the SolrCloud admin page:
Now you can go ahead and kill the server on 8983, but ZooKeeper will still work, because you have more than half of the original servers still running. To verify, open the SolrCloud admin page on another server, such as:
http://localhost:8900/solr/#/~cloud
End of Chapter 1.