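Introduction to Solr:
Solr is a standalone, enterprise-grade search server. It is a full-text search server built on Lucene that extends it with a richer query language than Lucene itself offers, is configurable and extensible, optimizes query performance, and ships with a full-featured administration interface. Solr exposes a web-service-like API: clients submit documents in a defined format (XML, JSON, text, and so on) to the search server over HTTP to build the index, and issue search requests with HTTP GET, receiving the results in JSON format.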
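As a rough illustration of that HTTP interface, the sketch below builds the update and select URLs and prints the matching curl commands as a dry run. The base URL and the collection name test1 are borrowed from the query URL shown later in this article; the sample document and the helper function names are hypothetical.

```shell
#!/bin/sh
# Assumed endpoint: host, port, and collection come from URLs used later in this article.
SOLR_BASE="http://192.168.5.48:18080/solr"
COLLECTION="test1"

# URL for posting documents; commit=true makes them visible to searches right away.
solr_update_url() {
  printf '%s/%s/update?commit=true\n' "$SOLR_BASE" "$COLLECTION"
}

# URL for a query ($1 is the q parameter) with a JSON response.
solr_select_url() {
  printf '%s/%s/select?q=%s&wt=json&indent=on\n' "$SOLR_BASE" "$COLLECTION" "$1"
}

# Dry run: print the curl commands rather than executing them.
# The document {"id":"1"} is a hypothetical example record.
echo "curl -X POST -H 'Content-Type: application/json' --data-binary '[{\"id\":\"1\"}]' '$(solr_update_url)'"
echo "curl '$(solr_select_url '*:*')'"
```

Whether a document with only an id field is accepted depends on the schema of the target collection.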
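Questions to consider before introducing Solr into a project:
1. Data update frequency: how large is the daily data increment, and are updates applied continuously or on a schedule?
2. Total data volume: how long must the data be retained?
3. Consistency requirements: how soon should updated data become visible, and what is the maximum tolerable delay?
4. Data characteristics: what are the data sources, and what is the average size of a single record?
5. Business characteristics: what sorting requirements and search criteria are there?
6. Resource reuse: what is the existing hardware configuration, and is there an upgrade plan?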
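The first, second, and fourth questions combine into a simple back-of-the-envelope estimate of raw data volume. The sketch below is only arithmetic; the workload figures are hypothetical, and the actual index size will differ from the raw size depending on which fields are stored and indexed.

```shell
#!/bin/sh
# Raw data volume in bytes: records per day x retention days x average record size.
estimate_raw_bytes() {
  echo $(( $1 * $2 * $3 ))
}

# Hypothetical workload: 1,000,000 records per day, kept 30 days, 2048 bytes each.
bytes=$(estimate_raw_bytes 1000000 30 2048)
echo "raw data: $bytes bytes (about $(( bytes / 1024 / 1024 / 1024 )) GiB)"
```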
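SolrCloud: the distributed scaling solution for Solr
SolrCloud is a distributed solution built on ZooKeeper and Solr. It adds distributed capabilities to Solr and is used to build a Solr server cluster with high availability, high scalability, automatic fault tolerance, distributed indexing, and distributed querying.
It has several distinctive features:
1) Centralized configuration
2) Automatic fault tolerance
3) Near-real-time search
4) Automatic load balancing of queries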
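Solr 5.5.0 + tomcat 7.0.69 + zookeeper-3.4.6 Cloud deployment
(Because not enough machines were available, this article uses a pseudo-distributed deployment on a single machine that simulates four real machines. In a real deployment, the tomcat/zookeeper ports can simply be left unchanged.)
The four tomcat instances form the following deployment: the Collection is split into two Shards that each store part of the index, and each Shard is in turn split into two core_nodes (one leader, one replica). The configuration is as follows: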
(1)
apache-tomcat-7.0.69 cluster configuration:
Version: apache-tomcat-7.0.69
Download: http://tomcat.apache.org/download-70.cgi
Location: /var/local/
Instances: 4: /var/local/apache-tomcat-7.0.69-1, /var/local/apache-tomcat-7.0.69-2, /var/local/apache-tomcat-7.0.69-3, /var/local/apache-tomcat-7.0.69-4
Note: the main task is to resolve tomcat port conflicts in the single-machine environment
apache-tomcat-7.0.69-1
sudo vi /var/local/apache-tomcat-7.0.69-1/conf/server.xml
{
<Server port="18005" shutdown="SHUTDOWN">
<Connector port="18080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="18009" protocol="AJP/1.3" redirectPort="8443" />
}
apache-tomcat-7.0.69-2
sudo vi /var/local/apache-tomcat-7.0.69-2/conf/server.xml
{
<Server port="28005" shutdown="SHUTDOWN">
<Connector port="28080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="28009" protocol="AJP/1.3" redirectPort="8443" />
}
apache-tomcat-7.0.69-3
sudo vi /var/local/apache-tomcat-7.0.69-3/conf/server.xml
{
<Server port="38005" shutdown="SHUTDOWN">
<Connector port="38080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="38009" protocol="AJP/1.3" redirectPort="8443" />
}
apache-tomcat-7.0.69-4
sudo vi /var/local/apache-tomcat-7.0.69-4/conf/server.xml
{
<Server port="48005" shutdown="SHUTDOWN">
<Connector port="48080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="48009" protocol="AJP/1.3" redirectPort="8443" />
}
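The four manual edits above follow one pattern: each stock port is prefixed with the instance number. A minimal sketch of that pattern with sed, assuming the unmodified server.xml still uses the stock ports 8005, 8080, and 8009; the function name is hypothetical.

```shell
#!/bin/sh
# Prefix the stock Tomcat ports (8005, 8080, 8009) with the instance number $1.
# Reads a server.xml on stdin and writes the rewritten content to stdout.
rewrite_tomcat_ports() {
  n="$1"
  sed -e "s/port=\"8005\"/port=\"${n}8005\"/" \
      -e "s/port=\"8080\"/port=\"${n}8080\"/" \
      -e "s/port=\"8009\"/port=\"${n}8009\"/"
}

# Demo on a single line; redirectPort is deliberately left untouched.
echo '<Connector port="8080" protocol="HTTP/1.1" redirectPort="8443" />' | rewrite_tomcat_ports 2
```

To apply it to a real instance, feed it a copy of the original server.xml and write the result back with sudo tee, after checking the output by eye.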
(2)
zookeeper-3.4.6 cluster configuration:
Version: zookeeper-3.4.6
Download: https://zookeeper.apache.org/releases.html#download
Location: /var/local/
Instances: 4: /var/local/zookeeper-3.4.6-1, /var/local/zookeeper-3.4.6-2, /var/local/zookeeper-3.4.6-3, /var/local/zookeeper-3.4.6-4
Note: the main task is to resolve zookeeper port and directory conflicts in the single-machine environment
zookeeper-3.4.6-1
sudo mkdir -p /var/local/zookeeper-3.4.6-1/data
sudo mkdir -p /var/local/zookeeper-3.4.6-1/data/log
echo 1 | sudo tee /var/local/zookeeper-3.4.6-1/data/myid
(or edit the file by hand: sudo vi /var/local/zookeeper-3.4.6-1/data/myid
{1})
cd /var/local/zookeeper-3.4.6-1/conf/ && sudo mv zoo_sample.cfg zoo.cfg && sudo vi zoo.cfg
{
clientPort=2181
dataDir=/var/local/zookeeper-3.4.6-1/data
dataLogDir=/var/local/zookeeper-3.4.6-1/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
zookeeper-3.4.6-2
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-2
sudo vi /var/local/zookeeper-3.4.6-2/data/myid
{2}
sudo vi /var/local/zookeeper-3.4.6-2/conf/zoo.cfg
{
clientPort=2182
dataDir=/var/local/zookeeper-3.4.6-2/data
dataLogDir=/var/local/zookeeper-3.4.6-2/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
zookeeper-3.4.6-3
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-3
sudo vi /var/local/zookeeper-3.4.6-3/data/myid
{3}
sudo vi /var/local/zookeeper-3.4.6-3/conf/zoo.cfg
{
clientPort=2183
dataDir=/var/local/zookeeper-3.4.6-3/data
dataLogDir=/var/local/zookeeper-3.4.6-3/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
zookeeper-3.4.6-4
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-4
sudo vi /var/local/zookeeper-3.4.6-4/data/myid
{4}
sudo vi /var/local/zookeeper-3.4.6-4/conf/zoo.cfg
{
clientPort=2184
dataDir=/var/local/zookeeper-3.4.6-4/data
dataLogDir=/var/local/zookeeper-3.4.6-4/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
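The four zoo.cfg files differ only in clientPort and the data directories, while the server.N list is identical everywhere. A small sketch that prints the instance-specific settings shown above; the function name is hypothetical, and the output is meant to be merged with the remaining settings inherited from zoo_sample.cfg (such as tickTime, initLimit, and syncLimit), which it does not emit.

```shell
#!/bin/sh
# Print the settings edited above for zookeeper instance $1 (1-4).
zoo_cfg() {
  i="$1"
  base="/var/local/zookeeper-3.4.6-$i"
  echo "clientPort=$((2180 + i))"
  echo "dataDir=$base/data"
  echo "dataLogDir=$base/data/log"
  echo "autopurge.snapRetainCount=3"
  echo "autopurge.purgeInterval=1"
  # The ensemble member list is the same in every instance's file.
  for s in 1 2 3 4; do
    echo "server.$s=127.0.0.1:$((2887 + s)):$((3887 + s))"
  done
}

# Demo: settings for the second instance.
zoo_cfg 2
```

Note that a four-member ensemble needs three members for a quorum, so it tolerates one failure, the same as a three-member ensemble.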
(3)
solr-5.5.0 cluster configuration:
Version: solr-5.5.0
Download: https://mirrors.tuna.tsinghua.edu.cn/apache/lucene/solr/5.5.0/
Location: /var/local/apache-tomcat-7.0.69-1~4
Shared configuration files: /var/local/cloud_conf
Instances: 4: /var/local/apache-tomcat-7.0.69-1, /var/local/apache-tomcat-7.0.69-2, /var/local/apache-tomcat-7.0.69-3, /var/local/apache-tomcat-7.0.69-4
Deploying the Solr web application:
sudo cp -r ~/solr_cloud/solr-5.5.0/server/solr-webapp/webapp /var/local/apache-tomcat-7.0.69-1/webapps/solr
sudo cp -r ~/solr_cloud/solr-5.5.0/server/lib/ext/* /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/lib
(Copy any other jars you need yourself from ~/solr_cloud/solr-5.5.0/dist/)
sudo cp -r ~/solr_cloud/solr-5.5.0/server/resources/log4j.properties /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/classes
(If the classes directory does not exist, create it manually before copying)
solr-5.5.0-1
sudo mkdir -p /var/local/apache-tomcat-7.0.69-1/solr_home/
sudo cp ~/solr_cloud/solr-5.5.0/example/example-DIH/solr/solr.xml /var/local/apache-tomcat-7.0.69-1/solr_home/
sudo vi /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/web.xml
{
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/var/local/apache-tomcat-7.0.69-1/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
}
sudo vi /var/local/apache-tomcat-7.0.69-1/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -Dbootstrap_confdir=/var/local/cloud_conf -Dcollection.configName=myconf -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184 -DnumShards=2"
}
sudo vi /var/local/apache-tomcat-7.0.69-1/solr_home/solr.xml
{
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">18080</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
}
solr-5.5.0-2
sudo mkdir -p /var/local/apache-tomcat-7.0.69-2/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-2/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-2/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-2/webapps/solr/WEB-INF/web.xml
{
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/var/local/apache-tomcat-7.0.69-2/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
}
sudo vi /var/local/apache-tomcat-7.0.69-2/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}
sudo vi /var/local/apache-tomcat-7.0.69-2/solr_home/solr.xml
{
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">28080</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
}
solr-5.5.0-3
sudo mkdir -p /var/local/apache-tomcat-7.0.69-3/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-3/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-3/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-3/webapps/solr/WEB-INF/web.xml
{
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/var/local/apache-tomcat-7.0.69-3/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
}
sudo vi /var/local/apache-tomcat-7.0.69-3/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}
sudo vi /var/local/apache-tomcat-7.0.69-3/solr_home/solr.xml
{
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">38080</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
}
solr-5.5.0-4
sudo mkdir -p /var/local/apache-tomcat-7.0.69-4/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-4/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-4/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-4/webapps/solr/WEB-INF/web.xml
{
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/var/local/apache-tomcat-7.0.69-4/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
}
sudo vi /var/local/apache-tomcat-7.0.69-4/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}
sudo vi /var/local/apache-tomcat-7.0.69-4/solr_home/solr.xml
{
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">48080</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
}
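Across the four Solr instances only two values vary or repeat: the zkHost connection string, identical everywhere, and the hostPort in solr.xml, which matches each tomcat's HTTP port. The sketch below derives both; the function names are hypothetical, and set_host_port assumes the solr.xml was copied from instance 1 as described above.

```shell
#!/bin/sh
# Build the zkHost string from the four client ports 2181-2184.
zk_hosts() {
  hosts=""
  for i in 1 2 3 4; do
    hosts="${hosts:+$hosts,}127.0.0.1:$((2180 + i))"
  done
  echo "$hosts"
}

# Rewrite the hostPort element for instance $1; reads solr.xml on stdin.
set_host_port() {
  n="$1"
  sed "s|<int name=\"hostPort\">[0-9]*</int>|<int name=\"hostPort\">${n}8080</int>|"
}

# Demo: the option shared by all catalina.sh files, and one rewritten line.
echo "JAVA_OPTS=\"\$JAVA_OPTS -DzkHost=$(zk_hosts)\""
echo '<int name="hostPort">18080</int>' | set_host_port 3
```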
(4)
Startup and notes:
Start the tomcat cluster
sudo /var/local/apache-tomcat-7.0.69-1/bin/startup.sh && sudo /var/local/apache-tomcat-7.0.69-2/bin/startup.sh && sudo /var/local/apache-tomcat-7.0.69-3/bin/startup.sh && sudo /var/local/apache-tomcat-7.0.69-4/bin/startup.sh
Stop the tomcat cluster
sudo /var/local/apache-tomcat-7.0.69-1/bin/shutdown.sh && sudo /var/local/apache-tomcat-7.0.69-2/bin/shutdown.sh && sudo /var/local/apache-tomcat-7.0.69-3/bin/shutdown.sh && sudo /var/local/apache-tomcat-7.0.69-4/bin/shutdown.sh
Start the zookeeper cluster (start it before the tomcat cluster, because Solr connects to ZooKeeper at startup)
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh restart && sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh restart && sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh restart && sudo /var/local/zookeeper-3.4.6-4/bin/zkServer.sh restart
Stop the zookeeper cluster
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh stop && sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh stop && sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh stop && sudo /var/local/zookeeper-3.4.6-4/bin/zkServer.sh stop
Check the zookeeper cluster status
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh status && sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh status && sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh status && sudo /var/local/zookeeper-3.4.6-4/bin/zkServer.sh status
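The long one-liners can also be generated with a loop. The sketch below only prints the commands, in dependency order (zookeeper before tomcat when starting, the reverse when stopping), so it is safe to run as a dry run; the function name is hypothetical, and it uses zkServer.sh start rather than restart.

```shell
#!/bin/sh
# Print the commands for "start" or "stop" in dependency order.
cluster_commands() {
  case "$1" in
    start)
      for i in 1 2 3 4; do echo "sudo /var/local/zookeeper-3.4.6-$i/bin/zkServer.sh start"; done
      for i in 1 2 3 4; do echo "sudo /var/local/apache-tomcat-7.0.69-$i/bin/startup.sh"; done
      ;;
    stop)
      for i in 1 2 3 4; do echo "sudo /var/local/apache-tomcat-7.0.69-$i/bin/shutdown.sh"; done
      for i in 1 2 3 4; do echo "sudo /var/local/zookeeper-3.4.6-$i/bin/zkServer.sh stop"; done
      ;;
    *)
      echo "usage: cluster_commands start|stop" >&2
      return 1
      ;;
  esac
}

# Dry run: show what would be executed. Pipe the output to sh to actually run it.
cluster_commands start
```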
Access the Solr web interface (192.168.5.48 is the local machine's IP):
http://192.168.5.48:18080/solr/index.html
http://192.168.5.48:28080/solr/index.html
http://192.168.5.48:38080/solr/index.html
http://192.168.5.48:48080/solr/index.html
Any of the addresses above can be used to access and manage the Solr web interface.
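With all four nodes running, the layout described earlier (two Shards, each with two core_nodes) could be requested through the Collections API. This is a sketch under assumptions: the collection name test1 is taken from the query URL in the notes that follow, myconf is the configuration name passed in catalina.sh, and the function name is hypothetical. It prints the curl command rather than running it.

```shell
#!/bin/sh
# Build a Collections API CREATE URL.
# $1 = collection name, $2 = number of shards, $3 = replicas per shard
create_collection_url() {
  printf 'http://192.168.5.48:18080/solr/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s&collection.configName=myconf\n' "$1" "$2" "$3"
}

# Dry run: 2 shards x 2 replicas = 4 cores, one per tomcat instance.
echo "curl '$(create_collection_url test1 2 2)'"
```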
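Notes:
1) solr-5.5.0 has a bug: when you run a query from the Query screen of a Collection, the / in the address bar is incorrectly escaped as %2F, for example:
http://192.168.5.48:18080/solr/test1%2Fselect?indent=on&q=*:*&wt=json
2) solr-5.5.0 bundles zookeeper-3.4.6.jar, so using zookeeper version 3.4.6 is recommended.
3) From solr-5 onward, the schema is kept in managed-schema and managed through the API, as can be seen in solrconfig.xml: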
<!-- To disable dynamic schema REST APIs, use the following for <schemaFactory>:
<schemaFactory class="ClassicIndexSchemaFactory"/>
When ManagedIndexSchemaFactory is specified instead, Solr will load the schema from
the resource named in 'managedSchemaResourceName', rather than from schema.xml.
Note that the managed schema resource CANNOT be named schema.xml. If the managed
schema does not exist, Solr will create it after reading schema.xml, then rename
'schema.xml' to 'schema.xml.bak'.
Do NOT hand edit the managed schema - external modifications will be ignored and
overwritten as a result of schema modification REST API calls.
When ManagedIndexSchemaFactory is specified with mutable = true, schema
modification REST API calls will be allowed; otherwise, error responses will be
sent back for these requests.
-->
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
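4) Notes on schema attributes:
stored: the original field value is kept in the index and can be returned by search components. For performance reasons, long text is not well suited to being stored in the index.
indexed: the field is processed by Solr and added to the index. Only indexed fields can be searched.
docValues: whether a docValues structure is added for this field. It benefits faceting, grouping, sorting, and function queries, speeds up loading of index data, is friendly to NRT (near-real-time) search, and saves memory. It has limitations: currently docValues is supported only for field types such as StrField, UUIDField, and Trie*Field. For these types both single-valued and multi-valued fields are supported.
multiValued: whether the field can hold multiple values.
omitNorms: if set to true, length normalization of the field value and index-time boosts for the field are omitted, which saves memory.
Set it to false only for full-text fields or when you need index-time boosts. For primitive, non-tokenized field types such as int, long, and StrField it defaults to true; otherwise it defaults to false.
omitPositions=true|false: if set, position information is omitted from the postings for this field.
omitTermFreqAndPositions=true|false: if set, term frequency and position information are omitted from the postings for this field.
termVectors: if set to true, term vectors are stored for the field. When you use MoreLikeThis, setting this to true improves performance.
termPositions: whether term positions are stored in the term vector. This increases index size, but some highlighters (such as FastVectorHighlighter) depend on it and cannot highlight without it.
termOffsets: whether term offsets are stored in the term vector. Highlighting needs this setting, and when you use SpanQuery it affects the set of matching results.
sortMissingLast: when sorting on this field, documents with no value for the field are placed last.
sortMissingFirst: when sorting on this field, documents with no value for the field are placed first.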
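Notes 3) and 4) meet in the Schema API: with ManagedIndexSchemaFactory and mutable set to true, a field and its attributes can be added with an HTTP POST instead of editing managed-schema by hand. The sketch below builds an add-field request body and prints the curl command as a dry run. The field name title, the field type string, and the function name are hypothetical; the host and the collection test1 come from the URLs above.

```shell
#!/bin/sh
# Build a Schema API add-field payload.
# $1 = field name, $2 = field type, $3 = stored (true|false), $4 = docValues (true|false)
add_field_json() {
  printf '{"add-field":{"name":"%s","type":"%s","indexed":true,"stored":%s,"docValues":%s,"multiValued":false}}\n' "$1" "$2" "$3" "$4"
}

# Dry run: print the request instead of sending it.
payload=$(add_field_json title string true true)
echo "curl -X POST -H 'Content-Type: application/json' --data-binary '$payload' 'http://192.168.5.48:18080/solr/test1/schema'"
```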