You can find the latest release (2.4.0) at https://github.com/chenlb/mmseg4j-solr/releases, where the release notes also list which Solr versions each release supports (I chose 6.3.0).
Download Solr from http://archive.apache.org/dist/lucene/solr/6.3.0/ — solr-6.3.0.zip for Windows, solr-6.3.0.tgz for Linux.
Run the following commands:
```
tar -xf solr-6.3.0.tgz
cd solr-6.3.0/bin
./solr start
```
Open a browser and visit the admin console.
Stop Solr:

```
./solr stop
```
Start the cloud example: `./solr start -e cloud`

Start a single node: `./solr start -c`

Stop the cloud: `./solr stop -all`

Stop a single node: `./solr stop -p 8983`

For details on the start and stop commands, run `./solr start -help` and `./solr stop -help`.
Below is the interactive session for starting the SolrCloud example; you can simply press Enter at every prompt until the success message appears. The session does the following: it creates a cluster of two nodes, creates a collection named gettingstarted, splits it into two shards across the two nodes, and gives each shard two replicas (used for failover).
```
[jionsvolk@localhost /home/jionsvolk/proc/solr-6.3.0/bin]$ ./bin/solr start -e cloud
-bash: ./bin/solr: No such file or directory
[jionsvolk@localhost /home/jionsvolk/proc/solr-6.3.0/bin]$ ./solr start -e cloud

Welcome to the SolrCloud example!

This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]:
Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
Please enter the port for node2 [7574]:
Creating Solr home directory /home/jionsvolk/proc/solr-6.3.0/example/cloud/node1/solr
Cloning /home/jionsvolk/proc/solr-6.3.0/example/cloud/node1 into
   /home/jionsvolk/proc/solr-6.3.0/example/cloud/node2

Starting up Solr on port 8983 using command:
/home/jionsvolk/proc/solr-6.3.0/bin/solr start -cloud -p 8983 -s "/home/jionsvolk/proc/solr-6.3.0/example/cloud/node1/solr"

Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=4329). Happy searching!

Starting up Solr on port 7574 using command:
/home/jionsvolk/proc/solr-6.3.0/bin/solr start -cloud -p 7574 -s "/home/jionsvolk/proc/solr-6.3.0/example/cloud/node2/solr" -z localhost:9983

Waiting up to 180 seconds to see Solr running on port 7574 [\]
Started Solr server on port 7574 (pid=4529). Happy searching!

Now let's create a new collection for indexing documents in your 2-node cluster.
Please provide a name for your new collection: [gettingstarted]
How many shards would you like to split gettingstarted into? [2]
How many replicas per shard would you like to create? [2]
Please choose a configuration for the gettingstarted collection, available options are:
basic_configs, data_driven_schema_configs, or sample_techproducts_configs [data_driven_schema_configs]
Connecting to ZooKeeper at localhost:9983 ...
Uploading /home/jionsvolk/proc/solr-6.3.0/server/solr/configsets/data_driven_schema_configs/conf for config gettingstarted to ZooKeeper at localhost:9983

Creating new collection 'gettingstarted' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=gettingstarted&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=gettingstarted

{
  "responseHeader":{
    "status":0,
    "QTime":8934},
  "success":{
    "192.168.245.128:7574_solr":{
      "responseHeader":{
        "status":0,
        "QTime":7130},
      "core":"gettingstarted_shard1_replica1"},
    "192.168.245.128:8983_solr":{
      "responseHeader":{
        "status":0,
        "QTime":7604},
      "core":"gettingstarted_shard1_replica2"}}}

Enabling auto soft-commits with maxTime 3 secs using the Config API

POSTing request to Config API: http://localhost:8983/solr/gettingstarted/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000

SolrCloud example running, please visit: http://localhost:8983/solr
```
After `./solr stop -all`, restarting the cloud takes a little work: you have to start each node step by step, remember the ports used before, and know ZooKeeper's default port rule.
```
./solr start -c -p 8983 -s ../example/cloud/node1/solr
./solr start -c -p 7574 -s ../example/cloud/node2/solr -z localhost:9983
```
The embedded ZooKeeper's default port is the first node's port + 1000, e.g. 9983 = 8983 + 1000.
In the bin directory of the Solr installation, run:
```
./post -c gettingstarted ../example/exampledocs/*
```
gettingstarted is the collection name; it must match the collection created above.
The command prints a pile of output as it runs:
```
/home/jionsvolk/proc/jdk1.8.0_65/bin/java -classpath /home/jionsvolk/proc/solr-6.3.0/dist/solr-core-6.3.0.jar -Dauto=yes -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool ../example/exampledocs/books.csv ../example/exampledocs/books.json ../example/exampledocs/gb18030-example.xml ../example/exampledocs/hd.xml ../example/exampledocs/ipod_other.xml ../example/exampledocs/ipod_video.xml ../example/exampledocs/manufacturers.xml ../example/exampledocs/mem.xml ../example/exampledocs/money.xml ../example/exampledocs/monitor2.xml ../example/exampledocs/monitor.xml ../example/exampledocs/more_books.jsonl ../example/exampledocs/mp500.xml ../example/exampledocs/post.jar ../example/exampledocs/sample.html ../example/exampledocs/sd500.xml ../example/exampledocs/solr-word.pdf ../example/exampledocs/solr.xml ../example/exampledocs/test_utf8.sh ../example/exampledocs/utf8-example.xml ../example/exampledocs/vidcard.xml
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
POSTing file books.json (application/json) to [base]/json/docs
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file more_books.jsonl (application/json) to [base]/json/docs
POSTing file mp500.xml (application/xml) to [base]
POSTing file post.jar (application/octet-stream) to [base]/extract
POSTing file sample.html (text/html) to [base]/extract
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr-word.pdf (application/pdf) to [base]/extract
POSTing file solr.xml (application/xml) to [base]
POSTing file test_utf8.sh (application/octet-stream) to [base]/extract
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
21 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:10.575
```
### 1) Querying from the admin console:
Leave all conditions at their defaults, as shown in the screenshot below.
http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=*:*&wt=json

http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=*:*&wt=xml
Via these two URL forms, other applications can use an HTTP client (e.g. HttpClient) to query the Solr server and consume the results.
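As a minimal sketch of that idea (hypothetical class name, plain JDK HttpURLConnection instead of Apache HttpClient), the following fetches the JSON form of the query above and prints the raw response:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrHttpQuery {
    public static void main(String[] args) throws Exception {
        // The select-handler URL from above; host, port and collection are this tutorial's example values
        URL url = new URL("http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=*:*&wt=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // Print the raw JSON response; a real application would parse it instead
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
        conn.disconnect();
    }
}
```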
http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=author:Glen%20Cook&wt=json

http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=%22Glen%20Cook%22&wt=json

In code, if the first form doesn't work, switch to the second.

%20 is the escape for a space.

%22 is the escape for a double quote.

Meaning: in the gettingstarted collection, find records whose author field contains Glen Cook, returning the results as JSON.
The * wildcard can also be used:
http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=author:*Al*&wt=json
This finds every record whose author field contains the string Al.
If no field name is specified, the query terms are matched against all fields, which behaves much like a general search engine.
Solr returns all fields by default; you can also restrict which fields come back, e.g.:
http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=author:Glen*&wt=json&fl=id
Combined queries without field restriction
在查詢條件中增長關鍵字或者短語,在其前面加"+",轉義後是"%2B",如:
http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=%2Bbook%20%2BJhereg&wt=json
Meaning: find records where some field contains book and some field contains Jhereg.
To exclude an unwanted keyword or phrase, prefix it with "-", e.g.:
http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=%2Bbook%20-Jhereg&wt=json
Meaning: find records where some field contains book but no field contains Jhereg.
**Remember: always separate multiple conditions with a space.**
The * wildcard also works here:
http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=%2BCanon*%20%2Bscanner*&wt=json
Meaning: find records where some field contains Canon and some field also contains scanner.
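Rather than hand-writing these escapes, you can let Java produce them. A small sketch using java.net.URLEncoder (class name made up for illustration):

```java
import java.net.URLEncoder;

public class QueryEncoding {
    public static void main(String[] args) throws Exception {
        // The raw Solr query: require "book", exclude "Jhereg"
        String q = "+book -Jhereg";

        // URLEncoder turns '+' into %2B and the space into '+' (form encoding);
        // %20 is an equally valid escape for a space in a URL query string
        String encoded = URLEncoder.encode(q, "UTF-8");
        System.out.println(encoded); // %2Bbook+-Jhereg

        String url = "http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q="
                + encoded + "&wt=json";
        System.out.println(url);
    }
}
```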
Combined queries on specific fields
The combination syntax is the same in every case; one example covers it:
http://192.168.245.128:8983/solr/gettingstarted/select?indent=on&q=%2Bname:Canon*%20%2Bcat:scanner*&wt=json
Meaning: find records where the name field contains Canon and the cat field contains scanner.
Delete a collection:

```
bin/solr delete -c techproducts
```
## 4.1 Manual creation
```
./solr create -c films -s 2 -rf 2
```
This creates a collection named films from the default configset, with two shards and two replicas per shard.
Using the default configset is not recommended in production because it has real limitations.

The default configset is schemaless: Solr guesses each field's type from the documents using its own heuristics, and it can guess wrong. With a small data set you can simply rebuild the index; with a huge data set whose original source is no longer available, a wrong guess can do serious damage.

Solr provides an API for modifying the default schema; don't edit the schema file by hand, or you risk an unrecoverable failure.
### 1) Add a custom field
Add a custom name field to the films default config:
```
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' http://localhost:8983/solr/films/schema
```
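The same field can be added from SolrJ through the Schema API; a sketch, assuming the SolrJ 6.x SchemaRequest classes (the attribute map mirrors the curl call above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class AddFieldExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://192.168.245.128:8983/solr");

        // Same attributes as the curl call above
        Map<String, Object> attrs = new LinkedHashMap<String, Object>();
        attrs.put("name", "name");
        attrs.put("type", "text_general");
        attrs.put("multiValued", false);
        attrs.put("stored", true);

        SchemaResponse.UpdateResponse resp =
                new SchemaRequest.AddField(attrs).process(client, "films");
        System.out.println("status: " + resp.getStatus());
        client.close();
    }
}
```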
You can also add the field from the browser admin console:
### 2) Add a copy field:
A copy field copies the contents of source fields into a destination field. The rule below copies every field into the catch-all _text_ field, so even a field that was never defined explicitly (say, gendor) is still picked up when Solr processes the data:
```
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' http://localhost:8983/solr/films/schema
```
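And a SolrJ sketch of the same copy-field rule (again assuming the 6.x Schema API):

```java
import java.util.Arrays;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class AddCopyFieldExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://192.168.245.128:8983/solr");
        // Copy every source field into the catch-all _text_ field, as in the curl call above
        new SchemaRequest.AddCopyField("*", Arrays.asList("_text_")).process(client, "films");
        client.close();
    }
}
```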
This, too, can be done from the browser admin console:
## 4.2 Importing data
Import data files of various formats from the command line:
```
./post -c films ../example/films/films.json
./post -c films ../example/films/films.xml
./post -c films ../example/films/films.csv -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
```
Data files can also be imported through the admin UI:
## 4.3 Faceting

### 4.3.1 Field-Facets
I'm not sure how best to translate this term, but the feature is very useful: it's a bit like the database's GROUP BY + COUNT.
For example, to count how many times each value of the genre field occurs across all documents:
Command:
http://192.168.245.128:8983/solr/films/select?q=*:*&rows=0&facet=true&facet.field=genre&wt=json
Meaning: across all documents, return how often each distinct value of the genre field occurs.
You can filter the returned values with the facet.prefix parameter, e.g.:
http://192.168.245.128:8983/solr/films/select?q=*:*&rows=0&facet=true&facet.field=genre&facet.prefix=D&wt=json
Admin UI:
Set rows to 0 so that the response contains only the facet counts, without the matching documents.
As mentioned, faceting resembles Oracle's GROUP BY + COUNT; adding the facet.mincount parameter makes it resemble GROUP BY + COUNT + HAVING COUNT >= xxx.
http://192.168.245.128:8983/solr/films/select?q=*:*&rows=0&facet=true&facet.field=genre&facet.prefix=D&facet.mincount=50&wt=json
This query returns only the values whose count is at least 50.
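The same field facet can be issued from SolrJ; a sketch whose parameters mirror the URL above (host, collection and thresholds are the example values):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FieldFacetExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://192.168.245.128:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);            // facet counts only, no documents
        query.setFacet(true);
        query.addFacetField("genre");
        query.setFacetPrefix("D");   // same as facet.prefix=D
        query.setFacetMinCount(50);  // same as facet.mincount=50

        QueryResponse response = client.query("films", query);
        FacetField genre = response.getFacetField("genre");
        for (FacetField.Count c : genre.getValues()) {
            System.out.println(c.getName() + ": " + c.getCount());
        }
        client.close();
    }
}
```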
### 4.3.2 Range-Facets

Range facets group counts by interval; both numeric and date fields are supported.
Dates
The gap supports YEAR, MONTH, DAY, HOUR, MINUTES, and SECONDS.
Command (the browser admin console does not support this kind of query):
http://192.168.245.128:8983/solr/films/select?q=*:*&rows=0&facet=true&facet.range=initial_release_date&facet.range.start=NOW-20YEAR&facet.range.end=NOW&facet.range.gap=%2B5YEAR&wt=json
Numbers
For example:
http://192.168.245.128:8983/solr/gettingstarted/select?q=*:*&rows=0&facet=true&facet.range=price&facet.range.start=0&facet.range.end=1000&facet.range.gap=%2B100&wt=json
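From SolrJ, the same numeric range facet looks roughly like the sketch below (SolrQuery also offers an analogous addDateRangeFacet for the date query above):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.RangeFacet;

public class RangeFacetExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://192.168.245.128:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);
        query.setFacet(true);
        // price from 0 to 1000 in buckets of 100, as in the URL above;
        // for dates: addDateRangeFacet(field, startDate, endDate, "+5YEAR")
        query.addNumericRangeFacet("price", 0, 1000, 100);

        QueryResponse response = client.query("gettingstarted", query);
        for (RangeFacet<?, ?> facet : response.getFacetRanges()) {
            for (RangeFacet.Count c : facet.getCounts()) {
                System.out.println(facet.getName() + " " + c.getValue() + ": " + c.getCount());
            }
        }
        client.close();
    }
}
```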
Below are SolrJ client examples written as JUnit tests (imports included for completeness). Querying a collection:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.MapSolrParams;
import org.junit.Test;

@Test
public void test1() throws Exception {
    final String solrUrl = "http://192.168.245.128:8983/solr";
    HttpSolrClient client = new HttpSolrClient(solrUrl);
    client.setConnectionTimeout(10000);
    client.setSoTimeout(60000);

    // Build the query parameters: match everything, return the first 100 docs, id and name only
    final Map<String, String> queryParamMap = new HashMap<String, String>();
    queryParamMap.put("q", "*:*");
    queryParamMap.put("fl", "id, name");
    queryParamMap.put("sort", "id asc");
    queryParamMap.put("rows", "100");
    MapSolrParams queryParams = new MapSolrParams(queryParamMap);

    final QueryResponse response = client.query("gettingstarted", queryParams);
    final SolrDocumentList documents = response.getResults();

    System.out.println("Found " + documents.getNumFound() + " documents");
    for (SolrDocument document : documents) {
        final String id = (String) document.getFirstValue("id");
        final String name = (String) document.getFirstValue("name");
        System.out.println("id: " + id + "; name: " + name);
    }
}
```
Adding (indexing) a document:

```java
import java.util.UUID;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;
import org.junit.Test;

@Test
public void test2() throws Exception {
    final String solrUrl = "http://192.168.245.128:8983/solr";
    HttpSolrClient client = new HttpSolrClient(solrUrl);

    final SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", UUID.randomUUID().toString());
    doc.addField("myname", "梅西又當爹了,兒子還叫C羅");

    final UpdateResponse updateResponse = client.add("myConn", doc);
    //NamedList<?> ns = updateResponse.getResponse();

    // Indexed documents must be committed
    client.commit("myConn");
}
```
Because this is SolrCloud, you won't find a managed-schema file on the host (in standalone mode it is schema.xml); you have to pull the config down from ZooKeeper with a command.
Download the config:
```
./solr zk downconfig -z 192.168.245.128:9983 -n myConn -d /home/jionsvolk/proc/solr-6.3.0/conf
```
Here myConn is the name of your own core or collection.
Edit the config:
Add the following to managed-schema:
```xml
<fieldtype name="textComplex" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="dic"/>
    </analyzer>
</fieldtype>
<fieldtype name="textMaxWord" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"/>
    </analyzer>
</fieldtype>
<fieldtype name="textSimple" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple"/>
    </analyzer>
</fieldtype>

<!-- myname is a field of the document object; type textComplex means it is analyzed with the
     mmseg4j Chinese tokenizer. indexed="true" means the field is indexed; stored="true" means
     the value is stored in the index. -->
<field name="myname" type="textComplex" indexed="true" stored="true"/>
```
Upload the config:
```
./solr zk upconfig -z 192.168.245.128:9983 -n myConn -d /home/jionsvolk/proc/solr-6.3.0/conf/conf
```
Test code:
```java
import java.util.Iterator;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;
import org.junit.Test;

@Test
public void test3() throws Exception {
    final String solrUrl = "http://192.168.245.128:8983/solr/myConn";
    HttpSolrClient client = new HttpSolrClient(solrUrl);

    FieldAnalysisRequest request = new FieldAnalysisRequest();
    request.addFieldName("myname"); // field name: pick any field that uses the Chinese tokenizer
    request.setFieldValue("");      // field value: may be an empty string, but must be set explicitly
    request.setQuery("字段名,隨便指定一個支持中文分詞的字段"); // the text to analyze

    FieldAnalysisResponse response = request.process(client);

    // Walk the query-time analysis phases and print each token produced
    Iterator<AnalysisPhase> it = response.getFieldNameAnalysis("myname").getQueryPhases().iterator();
    while (it.hasNext()) {
        AnalysisPhase phase = it.next();
        List<TokenInfo> list = phase.getTokens();
        for (TokenInfo info : list) {
            System.out.println(info.getText());
        }
    }
}
```
Test screenshot:
If your documents have many fields that all need indexing, you can define one dynamic field to match them, instead of defining every field in managed-schema.
For example:
```xml
<dynamicField name="*_mytxt" type="textComplex" indexed="true" stored="true"/>
```
As long as a field name in the document you construct ends with _mytxt, it will be processed as textComplex.
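For illustration, a short sketch of indexing through this dynamic field; the field name title_mytxt is made up here, anything ending in _mytxt would match:

```java
import java.util.UUID;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DynamicFieldExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://192.168.245.128:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", UUID.randomUUID().toString());
        // "title_mytxt" is not declared anywhere; it matches the *_mytxt dynamic field
        // and is therefore indexed with the textComplex (mmseg4j) type
        doc.addField("title_mytxt", "梅西又當爹了,兒子還叫C羅");

        client.add("myConn", doc);
        client.commit("myConn");
        client.close();
    }
}
```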