Yii + Sphinx: configuration and management

1. Installing Sphinx

  http://sphinxsearch.com/docs/latest/installing-debian.html

  

There are two ways of getting Sphinx for Ubuntu: regular deb packages and the Launchpad PPA repository.

Deb packages:

  1. Sphinx requires a few libraries to be installed on Debian/Ubuntu. Use apt-get to download and install these dependencies:

    $ sudo apt-get install mysql-client unixodbc libpq5
  2. Now you can install Sphinx:

    $ sudo dpkg -i sphinxsearch_2.2.11-dev-0ubuntu12~trusty_amd64.deb

PPA repository (Ubuntu only).

Installing Sphinx is much easier from Sphinxsearch PPA repository, because you will get all dependencies and can also update Sphinx to the latest version with the same command.

  1. First, add Sphinxsearch repository and update the list of packages:

    $ sudo add-apt-repository ppa:builds/sphinxsearch-rel22

    $ sudo apt-get update

  2. Install/update sphinxsearch package:

    $ sudo apt-get install sphinxsearch

Sphinx searchd daemon can be started/stopped using service command:

$ sudo service sphinxsearch start

 

2. Configuring Sphinx

  https://yq.aliyun.com/articles/66520?spm=5176.100240.searchblog.88.ovXgTm

 

(1) Sphinx configuration

1.   Structure of the Sphinx configuration file

The Sphinx configuration file is structured as follows:

 

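source source_name_1 {
# Define a data source: database connection parameters such as the server IP, user name, and password
# sql_query, sql_query_pre, sql_query_range, and related settings are explained with the example further down
 ……
}
index index_name_1 {
     source = source_name_1
# Full-text index settings
     ……
}
indexer {
# Options for the indexer program, such as the memory limit
……
}
searchd {
# Options for the searchd daemon itself
……
}

Multiple source and index sections can be defined.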

 

 

2.   A Sphinx configuration example, explained in detail

The example configuration below is annotated option by option:

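# Define a data source
source search_main
{
           # Database type
    type                 = mysql
           # IP address or host name of the database server
    sql_host             = localhost
           # Account used to connect to the database
    sql_user             = root
           # Password used to connect to the database
    sql_pass             = test123
           # Database name
    sql_db               = test
           # SQL statements run after connecting and before fetching data
    sql_query_pre        = SET NAMES utf8
    sql_query_pre        = SET SESSION query_cache_type=OFF
           # Create the sph_counter table used for delta indexing
    sql_query_pre        = CREATE TABLE IF NOT EXISTS sph_counter \
                                      ( counter_id INTEGER PRIMARY KEY NOT NULL,max_doc_id INTEGER NOT NULL)
           # Before fetching data, record the table's current maximum id in sph_counter
    sql_query_pre        = REPLACE INTO sph_counter SELECT 1, MAX(searchid) FROM v9_search
           # SQL that fetches the data; the first column must be a unique positive integer ID.
           # <= is used so that the row with searchid = max_doc_id lands in the main index
           # (the delta source only takes searchid > max_doc_id)
    sql_query            = SELECT searchid,typeid,id,adddate,data FROM v9_search where \
                                      searchid<=( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) \
                                        and searchid>=$start AND searchid<=$end
           # sql_attr_uint and sql_attr_timestamp declare attributes that the API can filter or sort on;
           # repeat the line to declare several columns
    sql_attr_uint        = typeid
    sql_attr_uint        = id
    sql_attr_timestamp   = adddate
           # Ranged query: fetch the minimum and maximum ID to walk through
    sql_query_range      = SELECT MIN(searchid),MAX(searchid) FROM v9_search
           # Step size of the ranged query
    sql_range_step       = 1000
           # Delay between ranged query steps, in milliseconds (0 means no delay)
    sql_ranged_throttle  = 0
           # Used by the command-line search tool for debugging
    sql_query_info       = SELECT * FROM v9_search WHERE searchid=$id
}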
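# Define a delta source that inherits from search_main
source search_main_delta : search_main
{
           # Overriding sql_query_pre replaces every inherited sql_query_pre entry,
           # so building the delta index does not update sph_counter
    sql_query_pre       = set names utf8
           # The delta source only fetches rows added since the main index was last built.
           # A new row whose searchid is lower than the maximum recorded at that time will be missed.
    sql_query           = SELECT searchid,typeid,id,adddate,data FROM v9_search where  \
                                  searchid>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) \
                                   and searchid>=$start AND searchid<=$end
    sql_query_range     = SELECT MIN(searchid),MAX(searchid) FROM v9_search where \
                                       searchid>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 )
}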
 
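# Define the index_search_main index
index index_search_main
{
           # Source of the index
    source            = search_main
           # Path where the generated index files are stored
    path         = /usr/local/coreseek/var/data/index_search_main
           # Storage mode for document attributes; extern stores them separately from the full-text index data
    docinfo           = extern
           # Memory locking of cached data; 0 means no locking
    mlock             = 0
           # List of morphology processors; none disables them all
    morphology        = none
           # Minimum length of an indexed word
    min_word_len      = 1
           # Character set encoding; UTF-8 here, matching the database
    charset_type      = zh_cn.utf-8
           # Location of the dictionary files used for word segmentation
    charset_dictpath  = /usr/local/mmseg3/etc
           # Stopword file: words listed here are not searchable
    stopwords       = /usr/local/coreseek/var/data/stopwords.txt
           # Whether to strip HTML markup from the incoming full-text data
    html_strip       = 0
}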
# Define the delta index
index index_search_main_delta : index_search_main
{
    source   = search_main_delta
    path    = /usr/local/coreseek/var/data/index_search_main_delta
}
 
# Options for the indexer program
indexer
{
           # Memory limit for the indexing process
    mem_limit        = 512M
}
 
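# Options for the searchd daemon
searchd
{
           # IP address and port to listen on
    #listen            = 127.0.0.1
    #listen            = 172.16.88.100:3312
    listen            = 3312
    listen            = /var/run/searchd.sock
           # Location of the daemon log
    log                = /usr/local/coreseek/var/log/searchd.log
           # Location of the query log
    query_log          = /usr/local/coreseek/var/log/query.log
           # Read timeout for network client requests, in seconds
    read_timeout       = 5
           # Maximum number of child processes
    max_children       = 300
           # Name of the searchd PID file
    pid_file           = /usr/local/coreseek/var/log/searchd.pid
           # Maximum number of matches the daemon keeps in memory per index and returns to the client
    max_matches        = 100000
           # Enable seamless rotation, so that searchd does not stop responding while rotating
    # indexes that need a lot of data precached; queries can be served at any time, from either the old index or the new one
    seamless_rotate    = 1
           # Force all index files to be preopened at startup
    preopen_indexes    = 1
           # After a successful rotation, delete the index copies with the .old extension
    unlink_old         = 1
           # MVA update pool size (I do not fully understand this parameter)
    mva_updates_pool   = 1M
           # Maximum allowed packet size
    max_packet_size    = 32M
           # Maximum allowed number of filters
    max_filters        = 256
           # Maximum allowed number of values per filter
    max_filter_values  = 4096
}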
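
To make the ranged-query settings in the configuration above more concrete, here is a minimal sketch of how indexer walks the ID interval returned by sql_query_range in steps of sql_range_step, substituting $start and $end into sql_query on each pass. The function name range_steps and the sample numbers (IDs 1 to 2345) are made up for illustration; this only mimics the substitution and is not code from Sphinx itself.

```shell
#!/bin/sh
# Print one "start end" pair per ranged fetch, mimicking how $start and $end
# are filled in for each step (hypothetical helper, not part of Sphinx).
range_steps() {
    min=$1
    max=$2
    step=$3
    start=$min
    while [ "$start" -le "$max" ]; do
        end=$((start + step - 1))
        # The last step is clipped to the maximum ID.
        if [ "$end" -gt "$max" ]; then
            end=$max
        fi
        echo "$start $end"
        start=$((end + 1))
    done
}

# Sample values: MIN(searchid)=1, MAX(searchid)=2345, sql_range_step=1000
range_steps 1 2345 1000
```

With these sample values the data is fetched in three passes (1 to 1000, 1001 to 2000, 2001 to 2345) instead of one large query.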

 

(2) Managing Sphinx

1.    Build the Chinese word-segmentation dictionary for Sphinx (newer versions already ship a prebuilt dictionary in /usr/local/mmseg3/etc)

 

cd /usr/local/mmseg3/etc
/usr/local/mmseg3/bin/mmseg -u thesaurus.txt
mv thesaurus.txt.uni uni.lib

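2.   Build the Chinese synonym dictionary for Sphinx

# With the synonym dictionary, a search for 深圳 (Shenzhen) also returns documents containing terms such as 深圳灣 (Shenzhen Bay)

/data/software/sphinx/coreseek-3.2.14/mmseg-3.2.14/script/build_thesaurus.py unigram.txt > thesaurus.txt
/usr/local/mmseg3/bin/mmseg -t thesaurus.txt
Then put thesaurus.lib in the same directory as uni.lib.


3.    Build all indexes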

 

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all

If the searchd daemon is already running, add the --rotate option:

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all --rotate
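
The choice between the two commands above can be automated. The sketch below checks the PID file configured for searchd and adds --rotate only when the daemon looks alive. The helper names searchd_running and rotate_flag are hypothetical, and the PID file path is assumed to match the pid_file setting used in this article. Note that kill -0 also fails when the current user is not allowed to signal the process, so this should run as the same user as searchd (or as root).

```shell
#!/bin/sh
# Succeed if the PID file exists and names a process we can signal.
searchd_running() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Print "--rotate" only when searchd appears to be running.
rotate_flag() {
    if searchd_running "$1"; then
        echo "--rotate"
    fi
}

# Assumed to match pid_file in sphinx.conf
PIDFILE=/usr/local/coreseek/var/log/searchd.pid

# Dry run: print the command that would be used instead of executing it.
echo "/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all $(rotate_flag "$PIDFILE")"
```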

4.    Start the searchd daemon

/usr/local/coreseek/bin/searchd --config /usr/local/coreseek/etc/sphinx.conf

 

5.   Rebuild the main index

Put this in a shell script and add it to crontab so that the main index is rebuilt every day at 1 a.m.

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main

6.     Build the delta index

Put this in a shell script and add it to crontab so that it runs every 10 minutes

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main_delta
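
As a rough illustration of what the delta build picks up, the sketch below decides which index a row belongs to from the max_doc_id value stored in sph_counter at the last main rebuild. It assumes the main source takes rows with searchid <= max_doc_id and the delta source takes rows with searchid > max_doc_id; the function name index_for_row and the sample IDs are invented for the example.

```shell
#!/bin/sh
# Print the index that should contain a row, given the max_doc_id recorded
# in sph_counter when the main index was last rebuilt.
# Assumption: main takes searchid <= max_doc_id, delta takes searchid > max_doc_id.
index_for_row() {
    searchid=$1
    max_doc_id=$2
    if [ "$searchid" -le "$max_doc_id" ]; then
        echo "index_search_main"
    else
        echo "index_search_main_delta"
    fi
}

index_for_row 1000 1000   # boundary row, already covered by the main index
index_for_row 1001 1000   # added after the last main rebuild, goes to the delta
```

The boundary row is the case worth checking: if the main query used a strict < while the delta used a strict >, the row with searchid equal to max_doc_id would end up in neither index.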

7.    Merge the delta index into the main index

Put this in a shell script and add it to the scheduled tasks so that it runs every 15 minutes

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --merge index_search_main index_search_main_delta --rotate
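
Steps 5 to 7 can share one wrapper script. The following is only a sketch: the script name sphinx_task.sh, its install path in the crontab comments, and the RUN variable are assumptions made for this example, while the indexer command lines are the ones shown above. By default the script only prints the command (a dry run); setting RUN=1 executes it.

```shell
#!/bin/sh
# Hypothetical wrapper (sphinx_task.sh) around the three indexer commands.
INDEXER=/usr/local/coreseek/bin/indexer
CONF=/usr/local/coreseek/etc/sphinx.conf

# Print the indexer command line for one task: main, delta, or merge.
task_cmd() {
    case "$1" in
        main)  echo "$INDEXER --config $CONF --rotate index_search_main" ;;
        delta) echo "$INDEXER --config $CONF --rotate index_search_main_delta" ;;
        merge) echo "$INDEXER --config $CONF --merge index_search_main index_search_main_delta --rotate" ;;
        *)     echo "usage: sphinx_task.sh main|delta|merge" >&2; return 1 ;;
    esac
}

# Act only when a task name is given on the command line.
if [ -n "${1:-}" ]; then
    cmd=$(task_cmd "$1") || exit 1
    if [ "${RUN:-0}" = "1" ]; then
        $cmd            # execute the indexer
    else
        echo "$cmd"     # dry run: just show the command
    fi
fi

# Example crontab entries matching the schedule in this article
# (the script location is an assumption):
#   0 1 * * *     RUN=1 /usr/local/bin/sphinx_task.sh main
#   */10 * * * *  RUN=1 /usr/local/bin/sphinx_task.sh delta
#   */15 * * * *  RUN=1 /usr/local/bin/sphinx_task.sh merge
```

Keeping the command construction in one function makes it easy to check the exact command line with a dry run before handing the script to cron.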

8.    Query the index from the command line with the search tool

 

/usr/local/coreseek/bin/search --config /usr/local/coreseek/etc/sphinx.conf  遊戲



3. Yii
http://www.yiiframework.com/doc-2.0/ext-sphinx-index.html

Installation

The preferred way to install this extension is through composer.

Either run

php composer.phar require --prefer-dist yiisoft/yii2-sphinx 

or add

"yiisoft/yii2-sphinx": "~2.0.0" 

to the require section of your composer.json.

Configuration

This extension interacts with the Sphinx search daemon using the MySQL protocol and the SphinxQL query language. To set up Sphinx "searchd" to support the MySQL protocol, add the following configuration:

searchd
{
    listen = localhost:9306:mysql41
    ...
}

To use this extension, simply add the following code in your application configuration:

return [
    //....
    'components' => [
        'sphinx' => [
            'class' => 'yii\sphinx\Connection',
            // Note: replace 127.0.0.1 with localhost here
            'dsn' => 'mysql:host=127.0.0.1;port=9306;dbname=xxx',
            'username' => 'root',
            'password' => 'root',
        ],
    ],
];


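// SphinxQL (at least in Sphinx 2.2) does full-text matching with MATCH();
// it has no LIKE operator for WHERE conditions, so MATCH() is used here.
$sql = 'SELECT * FROM documents WHERE MATCH(:content)';
$params = [
    ':content' => 'another',
];
$rows = Yii::$app->sphinx->createCommand($sql, $params)->queryAll();

// Dump the matching rows
var_dump($rows);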

 

 

 

See also: http://www.xunsearch.com/ (Xunsearch)
