Using Elasticsearch Search from PHP: Installation and Word Segmentation

Date: 2019-11-07


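1. Background

Why use ES search?
When I was browsing the WooYun vulnerability case library, I found searching extremely inconvenient.

Say I want to search for "SQL注入" (SQL injection).

MySQL handles this with a LIKE fuzzy match, so a record is found only if it contains "SQL注入" as one contiguous string, which is not very convenient.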
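
To make the limitation concrete, here is a minimal sketch contrasting contiguous matching with term-based matching. The sample titles and both function names are hypothetical, and the term list is written by hand as a stand-in for what an analyzer would produce; it says nothing about how ES scores results.

```php
<?php
// Hypothetical sample titles for illustration; not taken from the WooYun data.
$titles = array(
    'SQL注入導致數據泄露',
    '某站存在SQL盲注入漏洞',
    '後台弱口令',
);

// Mimics WHERE title LIKE '%keyword%': the keyword must appear as one contiguous string.
function like_match($titles, $keyword)
{
    $hits = array();
    foreach ($titles as $title) {
        if (strpos($title, $keyword) !== false) {
            $hits[] = $title;
        }
    }
    return $hits;
}

// Term-based matching: a title qualifies if it contains at least one of the terms.
// The term list is supplied by hand here; in ES the analyzer would produce it.
function term_match($titles, $terms)
{
    $hits = array();
    foreach ($titles as $title) {
        foreach ($terms as $term) {
            if (strpos($title, $term) !== false) {
                $hits[] = $title;
                break;
            }
        }
    }
    return $hits;
}

$likeHits = like_match($titles, 'SQL注入');
$termHits = term_match($titles, array('SQL', '注入'));
echo count($likeHits), " vs ", count($termHits), "\n";
```

The second title contains both "SQL" and "注入" but not next to each other, so only the term-based lookup finds it.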

So I decided to add word segmentation, which makes searching much easier, and the first thing that came to mind was ES.

So how do we use ES?

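2. Installing ES

All we need is a Java environment with the Java environment variables configured. I assume everyone has set up Java before, so I won't go into detail here.

After that, just download the ES package. No compilation is needed: download it and put it in a directory.
Download: https://www.elastic.co/downlo...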

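3. Installing head

head is built on Node, so Node has to be installed first.
Node download: http://cdn.npm.taobao.org/dis...

In any directory on your computer (not inside the elasticsearch directory), run the following commands: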

git clone https://github.com/mobz/elasticsearch-head.git  
cd elasticsearch-head/  
npm install

(3) Modify some configuration
Two places need to be changed.
File: elasticsearch-head/Gruntfile.js

connect: {
    server: {
        options: {
            port: 9100,
            hostname: '*',
            base: '.',
            keepalive: true
        }
    }
}

Add the following settings to the file elasticsearch-5.6.0/config/elasticsearch.yml:

http.cors.enabled: true  
http.cors.allow-origin: "*"

(4) Run npm run start to launch head

(5) Open the head management page: http://localhost:9100/

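4. Installing Composer

Next we need to install Composer. Why? Because calling the ES search API from PHP requires a client library, and Composer is what downloads it.

Download: https://getcomposer.org/Compo...

After downloading, just click Next through the installer and it is installed.

Alternatively, set it up from the command line:

1. Download composer.phar into the current directory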

curl -sS https://getcomposer.org/installer | php

2. Create a composer.json file in the current directory

{
    "require": {
        "elasticsearch/elasticsearch": "~2.0@beta"
    }
}

3. Install the dependencies

php composer.phar install

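5. Installing the word-segmentation plugin

Next we need to install a word-segmentation plugin.
For ES, the IK plugin is the handiest option for Chinese word segmentation, and it is also very easy to install.

Just download the release that matches your ES version from GitHub, unzip it into the ES plugins directory, and restart the ES service.

Download: https://github.com/medcl/elas...

How do we verify that the plugin was installed successfully?
We can run a segmentation test with the URL below.
http://localhost:9200/your_index_name/_analyze?analyzer=ik_max_word&pretty=true&text=中華人民共和國

The text passed in this URL is 中華人民共和國. The default analyzer splits it into single characters: 中, 華, 人, 民, 共, 和, 國.

With IK selected as the analyzer, 中華人民共和國 is kept as one term, and shorter words such as 中華 are produced as terms as well.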
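
The test URL can also be assembled in PHP so that the Chinese text is percent-encoded correctly. This is only a sketch: the function name build_analyze_url is hypothetical, the host is assumed to be a local ES node, and wooyun is the index name used later in this article. Passing analyzer and text as query-string parameters matches the URL above; as far as I know, newer ES releases expect them in a JSON request body instead.

```php
<?php
// Build the _analyze test URL for a given index, text and analyzer.
// The function name and the default host are illustrative assumptions.
function build_analyze_url($index, $text, $analyzer = 'ik_max_word', $host = 'http://localhost:9200')
{
    $query = http_build_query(array(
        'analyzer' => $analyzer,
        'pretty'   => 'true',
        'text'     => $text,
    ), '', '&');

    return $host . '/' . rawurlencode($index) . '/_analyze?' . $query;
}

// Same text through IK and through the built-in standard analyzer, for comparison.
$ikUrl       = build_analyze_url('wooyun', '中華人民共和國');
$standardUrl = build_analyze_url('wooyun', '中華人民共和國', 'standard');
echo $ikUrl, "\n", $standardUrl, "\n";
```

Opening both URLs in a browser, or fetching them with curl, shows the difference between the two token lists side by side.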

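6. Importing data

Now let's look at how to import the data from the database into ES.

First we need to create an index (the ES counterpart of a database),
then insert the data into ES in a fixed format. Below is my code example.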

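<?php
require_once './vendor/autoload.php';

// Connect to the MySQL database.
// Note: the mysql_* extension was removed in PHP 7; on newer PHP versions use mysqli or PDO instead.
function get_conn()
{
    $conn = mysql_connect("localhost", "root", "") or die("error connecting");
    mysql_select_db("wooyun", $conn);
    mysql_query("SET NAMES 'UTF8'", $conn);
    return $conn;
}

// Index one batch of up to 300 rows into ES.
// Returns the last id of the batch, or false once the final batch has been indexed.
function create_index($maxId, $client, $conn)
{
    // Fetch the next batch; ORDER BY id keeps the id-based paging from skipping rows
    $maxId = (int) $maxId;
    $sql = "SELECT * FROM bugs WHERE id > $maxId ORDER BY id LIMIT 0,300";
    $result_bugs = mysql_query($sql, $conn);
    $rtn = array();
    while ($result_bugs && ($row = mysql_fetch_assoc($result_bugs))) {
        $rtn[] = $row;
    }

    foreach ($rtn as $val) {
        $params = array();
        $params['body'] = array(
            'id' => $val['id'],
            'wybug_id' => $val['wybug_id'],
            'wybug_title' => $val['wybug_title'],
        );
        $params['index'] = 'wooyun';
        $params['type'] = 'title';
        $client->index($params);
    }

    // A full batch means there may be more rows; $val still holds the last row
    return (count($rtn) == 300) ? $val['id'] : false;
}

set_time_limit(0);
$client = Elasticsearch\ClientBuilder::create()->setHosts(['localhost'])->build();
// Delete all existing data (only if the index exists, otherwise delete() throws)
if ($client->indices()->exists(['index' => 'wooyun'])) {
    $client->indices()->delete(['index' => 'wooyun']);
}

// Connect once, then index batch after batch until create_index() reports the end
$conn = get_conn();
$maxId = 0;
while ($maxId !== false) {
    $maxId = create_index($maxId, $client, $conn);
}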
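
The import script lets ES create the wooyun index automatically on the first insert. In that case wybug_title is analyzed with the default analyzer rather than IK, unless a default analyzer is configured elsewhere. One way to get IK applied is to create the index with an explicit mapping after the delete and before the first insert. The following is a sketch under assumptions: the helper name build_create_index_params is hypothetical, the field type text assumes ES 5.x, and only the title field is mapped while the other fields are left to dynamic mapping.

```php
<?php
// Build the parameters for creating the index with an explicit IK mapping.
// Helper name is hypothetical; 'text' as a field type assumes ES 5.x.
function build_create_index_params($index = 'wooyun', $type = 'title')
{
    return array(
        'index' => $index,
        'body'  => array(
            'mappings' => array(
                $type => array(
                    'properties' => array(
                        // Analyze the title with IK at index time and at search time
                        'wybug_title' => array(
                            'type'            => 'text',
                            'analyzer'        => 'ik_max_word',
                            'search_analyzer' => 'ik_max_word',
                        ),
                    ),
                ),
            ),
        ),
    );
}

$createParams = build_create_index_params();
// Print the request body so it can be inspected before it is sent
echo json_encode($createParams['body'], JSON_PRETTY_PRINT), "\n";
// With a real client this would be passed as: $client->indices()->create($createParams);
```

Keeping the parameters in a function that returns a plain array makes it possible to inspect the mapping without a running ES node.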

7. Querying data

<?php
// Load the MySQL connection and the ES client library
require('conn.php');
require_once 'vendor/autoload.php';

// Search wybug_title in ES; $from is the result offset, $size the page size
function search($keyword, $from = 0, $size = 20)
{
    // Instantiate the client
    $client = Elasticsearch\ClientBuilder::create()->setHosts(['localhost'])->build();
    // Assemble the query
    $params = array();
    $params['index'] = 'wooyun';
    $params['type'] = 'title';
    $params['body']['query']['match']['wybug_title'] = $keyword;
    $params['from'] = $from;
    $params['size'] = $size;
    // Run the query
    $rtn = $client->search($params)['hits'];
    // Assemble the result
    $data = array();
    $data['total'] = $rtn['total'];
    $data['lists'] = array_column($rtn['hits'], '_source');
    $data['lists'] = formatData(array_column($data['lists'], 'id'));

    return $data;
}

// Load the full rows from MySQL for the ids returned by ES
function formatData($ids)
{
    // Cast to integers before building the SQL; an empty IN () would be a syntax error
    $ids = array_map('intval', $ids);
    if (empty($ids)) {
        return [];
    }
    $ids = implode(',', $ids);
    // ORDER BY FIELD keeps the rows in the relevance order returned by ES
    $sql = "select * from bugs where id in($ids) order by field(id, $ids)";
    $data = mysql_query($sql);

    $rtn = [];
    while ($data && ($row = mysql_fetch_assoc($data))) {
        $rtn[] = $row;
    }

    return $rtn;
}

$q0 = isset($_GET['q']) ? $_GET['q'] : 'SQL注入';
$num = 15; // 15 results per page
$page = isset($_GET['page']) ? max(1, intval($_GET['page'])) : 1;
$offset = ($page - 1) * $num;
$esData = search($q0, $offset, $num);
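
The paging arithmetic at the end of the script can be isolated into a small helper that turns a 1-based page number into the from/size pair of the search request. This is a sketch only: build_search_params is a hypothetical name, and the index, type and field names are simply the ones used in the listing above.

```php
<?php
// Translate a 1-based page number into the from/size pair of a search request.
// Helper name is hypothetical; index, type and field follow the listing above.
function build_search_params($keyword, $page = 1, $size = 15)
{
    // Guard against page 0, negative pages and non-numeric input
    $page = max(1, (int) $page);
    $size = max(1, (int) $size);

    return array(
        'index' => 'wooyun',
        'type'  => 'title',
        'from'  => ($page - 1) * $size,
        'size'  => $size,
        'body'  => array(
            'query' => array(
                'match' => array('wybug_title' => $keyword),
            ),
        ),
    );
}

$searchParams = build_search_params('SQL注入', 3);
echo $searchParams['from'], "\n"; // prints 30: page 3 starts after 2 pages of 15
```

Note that ES rejects requests where from + size exceeds index.max_result_window, which defaults to 10000, so very deep pages need a different approach.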