Full-Text Search and Chinese Word Segmentation in Laravel 5: TNTSearch + jieba-php

Date: 2019-11-18

This combination provides Chinese full-text search without depending on any third-party service. Start by creating a demo project:

laravel new tntsearch

Create an articles table and an Article model:

php artisan make:model Models/Article -m

Create the database and the table (details omitted).

Edit the database settings in .env:

DB_DATABASE=homestead
DB_USERNAME=homestead
DB_PASSWORD=secret

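Generate test data.

Note: **always insert through the model, otherwise the inserted rows will not be searchable because the index is never updated** (this is important!). For that reason, the following approach does not work: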

public function run()
{
    DB::table('articles')->insert([
        [
            'title' => 'TNTSearch',
            'content' => '一個用PHP編寫的功能齊全的全文搜索引擎'
        ],
        [
            'title' => 'jieba-php',
            'content' => '"結巴"中文分詞:作最好的php中文分詞、中文斷詞組件'
        ]
    ]);
}

Populate the data with a model method instead:

public function add_data()
{
    $article_m = new Article();
    $article_m->title = 'TNTSearch';
    $article_m->content = '一個用PHP編寫的功能齊全的全文搜索網站';
    $article_m->save();
}

Likewise, updates must also go through the model:

public function update_data()
{
    $article = Article::find(1);
    $article->title = 'jieba-php';
    $article->content = '"結巴"中文分詞:作最好的php中文分詞、中文斷詞組件。';
    $article->save();
}

Likewise, deletes must also go through the model:

public function delete_data()
{
    $article = Article::find(1);
    $article->delete();
}
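
The reason raw inserts stay invisible to search can be shown with a toy model. The sketch below is plain PHP and is not Scout's actual code: ToyIndex, ToyArticleStore, and their methods are hypothetical names invented for this illustration. It only demonstrates the idea that the index is synced by a hook that runs on model-style saves and deletes, so a write that bypasses the hook never reaches the index.

```php
<?php
// Toy illustration only: these classes are hypothetical and are not Scout's real code.

final class ToyIndex
{
    /** @var array<int, string> searchable text keyed by row id */
    public $documents = [];
}

final class ToyArticleStore
{
    /** @var array<int, array<string, string>> table rows keyed by id */
    public $rows = [];

    /** @var ToyIndex */
    private $index;

    public function __construct(ToyIndex $index)
    {
        $this->index = $index;
    }

    // Raw insert: writes the row but runs no hook, like DB::table()->insert().
    public function rawInsert(int $id, string $title, string $content): void
    {
        $this->rows[$id] = ['title' => $title, 'content' => $content];
    }

    // Model-style save: writes the row, then syncs the index.
    public function save(int $id, string $title, string $content): void
    {
        $this->rawInsert($id, $title, $content);
        $this->index->documents[$id] = $title . ' ' . $content;
    }

    // Model-style delete: removes the row and its index entry.
    public function delete(int $id): void
    {
        unset($this->rows[$id], $this->index->documents[$id]);
    }
}

$index = new ToyIndex();
$store = new ToyArticleStore($index);

$store->rawInsert(1, 'TNTSearch', 'full-text search engine');
$store->save(2, 'jieba-php', 'Chinese word segmentation');

// Both rows exist in the table, but only the saved one reached the index.
$rowCount = count($store->rows);
$indexedIds = array_keys($index->documents);
```

In this toy setup the table holds two rows while the index holds only row 2, which mirrors the symptom described above.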

/routes/web.php

<?php
use App\Models\Article;

Route::get('search', function () {
    // Convert to an array so the output is easier to read
    dump(Article::all()->toArray());
});

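The preparation is finally done. Because the index is stored in SQLite, also confirm that your PHP installation has the following extensions enabled: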

pdo_sqlite
sqlite3
mbstring
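
One quick way to confirm this is a small script run with the same PHP binary that serves the project. This is a minimal sketch that uses only the built-in extension_loaded() function; the variable names are arbitrary.

```php
<?php
// Extensions the article lists as required for the SQLite-backed index.
$required = ['pdo_sqlite', 'sqlite3', 'mbstring'];

// Keep only the names that the running PHP binary has not loaded.
$missing = array_values(array_filter($required, function ($name) {
    return !extension_loaded($name);
}));

if ($missing === []) {
    echo "All required extensions are loaded.\n";
} else {
    echo 'Missing extensions: ' . implode(', ', $missing) . "\n";
}
```

Note that the CLI and the web server can load different php.ini files, so it may be worth running the check in both environments.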

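Now for the main topic.

Previously, we had to require Scout ourselves. Scout is Laravel's official full-text search package; it provides convenient Artisan commands and automatically syncs the index whenever articles are created, updated, or deleted. Next we had to require laravel-scout-tntsearch-driver, the TNTSearch driver for Scout, and then write the Chinese word segmentation logic by hand. Now that vanry has built laravel-scout-tntsearch for us, all of those intermediate steps can be skipped; just require laravel-scout-tntsearch directly: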

composer require vanry/laravel-scout-tntsearch

Add the providers in config/app.php:

'providers' => [

    // ...

    /**
     * TNTSearch full-text search
     */
    Laravel\Scout\ScoutServiceProvider::class,
    Vanry\Scout\TNTSearchScoutServiceProvider::class,
],

For Chinese word segmentation, require jieba-php:

composer require fukuball/jieba-php

Publish the configuration file:

php artisan vendor:publish --provider="Laravel\Scout\ScoutServiceProvider"

Add a tntsearch entry to /config/scout.php:

'tntsearch' => [
    'storage' => storage_path('indexes'), // must be writable
    'fuzziness' => env('TNTSEARCH_FUZZINESS', false),
    'searchBoolean' => env('TNTSEARCH_BOOLEAN', false),
    'asYouType' => false,

    'fuzzy' => [
        'prefix_length' => 2,
        'max_expansions' => 50,
        'distance' => 2,
    ],

    'tokenizer' => [
        'driver' => env('TNTSEARCH_TOKENIZER', 'default'),

        'jieba' => [
            'dict' => 'small',
            //'user_dict' => resource_path('dicts/mydict.txt'), // path to a custom dictionary
        ],

        'analysis' => [
            'result_type' => 2,
            'unit_word' => true,
            'differ_max' => true,
        ],

        'scws' => [
            'charset' => 'utf-8',
            'dict' => '/usr/local/scws/etc/dict.utf8.xdb',
            'rule' => '/usr/local/scws/etc/rules.utf8.ini',
            'multi' => 1,
            'ignore' => true,
            'duality' => false,
        ],
    ],

    'stopwords' => [
        '的',
        '了',
        '而是',
    ],
],
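
To make the role of the stopwords list concrete, here is a minimal sketch in plain PHP. It is not the package's implementation: the token list is a hypothetical stand-in for what a Chinese tokenizer might return, and the filtering is done with the built-in array_diff() purely for illustration.

```php
<?php
// Hypothetical token list, standing in for the output of a Chinese tokenizer.
$tokens = ['功能', '齊全', '的', '搜索引擎'];

// Same stopwords as in the configuration above.
$stopwords = ['的', '了', '而是'];

// Drop every token that appears in the stopword list, then reindex from 0.
$kept = array_values(array_diff($tokens, $stopwords));

echo implode(' / ', $kept), "\n";
```

Very common words such as 的 carry little meaning for search, so leaving them out keeps the index smaller and the matches more relevant.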

Add these settings to /.env:

SCOUT_DRIVER=tntsearch
TNTSEARCH_TOKENIZER=jieba

Make the model searchable in /app/Models/Article.php:

<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Article extends Model
{
    use Searchable;

    /**
     * Fields to index
     *
     * @return array
     */
    public function toSearchableArray()
    {
        return $this->only('id', 'title', 'content');
    }
}

PHP's default memory_limit is 128M. To avoid "PHP Fatal error: Allowed memory size of n bytes exhausted", raise it to 256M in /app/Providers/AppServiceProvider.php:

public function boot()
{
    /**
     * Raise the memory limit so Chinese word segmentation does not run out of memory
     */
    ini_set('memory_limit', "256M");
}
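
A hard-coded ini_set() call would also lower the limit on a server that is already configured with more than 256M. A more defensive variant, sketched below in plain PHP, raises the limit only when the current value is smaller. The helper names iniSizeToBytes() and ensureMemoryLimit() are hypothetical and not part of Laravel or the packages used here.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand value such as "128M" to bytes.
function iniSizeToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1; // -1 means "no limit"
    }

    $number = (int) $value;                  // leading digits, e.g. 128
    $unit = strtoupper(substr($value, -1));  // trailing unit letter, if any

    switch ($unit) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number; // plain byte count
    }
}

// Hypothetical helper: raise memory_limit to $target only when the current limit is lower.
function ensureMemoryLimit(string $target): void
{
    $current = iniSizeToBytes((string) ini_get('memory_limit'));

    if ($current !== -1 && $current < iniSizeToBytes($target)) {
        ini_set('memory_limit', $target);
    }
}

ensureMemoryLimit('256M');
```

Calling ensureMemoryLimit('256M') from boot() would then leave an unlimited or larger setting untouched.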

Build the index:

php artisan scout:import "App\Models\Article"

Usage is just as simple: pass the search terms to the search() method in /routes/web.php:

<?php
use App\Models\Article;

Route::get('search', function () {
    // Convert to an array so the output is easier to read
    dump(Article::all()->toArray());
    dump(Article::search('功能齊全的搜索引擎')->get()->toArray());
});

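The search returns the expected data. As a final step, test that the index stays in sync after updating and deleting data (as described above).

References:
1. https://baijunyao.com/article/154
2. https://learnku.com/docs/laravel/5.7/scout/2309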