PostgreSQL query optimization: fuzzy (LIKE) queries

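Preface

Optimizing fuzzy matching in search has long been a headache. Fortunately, PostgreSQL offers ways to deal with it. This post briefly looks at how to speed up fuzzy matching with indexes.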

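Case study

We have a lab report table with tens of millions of rows and need to run fuzzy searches on the report name. We start by creating the following index: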

CREATE INDEX lab_report_report_name_index ON lab_report USING btree (report_name);

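Then we run a simple fuzzy-match condition such as LIKE '血常規%' (血常規 is a routine blood test). The query plan shows that the index is not used: when the database does not use the "C" locale, a btree index built with the default operator class cannot serve LIKE pattern matching.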
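To reproduce this, the plan can be inspected with EXPLAIN. The sketch below is only illustrative: it assumes the table is reachable as lab_report through search_path, and SELECT * stands in for whatever columns the real query returns.

```sql
-- Check the collation of the current database. With anything other than "C",
-- a default-opclass btree index is not usable for LIKE 'prefix%'.
SELECT datcollate
FROM pg_database
WHERE datname = current_database();

-- Show the plan chosen for the prefix search (EXPLAIN alone does not run the query).
EXPLAIN
SELECT *
FROM lab_report
WHERE report_name LIKE '血常規%';
```

A "Seq Scan on lab_report" node in the output means the index was not considered for this predicate.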

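The documentation shows that PostgreSQL lets you specify an operator class on a btree index: text_pattern_ops, varchar_pattern_ops and bpchar_pattern_ops, which correspond to the column types text, varchar and char. The official explanation is: "The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale." That is a bit abstract, so let's just try it. Create the following index and rerun the earlier condition LIKE '血常規%' (see the PostgreSQL documentation: https://www.postgresql.org/docs/10/indexes-opclass.html):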

CREATE INDEX lab_report_report_name_index ON lab.lab_report (report_name varchar_pattern_ops);

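The query now uses an index scan, and execution time drops from 213 ms to 125 ms. However, a search with LIKE '%血常規%' falls back to a full table scan. This is where the stars of this post come in: pg_trgm and pg_bigm.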
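The fallback is easy to check. A minimal sketch, assuming the table resolves as lab_report through search_path and that only the varchar_pattern_ops btree index exists on report_name:

```sql
-- Infix pattern: there is no fixed prefix, so the btree index cannot turn
-- the LIKE into a range condition on report_name.
EXPLAIN ANALYZE
SELECT *
FROM lab_report
WHERE report_name LIKE '%血常規%';
```

EXPLAIN ANALYZE actually executes the query, so on a table of this size it will take as long as the real search.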

Before creating either kind of index, the corresponding extension has to be installed:

CREATE EXTENSION pg_trgm;
CREATE EXTENSION pg_bigm;
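pg_bigm is not bundled with PostgreSQL, so CREATE EXTENSION pg_bigm only works once the module has been built and installed on the server. As a quick sanity check, the following sketch lists what is installed and shows how each extension splits a string; the sample string is just an illustration.

```sql
-- Confirm which of the two extensions are present in this database.
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('pg_trgm', 'pg_bigm');

-- show_trgm (from pg_trgm) returns the 3-grams extracted from a string.
SELECT show_trgm('blood test');

-- show_bigm (from pg_bigm) returns the 2-grams extracted from a string.
SELECT show_bigm('blood test');
```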

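The difference between the two: pg_trgm is a contrib module that ships with PostgreSQL, while pg_bigm was developed by Japanese developers. A detailed comparison follows (from the pg_bigm documentation: http://pgbigm.osdn.jp/pg_bigm_en-1-2.html):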

Comparison with pg_trgm

The pg_trgm contrib module which provides full text search capability using 3-gram (trigram) model is included in PostgreSQL. The pg_bigm was developed based on the pg_trgm. They have the following differences:

| Functionalities and Features | pg_trgm | pg_bigm |
| --- | --- | --- |
| Phrase matching method for full text search | 3-gram | 2-gram |
| Available index | GIN and GiST | GIN only |
| Available text search operators | LIKE (~~), ILIKE (~~*), ~, ~* | LIKE only |
| Full text search for non-alphabetic language (e.g., Japanese) | Not supported (*1) | Supported |
| Full text search with 1-2 characters keyword | Slow (*2) | Fast |
| Similarity search | Supported | Supported (version 1.1 or later) |
| Maximum indexed column size | 238,609,291 Bytes (~228MB) | 107,374,180 Bytes (~102MB) |
  • (*1) You can use full text search for non-alphabetic language by commenting out KEEPONLYALNUM macro variable in contrib/pg_trgm/pg_trgm.h and rebuilding pg_trgm module. But pg_bigm provides faster non-alphabetic search than such a modified pg_trgm.
  • (*2) Because, in this search, only sequential scan or index full scan (not normal index scan) can run.

pg_bigm 1.1 or later can coexist with pg_trgm in the same database, but pg_bigm 1.0 cannot.

Unless you have special requirements, pg_bigm is the recommended choice. Let's test it:

CREATE INDEX lab_report_report_name_index ON lab_report USING gin (report_name public.gin_bigm_ops);
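One way to confirm that the index is picked up is to look at the plan for the infix search. The sketch below assumes the pg_bigm GIN index above has been created and that pg_bigm is installed in a schema on search_path. likequery is the helper pg_bigm provides for turning a keyword into an escaped LIKE pattern.

```sql
-- likequery('血常規') produces the pattern '%血常規%', escaping any
-- %, _ or \ characters that appear in the keyword.
EXPLAIN ANALYZE
SELECT *
FROM lab_report
WHERE report_name LIKE likequery('血常規');
```

Writing the pattern by hand as LIKE '%血常規%' is equivalent here, since the keyword contains no characters that need escaping.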

 

The query can now use a bitmap index scan. For this case, pg_trgm performs the same as pg_bigm.
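For comparison, the pg_trgm variant might look like the sketch below. The index name is hypothetical, chosen so that it does not clash with the existing index. Whether pg_trgm indexes Chinese text at all depends on the build and the database locale (see footnote *1 in the comparison above), so the plan should be checked rather than assumed.

```sql
-- Hypothetical index name; gin_trgm_ops is the GIN operator class from pg_trgm.
CREATE INDEX lab_report_report_name_trgm_index
    ON lab_report USING gin (report_name gin_trgm_ops);

-- Same infix search as before; look for a Bitmap Index Scan on the new index.
EXPLAIN ANALYZE
SELECT *
FROM lab_report
WHERE report_name LIKE '%血常規%';
```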

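That's all.

This post is only a brief introduction and does not analyze many of the details in depth. Comments and discussion are welcome.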
