Full-Text Indexes and Geospatial Indexes

Ⅰ. Full-Text Indexes

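  • Full-text indexing is the core technology behind search engines. It serves searches of the kind where col like '%xxx%', in which the keyword may appear anywhere in the column
  • A B+ tree index cannot serve this kind of condition. An index on col is sorted by the whole column value, so it helps when matching from the start of the value, but it provides no ordering for a keyword buried inside the value
  • The text first has to be tokenized to extract the individual words, and the extracted tokens are then stored in B+ tree indexes
  • Full-text indexes can be created on varchar, char, text and similar column types
  • Before MySQL 5.6, only MyISAM supported full-text indexes
  • Since MySQL 5.6, the InnoDB engine supports full-text indexes
  • The support described above only handles English text
  • MySQL 5.7 adds full-text indexing for Chinese, Japanese, and Korean through the ngram parser (the first version truly usable in production)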
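  • InnoDB currently builds only one full-text index at a time; a single statement cannot create several of them
  • While a full-text index is being added the table is read-only and cannot be written or updated, i.e. online DDL with concurrent writes is not supported. For this case use pt (Percona Toolkit's pt-online-schema-change)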
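
The tokenize-then-index idea from the list above can be sketched in a few lines of Python. This is a toy illustration under simple assumptions (tokens are runs of ASCII letters and digits, and a dict of sets stands in for the index); it is not how InnoDB lays out its full-text data, and the names `tokenize`, `build_inverted_index` and `search` are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens (a stand-in for a real parser)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, word):
    """Return the sorted ids of documents containing the word, wherever it occurs."""
    return sorted(index.get(word.lower(), set()))

docs = {
    1: "MySQL Tutorial: DBMS stands for DataBase",
    5: "MySQL vs. YourSQL: In the following database comparison",
    3: "Optimizing MySQL: In this tutorial we will show",
}
index = build_inverted_index(docs)
print(search(index, "database"))  # ids of rows mentioning "database" anywhere
```

Because the lookup key is the token rather than the whole column value, the position of the word inside the text no longer matters, which is exactly what like '%xxx%' cannot get from an ordinary index.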

tips:
In the past, full-text search was not done in MySQL at all; Lucene was used instead.

Create a full-text index on the title and body columns:

alter table xxx add fulltext index idx_xxx (title,body);

Queries against a full-text index cannot use LIKE; they have to use the full-text search syntax.

1.1 Search modes:

  • ① Natural language search
mysql> SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

Viewing the relevance score:

mysql> SELECT id, body, MATCH (title,body) AGAINST
    ('Security implications of running MySQL as root'
    IN NATURAL LANGUAGE MODE) AS score
    FROM articles WHERE MATCH (title,body) AGAINST
    ('Security implications of running MySQL as root'
    IN NATURAL LANGUAGE MODE);
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)
  • ② Boolean search (+ means the word must be present, - means it must not be present)
mysql> SELECT * FROM articles WHERE MATCH (title,body)
    AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
+----+-----------------------+-------------------------------------+
| id | title                 | body                                |
+----+-----------------------+-------------------------------------+
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...        |
|  2 | How To Use MySQL Well | After you went through a ...        |
|  3 | Optimizing MySQL      | In this tutorial we will show ...   |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ... |
|  6 | MySQL Security        | When configured properly, MySQL ... |
+----+-----------------------+-------------------------------------+
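
The + and - operators can be read as plain set filters. The Python sketch below mimics only those two operators on a few sample rows; real boolean mode supports more operators than this, and `boolean_match` is a hypothetical helper written for illustration, not a MySQL API.

```python
import re

def tokens(text):
    """Lowercase word tokens of a piece of text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def boolean_match(query, text):
    """True if text contains every +word and none of the -words in query."""
    words = tokens(text)
    for term in query.split():
        if term.startswith("+") and term[1:].lower() not in words:
            return False
        if term.startswith("-") and term[1:].lower() in words:
            return False
    return True

rows = {
    1: "MySQL Tutorial DBMS stands for DataBase",
    5: "MySQL vs. YourSQL In the following database comparison",
    6: "MySQL Security When configured properly, MySQL",
}
hits = [rid for rid, text in rows.items() if boolean_match("+MySQL -YourSQL", text)]
print(hits)  # row 5 is excluded because it mentions YourSQL
```
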
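  • ③ Query expansion search

In general, avoid WITH QUERY EXPANSION. It is a two-pass search: the second pass searches with the original search phrase combined with the few most relevant rows returned by the first pass, so the result set tends to grow and pick up loosely related rows.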

mysql> SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' WITH QUERY EXPANSION);
+----+-----------------------+------------------------------------------+
| id | title                 | body                                     |
+----+-----------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL     | In the following database comparison ... |
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...             |
|  3 | Optimizing MySQL      | In this tutorial we will show ...        |
|  6 | MySQL Security        | When configured properly, MySQL ...      |
|  2 | How To Use MySQL Well | After you went through a ...             |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ...      |
+----+-----------------------+------------------------------------------+
6 rows in set (0.00 sec)
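
The two-pass behaviour can be imitated with a small Python sketch. It assumes a deliberately crude relevance score (the number of query-term occurrences) and expands the query with every word of the top first-pass row; MySQL's actual ranking differs, and `search` and `search_with_expansion` are names invented for this example.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens of a piece of text, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(terms, docs):
    """Rank doc ids by how many query-term occurrences each doc contains."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(tokens(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))

def search_with_expansion(query, docs, top_n=1):
    """First pass with the query; second pass with the query plus top rows' words."""
    first = search(set(tokens(query)), docs)
    expanded = set(tokens(query))
    for doc_id in first[:top_n]:
        expanded.update(tokens(docs[doc_id]))
    return first, search(expanded, docs)

docs = {
    1: "MySQL Tutorial DBMS stands for DataBase",
    2: "How To Use MySQL Well",
    3: "Optimizing MySQL In this tutorial we will show",
}
first, second = search_with_expansion("database", docs)
print(first)   # rows matching the original word only
print(second)  # rows matching after the query was expanded
```

In this sketch the second pass also returns rows that never mention "database", simply because they share words with the best first-pass row, which illustrates why the expanded results are noisier.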

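The most common scenario is a tag column holding a value such as mysql,database,it,oracle. That table design is flawed. It should be modeled as one-to-many: one article maps to many tags, with a dedicated tag table used for the reverse lookup from tag to articles.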
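
A minimal sketch of that one-to-many design follows. It uses Python's built-in sqlite3 module purely so the example runs standalone; the table and column names (`article`, `article_tag`) are assumptions, and in MySQL the same idea would be written as ordinary InnoDB DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One row per (article, tag) pair instead of a comma-separated tag column.
CREATE TABLE article_tag (
    article_id INTEGER NOT NULL REFERENCES article(id),
    tag TEXT NOT NULL,
    PRIMARY KEY (tag, article_id)   -- leading tag column serves the reverse lookup
);
""")
conn.executemany("INSERT INTO article VALUES (?, ?)",
                 [(1, "MySQL Tutorial"), (2, "Oracle Basics")])
conn.executemany("INSERT INTO article_tag VALUES (?, ?)",
                 [(1, "mysql"), (1, "database"), (2, "oracle"), (2, "database")])

def articles_with_tag(tag):
    """Reverse lookup: an equality match on tag, no LIKE '%...%' needed."""
    rows = conn.execute(
        "SELECT a.title FROM article_tag t JOIN article a ON a.id = t.article_id "
        "WHERE t.tag = ? ORDER BY a.id", (tag,))
    return [title for (title,) in rows]

print(articles_with_tag("database"))
```

With the tag as the leading key column, finding all articles for a tag is a plain indexed equality lookup, so neither a full-text index nor a wildcard LIKE is required.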

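Related parameters:

ft_min_word_len = the minimum word length that full-text search will index. The default is 4; Chinese words are usually two characters long, so 2 is a better setting. (This variable applies to MyISAM; the InnoDB counterpart is innodb_ft_min_token_size, default 3.)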
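
Why a minimum length of 2 suits Chinese can be seen by cutting text into overlapping two-character tokens, which is the idea behind n-gram parsing. This is a toy Python sketch; `ngrams` and `searchable` are illustrative helpers, and stopword and punctuation handling are ignored.

```python
def ngrams(text, n=2):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def searchable(word, min_len):
    """A word shorter than the minimum length is never indexed."""
    return len(word) >= min_len

print(ngrams("全文索引"))       # overlapping two-character tokens
print(searchable("索引", 4))    # False: dropped with a minimum of 4
print(searchable("索引", 2))    # True: kept with a minimum of 2
```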

tips:
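If possible, create the table and load all the data first, and only then create the full-text index, rather than defining the full-text index when the table is created. Building the index after the load is more efficient.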

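Ⅱ. Geospatial Indexes

  • Before MySQL 5.7, only the MyISAM engine supported geospatial indexes
  • Since MySQL 5.7, the InnoDB engine supports geospatial indexes
  • Previously, MongoDB was the usual choice for geospatial indexing
  • If performance is the priority, use Redis
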
CREATE TABLE nodes (
    id BIGINT(20) DEFAULT NULL,
    geom GEOMETRY NOT NULL,
    user VARCHAR(50) DEFAULT NULL,
    version INT(11) DEFAULT NULL,
    timestamp VARCHAR(20) DEFAULT NULL,
    UNIQUE KEY i_nodeides (id),
    SPATIAL KEY i_geomidx ( geom )
)  ENGINE=INNODB DEFAULT CHARSET=LATIN1;

The geom column is used to store longitude and latitude.

alter table nodes add column tags text, add fulltext index(tags);

UPDATE nodes
SET
    tags = (SELECT
            GROUP_CONCAT(CONCAT(k, v)
                    SEPARATOR ',')
        FROM
            nodetags
        WHERE
            nodetags.id = nodes.id
        GROUP BY nodes.id);

SELECT
    id,
    ST_DISTANCE_SPHERE(POINT(- 73.951368, 40.716743), geom) AS distance_in_meters,
    tags,
    ST_ASTEXT(geom)
FROM
    nodes
WHERE
    ST_CONTAINS(ST_MAKEENVELOPE(POINT((- 73.951368 + (20 / 111)),
                        (40.716743 + (20 / 111))),
                    POINT((- 73.951368 - (20 / 111)),
                        (40.716743 - (20 / 111)))),
            geom)
        AND MATCH (tags) AGAINST ('+thai +restaurant' IN BOOLEAN MODE)
ORDER BY distance_in_meters
LIMIT 10;
This query finds Thai restaurants within roughly 20 kilometers of the given point.
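
The 20 / 111 in the query converts kilometers to degrees, assuming roughly 111 km per degree. The Python sketch below reproduces that envelope and measures the distance to a candidate point with the haversine formula. The Earth radius constant, the sample candidate point and the function names are assumptions of the example. A degree of longitude is shorter than 111 km away from the equator, so the box is only an approximation.

```python
import math

KM_PER_DEGREE = 111.0        # rough value used in the query
EARTH_RADIUS_M = 6371000.0   # mean Earth radius, an approximation

def envelope(lon, lat, radius_km):
    """Return (min_lon, min_lat, max_lon, max_lat) of a square box around the point."""
    d = radius_km / KM_PER_DEGREE
    return (lon - d, lat - d, lon + d, lat + d)

def in_envelope(box, lon, lat):
    """True if the point lies inside the box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

box = envelope(-73.951368, 40.716743, 20)
print(box)
print(in_envelope(box, -73.99, 40.73))
print(haversine_m(-73.951368, 40.716743, -73.99, 40.73))
```

The box acts as a cheap filter that an index can serve, and the exact distance is then computed only for the rows that survive it, which mirrors the ST_CONTAINS plus ST_DISTANCE_SPHERE split in the query.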

tips:
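In testing, the geospatial index performed very poorly.

If you really need this kind of geospatial feature, use the GeoHash functions in 5.7 together with a function-based index (an index on a generated column):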

alter table nodes
add column geohash varchar(128)
as (st_geohash(geom,6)) virtual;

alter table nodes add index i_geohash_idx(geohash);

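This improved performance 12-fold. In practice this kind of data is mostly stored in MongoDB, and Redis is the best choice; MySQL is rarely used for it. Courier-industry trajectory data is one place it might come up: the volume is small and the data only needs to be saved, so MySQL serves as the final persistence layer.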
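
An ordinary index on a geohash string helps because a geohash names a grid cell and longer hashes subdivide shorter ones, so points in the same cell share a prefix and a lookup by cell becomes an equality or prefix match. Below is a minimal Python sketch of the standard geohash encoding, written for illustration and not taken from MySQL's implementation; `geohash_encode` is a hypothetical helper, and it takes (lat, lon) whereas the numeric form of ST_GeoHash takes longitude first.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    use_lon = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):  # every 5 bits become one base32 character
        value = 0
        for bit in bits[i:i + 5]:
            value = value * 2 + bit
        chars.append(BASE32[value])
    return "".join(chars)

coarse = geohash_encode(40.716743, -73.951368, 6)
fine = geohash_encode(40.716743, -73.951368, 8)
print(coarse, fine)  # the 6-character hash is a prefix of the 8-character one
```

A shorter hash covers a larger cell, so the length chosen for the generated column (6 in the statement above) trades precision against how many rows fall under the same key.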
