PostgreSQL: GiST indexes

Date: 2019-11-09


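GiST stands for Generalized Search Tree. It is a balanced, tree-structured access method that serves as a base template in the system and can be used to implement arbitrary indexing schemes. B-trees, R-trees and many other indexing schemes can be implemented in GiST.

The official explanation above is dense and a little hard to digest, and for now there is no need to implement other indexing schemes with GiST, so this post simply shows how to use a GiST index.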

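Advantages and disadvantages compared with B-tree indexes:

Advantages:

GiST indexes suit multidimensional and set-like data types and, like B-tree indexes, can also be used with other data types. A multicolumn GiST index can be used when the query conditions involve any subset of the indexed columns. A multicolumn B-tree index, by contrast, is efficient mainly when the conditions constrain its leading column; without such a condition the planner usually falls back to a sequential scan.

Disadvantages:

GiST indexes take longer to build and occupy more space.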

Test table

test=# create table tbl_index(a bigint,b timestamp without time zone,c varchar(12));
CREATE TABLE
test=# insert into tbl_index (a,b,c)  select generate_series(1,3000000),clock_timestamp()::timestamp(0) without time zone,'got u';
INSERT 0 3000000
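
Because column b is filled with clock_timestamp() rounded to whole seconds, many rows end up sharing the same timestamp, which matters when reading the row counts in the plans further down. A minimal sketch for checking the distribution, assuming the table was loaded exactly as above (the alias rows_per_second is only illustrative):

```sql
-- Count how many rows share each second-resolution timestamp in b.
SELECT b, count(*) AS rows_per_second
FROM tbl_index
GROUP BY b
ORDER BY b;
```

The actual values depend on when and how fast the INSERT ran, so the timestamps used in the examples below will differ on another machine.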

test=# \timing 
Timing is on.

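GiST itself is built into PostgreSQL, but indexing scalar types such as bigint and timestamp with GiST requires the btree_gist extension, which provides the operator classes for them. All extensions were compiled and installed when I built PostgreSQL from source, so here it only needs to be created in the database.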

test=# create extension btree_gist;
CREATE EXTENSION
Time: 774.131 ms
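
To confirm that the extension supplied GiST operator classes for the two column types used here, the system catalogs can be queried. This is a sketch that relies only on the standard pg_opclass and pg_am catalogs; the column alias indexed_type is illustrative:

```sql
-- List GiST operator classes available for bigint and timestamp.
SELECT opc.opcname,
       opc.opcintype::regtype AS indexed_type
FROM pg_opclass AS opc
JOIN pg_am AS am ON am.oid = opc.opcmethod
WHERE am.amname = 'gist'
  AND opc.opcintype IN ('bigint'::regtype::oid,
                        'timestamp without time zone'::regtype::oid)
ORDER BY opc.opcname;
```

If the query returns no rows, btree_gist is probably not installed in the current database, and the CREATE INDEX statement below would fail for lack of a default operator class.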

Create the index

test=# create index idx_gist_tbl_index_a_b on tbl_index using gist(a,b);
CREATE INDEX
Time: 168595.321 ms

Example 1. Query on column a

test=# explain analyze select * from tbl_index where a=3000000;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..21395.10 rows=1 width=22) (actual time=310.514..310.517 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbl_index  (cost=0.00..20395.00 rows=0 width=22) (actual time=289.432..289.433 rows=0 loops=3)
         Filter: (a = 3000000)
         Rows Removed by Filter: 1000000
 Planning time: 0.119 ms
 Execution time: 310.631 ms
(8 rows)

Time: 311.505 ms

test=# explain analyze select * from tbl_index where a='3000000';
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_gist_tbl_index_a_b on tbl_index  (cost=0.29..8.30 rows=1 width=22) (actual time=0.104..0.105 rows=1 loops=1)
   Index Cond: (a = '3000000'::bigint)
 Planning time: 0.109 ms
 Execution time: 0.297 ms
(4 rows)

Time: 1.124 ms

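The two statements differ only in how the literal is written. In the first, the unquoted 3000000 is typed as integer, so the condition compares a bigint column with an integer value. The btree_gist operator class for bigint appears to support only same-type comparisons, so the index cannot be used and the planner falls back to a parallel sequential scan. In the second, the quoted literal has no type of its own and is resolved to the column's type, as the Index Cond line (a = '3000000'::bigint) shows. Nothing is converted to char. The practical rule for a GiST index on a bigint column is to give the literal the column's type, either by quoting it or by casting it explicitly.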
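
The second plan above shows the condition as a = '3000000'::bigint, which suggests that what matters is the literal's type rather than the quotes. A quick way to check this, assuming the same table and index as above, is to cast the unquoted literal explicitly:

```sql
-- Unquoted literal with an explicit cast to the column's type.
EXPLAIN ANALYZE SELECT * FROM tbl_index WHERE a = 3000000::bigint;

-- Equivalent standard-SQL spelling of the same cast.
EXPLAIN ANALYZE SELECT * FROM tbl_index WHERE a = CAST(3000000 AS bigint);
```

If the type explanation holds, both statements should be able to use idx_gist_tbl_index_a_b in the same way as the quoted form; the plan on a given system still depends on statistics and configuration.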

Example 2. Query on column b

test=# explain analyze select * from tbl_index where b='2016-06-29 14:54:00';
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_index  (cost=3373.54..10281.04 rows=171000 width=22) (actual time=37.200..53.564 rows=172824 loops=1)
   Recheck Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone)
   Heap Blocks: exact=276
   ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..3330.79 rows=171000 width=0) (actual time=37.139..37.139 rows=172824 loops=1)
         Index Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone)
 Planning time: 0.343 ms
 Execution time: 60.843 ms
(7 rows)

Time: 62.359 ms

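This query does not constrain the first index column, yet it still uses the index. With a B-tree index on (a, b), the planner would typically choose a sequential scan for the same condition.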

Example 3. Query on a AND b

test=# explain analyze select * from tbl_index where a='3000000' and b='2016-06-29 14:54:00';
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_gist_tbl_index_a_b on tbl_index  (cost=0.29..8.31 rows=1 width=22) (actual time=0.114..0.115 rows=1 loops=1)
   Index Cond: ((a = '3000000'::bigint) AND (b = '2016-06-29 14:54:00'::timestamp without time zone))
 Planning time: 0.376 ms
 Execution time: 0.258 ms
(4 rows)

Time: 1.747 ms

Example 4. Query on a OR b

test=# explain analyze select * from tbl_index where a='3000000' or b='2016-06-29 14:54:00';
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_index  (cost=3420.58..10755.60 rows=171001 width=22) (actual time=31.142..49.728 rows=172824 loops=1)
   Recheck Cond: ((a = '3000000'::bigint) OR (b = '2016-06-29 14:54:00'::timestamp without time zone))
   Heap Blocks: exact=276
   ->  BitmapOr  (cost=3420.58..3420.58 rows=171001 width=0) (actual time=31.083..31.083 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..4.29 rows=1 width=0) (actual time=0.100..0.100 rows=1 loops=1)
               Index Cond: (a = '3000000'::bigint)
         ->  Bitmap Index Scan on idx_gist_tbl_index_a_b  (cost=0.00..3330.79 rows=171000 width=0) (actual time=30.981..30.981 rows=172824 loops=1)
               Index Cond: (b = '2016-06-29 14:54:00'::timestamp without time zone)
 Planning time: 0.143 ms
 Execution time: 57.193 ms
(10 rows)

Time: 58.067 ms

The AND and OR queries also use the index, but compared with a B-tree index they show no performance gain.

Comparing the build time and size of the GiST and B-tree indexes

B-tree index build time:

test=# create index idx_btree_tbl_index_a_b on tbl_index using btree(a,b);
CREATE INDEX
Time: 5217.976 ms

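As shown earlier, the GiST index took 168595.321 ms to build, about 32 times as long as the B-tree index.

As for size, the GiST index is a little more than 3 times as large as the B-tree index.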

test=# select relname,pg_size_pretty(pg_relation_size(oid)) from pg_class where relname like 'idx_%_tbl_index_a_b';
         relname         | pg_size_pretty 
-------------------------+----------------
 idx_gist_tbl_index_a_b  | 281 MB
 idx_btree_tbl_index_a_b | 89 MB
(2 rows)

Time: 4.068 ms
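
With both indexes in place, the planner chooses between them, so comparing query performance needs some way to hide one index temporarily. One common approach on a test database is to drop an index inside a transaction and roll back afterwards, since DROP INDEX is transactional in PostgreSQL. This is a sketch only, reusing the query from Example 3; note that DROP INDEX takes an exclusive lock on the table until the transaction ends, so it is not something to run on a busy system:

```sql
BEGIN;
-- Hide the GiST index for the duration of this transaction only.
DROP INDEX idx_gist_tbl_index_a_b;
-- The planner can now only consider the B-tree index.
EXPLAIN ANALYZE
SELECT * FROM tbl_index
WHERE a = '3000000' AND b = '2016-06-29 14:54:00';
-- Restore the GiST index without rebuilding it.
ROLLBACK;
```

Repeating the same block with idx_btree_tbl_index_a_b dropped instead gives the GiST timing under the same conditions.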
