A Brief Look at PostgreSQL Indexes

Date: 2019-11-25


1. Characteristics of indexes

1.1 Faster filtered lookups

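As a table grows, queries against it slow down. An index on the column used in the query condition lets PostgreSQL jump straight to the records that may satisfy the condition, instead of scanning every record.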

create table t(id int, info text);
insert into t select generate_series(1,10000),'lottu'||generate_series(1,10000);
create table t1 as select * from t;
create table t2 as select * from t;
create index ind_t2_id on t2(id);

lottu=# analyze t1;
ANALYZE
lottu=# analyze t2;
ANALYZE
# Without an index
lottu=# explain (analyze,buffers,verbose) select * from t1 where id < 10;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Seq Scan on lottu.t1  (cost=0.00..180.00 rows=9 width=13) (actual time=0.073..5.650 rows=9 loops=1)
   Output: id, info
   Filter: (t1.id < 10)
   Rows Removed by Filter: 9991
   Buffers: shared hit=55
 Planning time: 25.904 ms
 Execution time: 5.741 ms
(7 rows)
# With an index
lottu=# explain (analyze,verbose,buffers) select * from t2 where id < 10;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t2_id on lottu.t2  (cost=0.29..8.44 rows=9 width=13) (actual time=0.008..0.014 rows=9 loops=1)
   Output: id, info
   Index Cond: (t2.id < 10)
   Buffers: shared hit=3
 Planning time: 0.400 ms
 Execution time: 0.052 ms
(6 rows)

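# In this case the same SQL runs on both tables. With an index, t2 executes in 0.052 ms and reads 3 buffers. Without one, t1 takes 5.741 ms, reads 55 buffers and discards 9991 rows in the filter.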
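To make the difference concrete, here is a toy Python model of the two plans above. It is not PostgreSQL code. A sequential scan tests every row, while a binary search over sorted keys stands in for descending a B-tree. The model assumes t1 and t2 hold ids 1..10000 as created above. The function names are hypothetical.

```python
import bisect

def seq_scan(rows, limit):
    """Seq Scan with Filter (id < limit): every row is examined."""
    matches, examined = [], 0
    for key in rows:
        examined += 1
        if key < limit:
            matches.append(key)
    return matches, examined

def index_lookup(sorted_keys, limit):
    """Index Scan stand-in: binary-search the sorted keys (like descending
    a B-tree), then read only the entries before the first key >= limit."""
    end = bisect.bisect_left(sorted_keys, limit)
    return sorted_keys[:end], end

ids = list(range(1, 10001))  # ids 1..10000, as in tables t1 and t2
seq_matches, seq_examined = seq_scan(ids, 10)
idx_matches, idx_read = index_lookup(ids, 10)

print(len(seq_matches), seq_examined, seq_examined - len(seq_matches))  # 9 10000 9991
print(len(idx_matches), idx_read)                                       # 9 9
```

The 9991 discarded rows correspond to "Rows Removed by Filter: 9991" in the t1 plan. The index side reads only the 9 qualifying entries after a logarithmic search.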

1.2 Ordering

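A B-tree index is itself sorted. A query that orders by the indexed column can therefore read rows in index order and skip a separate sort step.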

# Without an index
lottu=# explain (analyze,verbose,buffers) select * from t1 where id > 2 order by id;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
Sort  (cost=844.31..869.31 rows=9999 width=13) (actual time=8.737..11.995 rows=9998 loops=1)
   Output: id, info
   Sort Key: t1.id
   Sort Method: quicksort  Memory: 853kB
   Buffers: shared hit=55
   ->  Seq Scan on lottu.t1  (cost=0.00..180.00 rows=9999 width=13) (actual time=0.038..5.133 rows=9998 loops=1)
         Output: id, info
         Filter: (t1.id > 2)
         Rows Removed by Filter: 2
         Buffers: shared hit=55
 Planning time: 0.116 ms
 Execution time: 15.205 ms
(12 rows)
# With an index
lottu=# explain (analyze,verbose,buffers) select * from t2 where id > 2 order by id;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t2_id on lottu.t2  (cost=0.29..353.27 rows=9999 width=13) (actual time=0.030..5.304 rows=9998 loops=1)
   Output: id, info
   Index Cond: (t2.id > 2)
   Buffers: shared hit=84
 Planning time: 0.295 ms
 Execution time: 7.027 ms
(6 rows)

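# In this case the same SQL runs on both tables.

t2 (with an index) executes in 7.027 ms; t1 (without an index) takes 15.205 ms.
Without an index, t1 also needs a separate Sort step, which uses 853kB of memory.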
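Here is a toy Python model of the two plans above, again not PostgreSQL internals. Rows sit in the heap in arbitrary physical order, because the planner can never assume heap order. A sorted list of (id, heap position) pairs stands in for the B-tree. Filtering and then sorting gives the same result as simply walking the index forward, which is why the indexed plan has no Sort node. All names are hypothetical.

```python
import bisect
import random

# Toy heap: (id, info) rows stored in arbitrary physical order.
heap = [(i, f"lottu{i}") for i in range(1, 10001)]
random.Random(0).shuffle(heap)

# Toy B-tree stand-in: (id, heap position) entries kept sorted by id.
index = sorted((row[0], pos) for pos, row in enumerate(heap))
index_keys = [key for key, _ in index]

def seq_scan_then_sort(low):
    rows = [row for row in heap if row[0] > low]  # Seq Scan + Filter (id > low)
    rows.sort(key=lambda row: row[0])             # separate Sort node
    return rows

def index_scan_in_order(low):
    start = bisect.bisect_right(index_keys, low)  # first entry with id > low
    # Walking the index forward already yields rows in id order: no sort needed.
    return [heap[pos] for _, pos in index[start:]]

a = seq_scan_then_sort(2)
b = index_scan_in_order(2)
print(len(a), a == b)  # 9998 True
```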

2. Index scan methods

There are three ways to scan an index:

2.1 Indexscan

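Look up the index to find the ctid (the physical row address) of each matching record, then use that ctid to fetch the row from the heap table.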
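Here is a toy Python model of an Index Scan on a tiny heap of 4 pages with 3 rows each. As in PostgreSQL, a ctid is a (block, offset) pair with 1-based offsets. The layout and names are hypothetical. Index entries come out in key order, so the heap is visited once per matching row, and the same page can be read more than once.

```python
import bisect

# Toy heap: 4 pages, each holding 3 (id, info) rows in arbitrary order.
heap_pages = [
    [(7, "g"), (1, "a"), (9, "i")],
    [(4, "d"), (12, "l"), (2, "b")],
    [(10, "j"), (5, "e"), (11, "k")],
    [(3, "c"), (8, "h"), (6, "f")],
]
# Toy index: (id, ctid) entries sorted by id; ctid = (block, 1-based offset).
index = sorted((row[0], (blk, off + 1))
               for blk, page in enumerate(heap_pages)
               for off, row in enumerate(page))
keys = [key for key, _ in index]

def index_scan(low, high):
    """Return rows with low <= id < high in id order, fetching the heap per ctid."""
    start, end = bisect.bisect_left(keys, low), bisect.bisect_left(keys, high)
    rows, page_reads = [], []
    for _, (blk, off) in index[start:end]:  # entries come out in key order
        page_reads.append(blk)               # one heap access per ctid
        rows.append(heap_pages[blk][off - 1])
    return rows, page_reads

rows, page_reads = index_scan(1, 7)
print([r[0] for r in rows], page_reads)  # [1, 2, 3, 4, 5, 6] [0, 1, 3, 1, 2, 3]
```

Pages 1 and 3 are each read twice, because consecutive keys live on different pages.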

2.2 bitmapscan

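Look up the index to collect the set of ctids of all matching records and put them into a bitmap. The bitmap allows set operations such as AND/OR between several indexes and sorts the ctids into physical order. Then read the heap table.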
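Here is a matching toy Python model of a Bitmap Index Scan followed by a Bitmap Heap Scan, on a tiny, hypothetical 4-page heap. The matching ctids are first gathered into a per-block bitmap, then each heap block is read once in physical order. The rows therefore come back in physical order rather than id order, so an ORDER BY on top of a bitmap plan needs an explicit Sort. The AND example builds both bitmaps from the same index only to keep the sketch short. A real BitmapAnd usually combines bitmaps from different indexes. This is an illustration, not PostgreSQL's implementation.

```python
import bisect
from collections import defaultdict

# Toy heap: 4 pages of 3 (id, info) rows; toy index: (id, ctid) sorted by id.
heap_pages = [
    [(7, "g"), (1, "a"), (9, "i")],
    [(4, "d"), (12, "l"), (2, "b")],
    [(10, "j"), (5, "e"), (11, "k")],
    [(3, "c"), (8, "h"), (6, "f")],
]
index = sorted((row[0], (blk, off + 1))
               for blk, page in enumerate(heap_pages)
               for off, row in enumerate(page))
keys = [key for key, _ in index]

def build_bitmap(low, high):
    """Collect the ctids of low <= id < high as block -> set of offsets."""
    start, end = bisect.bisect_left(keys, low), bisect.bisect_left(keys, high)
    bitmap = defaultdict(set)
    for _, (blk, off) in index[start:end]:
        bitmap[blk].add(off)
    return bitmap

def bitmap_and(a, b):
    """Intersect two bitmaps block by block, in the spirit of a BitmapAnd node."""
    return {blk: a[blk] & b[blk] for blk in a.keys() & b.keys() if a[blk] & b[blk]}

def heap_visit(bitmap):
    """Read each heap block once, in physical order (the Bitmap Heap Scan step)."""
    rows, page_reads = [], []
    for blk in sorted(bitmap):
        page_reads.append(blk)
        for off in sorted(bitmap[blk]):
            rows.append(heap_pages[blk][off - 1])
    return rows, page_reads

rows, page_reads = heap_visit(build_bitmap(1, 7))
print([r[0] for r in rows], page_reads)  # [1, 4, 2, 5, 3, 6] [0, 1, 2, 3]

both, _ = heap_visit(bitmap_and(build_bitmap(1, 7), build_bitmap(4, 10)))
print([r[0] for r in both])              # [4, 5, 6]
```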

2.3 Indexonlyscan

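If the index columns include every column the query returns, then for data blocks that the visibility map (VM) marks as all-visible, the values are returned straight from the index without reading the heap table.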
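Here is a toy Python model of an Index Only Scan and its Heap Fetches counter. It assumes a hypothetical index whose entries record which heap block their row lives in, plus a visibility map with one all-visible bit per block. With no bits set, every match costs a heap visit, mirroring "Heap Fetches: 9" in the first experiment below. Once VACUUM has set the bits, the count drops to 0, as in the second experiment. For simplicity, every row checked in the heap is assumed to be visible.

```python
import bisect

# Toy index on id: (id, heap block) entries sorted by id, 3 rows per heap block.
index = [(i, (i - 1) // 3) for i in range(1, 13)]
keys = [key for key, _ in index]

def index_only_scan(upper, visibility_map):
    """Return the ids < upper straight from the index, plus a Heap Fetches count.

    visibility_map[block] is True when the block is all-visible; only entries on
    blocks without that bit force a visit to the heap to check visibility.
    """
    ids, heap_fetches = [], 0
    for key, block in index[:bisect.bisect_left(keys, upper)]:
        if not visibility_map[block]:
            heap_fetches += 1  # visibility must be checked in the heap page
        ids.append(key)        # the returned value itself comes from the index
    return ids, heap_fetches

vm_before = [False] * 4  # before VACUUM: no block marked all-visible
vm_after = [True] * 4    # after VACUUM: every block marked all-visible

print(*index_only_scan(10, vm_before))  # [1, 2, 3, 4, 5, 6, 7, 8, 9] 9
print(*index_only_scan(10, vm_after))   # [1, 2, 3, 4, 5, 6, 7, 8, 9] 0
```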

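Let's look more closely at the Index Scan and Index Only Scan methods.
To explain the difference, borrow Oracle's terminology: an Index Scan incurs "table access by rowid" reads. As described above, an Index Scan must still read the table after reading the index, while an Index Only Scan only needs to read the index. Does that mean Index Only Scan is always better than Index Scan? Let's see.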

Table t has an index ind_t_id on column id.
1. Table t has no VM file.
lottu=# \d+ t
                           Table "lottu.t"
 Column |  Type   | Modifiers | Storage  | Stats target | Description 
--------+---------+-----------+----------+--------------+-------------
 id     | integer |           | plain    |              | 
 info   | text    |           | extended |              | 
Indexes:
    "ind_t_id" btree (id)

lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.009..0.015 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Heap Fetches: 9
   Buffers: shared hit=3
 Planning time: 0.177 ms
 Execution time: 0.050 ms
(7 rows)
# Force a different plan by hand
lottu=# set enable_indexonlyscan = off;
SET
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.008..0.014 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Buffers: shared hit=3
 Planning time: 0.188 ms
 Execution time: 0.050 ms
(6 rows)
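# The two plans are almost identical. The only difference is the "Heap Fetches: 9" line of the Index Only Scan. That is a count, not a time. Because no page is marked all-visible yet, all 9 matching rows still had to be checked in the heap, and that work is already included in Execution time.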
2. Table t has a VM file (after VACUUM)
lottu=# delete from t where id >200 and id < 500;
DELETE 299
lottu=# vacuum t;
VACUUM
lottu=# analyze t;
ANALYZE
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Index Only Scan using ind_t_id on lottu.t  (cost=0.29..4.44 rows=9 width=4) (actual time=0.008..0.012 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Heap Fetches: 0
   Buffers: shared hit=3
 Planning time: 0.174 ms
 Execution time: 0.048 ms
(7 rows)

lottu=# set enable_indexonlyscan = off;
SET
lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.012..0.022 rows=9 loops=1)
   Output: id
   Index Cond: (t.id < 10)
   Buffers: shared hit=3
 Planning time: 0.179 ms
 Execution time: 0.077 ms
(6 rows)

Summary:

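Without a VM file, an Index Only Scan is no faster than an Index Scan and can be slightly slower, because it still has to visit the heap page of every matching row (Heap Fetches: 9). The difference is negligible.
With a VM file whose pages are marked all-visible, an Index Only Scan is faster than an Index Scan (0.048 ms vs 0.077 ms here, with Heap Fetches: 0).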

Note 1:

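VM file: the visibility map. Each data block has a bit in it. When the bit is set, every row in that block is visible to all transactions, so the block has no dead rows left to clean up. VACUUM sets these bits.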

Note 2:

Choosing the plan by hand: the enable_xxx planner parameters that can be set include:

enable_bitmapscan
enable_hashagg
enable_hashjoin
enable_indexonlyscan
enable_indexscan
enable_material
enable_mergejoin
enable_nestloop
enable_seqscan
enable_sort
enable_tidscan

References

德哥: 《PostgreSQL 性能優化培訓 3 DAY.pdf》 (PostgreSQL performance tuning training, 3 days)
https://www.postgresql.org/docs/9.6/static/runtime-config-query.html

3. Index types

PostgreSQL supports these index types: B-tree, Hash, GiST, SP-GiST, GIN and BRIN.

postgresql----Btree index: http://www.cnblogs.com/alianbog/p/5621749.html
postgresql----Hash index: generally used only for simple equality lookups; rarely used.
postgresql----GiST index: http://www.cnblogs.com/alianbog/p/5628543.html

4. Managing indexes

4.1 Creating an index

CREATE INDEX syntax:

lottu=# \h create index
Command:     CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]
The following examples use table t.
1. Keyword [UNIQUE]
# Create a unique index; a primary key is enforced by a unique index.
CREATE UNIQUE INDEX ind_t_id_1 on t (id);
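2. Keyword [CONCURRENTLY]
# Build the index concurrently. This is the equivalent of an ONLINE index build in Oracle: INSERT, UPDATE and DELETE on the table are not blocked while the index is built. The price is a much longer build.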
CREATE INDEX CONCURRENTLY ind_t_id_2 on t (id);
3. Keyword [IF NOT EXISTS]
# If an index with this name already exists, nothing is created and no error is raised (only a notice).
CREATE INDEX IF NOT EXISTS ind_t_id_3 on t (id);
4. Keyword [USING]
# Choose the index type (access method). The default is B-tree.
CREATE INDEX ind_t_id_4 on t using btree (id);
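5. Keywords [ ASC | DESC ] [ NULLS { FIRST | LAST } ]
# Sort the index in ascending or descending order and, if the column contains NULLs, place them first or last. For example, descending order with NULLs first:
CREATE INDEX ind_t_id_5 on t (id desc nulls first);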
6. Keyword [WITH ( storage_parameter = value )]
# Set storage parameters such as the index fill factor. For example, create the index with a fill factor of 75:
CREATE INDEX ind_t_id_6 on t (id) with (fillfactor = 75);
7. Keyword [TABLESPACE]
# The tablespace in which to create the index.
CREATE INDEX ind_t_id_7 on t (id) TABLESPACE tsp_lottu;
8. Keyword [WHERE]
# Index only the subset of rows you care about rather than every row; such a partial index is created with a WHERE clause.
CREATE INDEX ind_t_id_8 on t (id) WHERE id < 1000;
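For item 8, here is a toy Python model of a partial index. Only rows that satisfy the index predicate get entries, and a query can use the index only when its own WHERE clause implies that predicate. PostgreSQL's planner proves such implications far more generally. This sketch handles only the `id < N` form used above, and its function names are hypothetical.

```python
INDEX_UPPER = 1000  # predicate of ind_t_id_8: WHERE id < 1000

def build_partial_index(ids, upper=INDEX_UPPER):
    """Keep entries only for rows that satisfy the index predicate."""
    return sorted(i for i in ids if i < upper)

def can_use_partial_index(query_upper, index_upper=INDEX_UPPER):
    """'id < query_upper' implies 'id < index_upper' iff query_upper <= index_upper."""
    return query_upper <= index_upper

entries = build_partial_index(range(1, 10001))
print(len(entries))                 # 999
print(can_use_partial_index(500))   # True:  id < 500 lies inside the index
print(can_use_partial_index(2000))  # False: rows 1000..1999 are not indexed
```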
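For item 6 (fill factor), here is a back-of-the-envelope Python estimate of what a lower fill factor costs in leaf pages when the index is built. The sizes are assumptions for an int4 B-tree on a 64-bit build with 8 kB pages: a 24-byte page header, 16 bytes of B-tree special space, and 20 bytes per entry including its line pointer. The estimate ignores high keys, internal pages and later page splits.

```python
import math

PAGE_SIZE = 8192              # default PostgreSQL block size
USABLE = PAGE_SIZE - 24 - 16  # assumed: minus page header and B-tree special space
ENTRY = 20                    # assumed: 16-byte int4 index tuple + 4-byte line pointer

def entries_per_leaf(fillfactor):
    """Entries packed into one leaf page filled to fillfactor percent."""
    return (USABLE // ENTRY) * fillfactor // 100

def leaf_pages(rows, fillfactor):
    """Leaf pages needed to hold `rows` entries at the given fill factor."""
    return math.ceil(rows / entries_per_leaf(fillfactor))

for ff in (100, 90, 75):
    print(ff, entries_per_leaf(ff), leaf_pages(10000, ff))
# 100 407 25
# 90 366 28
# 75 305 33
```

A fill factor of 75 leaves a quarter of each leaf free, so later inserts into the middle of the key range cause fewer page splits. The price here is roughly a third more leaf pages than at 100 (33 vs 25). For B-tree indexes the default fill factor is 90.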

4.2 Altering an index

ALTER INDEX syntax

lottu=# \h alter index
Command:     ALTER INDEX
Description: change the definition of an index
Syntax:
# Rename the index
ALTER INDEX [ IF EXISTS ] name RENAME TO new_name
# Move the index to another tablespace
ALTER INDEX [ IF EXISTS ] name SET TABLESPACE tablespace_name
# Change storage parameters, e.g. the fill factor
ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter = value [, ... ] )
# Reset storage parameters, e.g. the fill factor, to their defaults
ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )
# Move all indexes in one tablespace (e.g. TSP1) to a new tablespace
ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]
    SET TABLESPACE new_tablespace [ NOWAIT ]

4.3 Dropping an index

DROP INDEX syntax

lottu=# \h drop index
Command:     DROP INDEX
Description: remove an index
Syntax:
DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]

5. Index maintenance

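Indexes speed up queries and sorts on a table's records and enforce unique constraints. But indexes also have costs:

Indexes take up extra storage space in the database.
Every INSERT, UPDATE and DELETE on the table also has to maintain its indexes.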

5.1 Checking an index's size

select pg_size_pretty(pg_relation_size('ind_t_id'));

5.2 Index usage

-- pg_stat_user_indexes.idx_scan counts the index scans that have used each index; an index whose count stays at 0 is a candidate for removal.
select idx_scan from pg_stat_user_indexes where indexrelname = 'ind_t_id';

5.3 Rebuilding an index

-- If a table's index performs poorly after frequent updates (for example because it has bloated), rebuild the index.
lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1')); 
 pg_size_pretty 
----------------
 2200 kB
(1 row)

lottu=# delete from t where id > 1000;
DELETE 99000

lottu=# analyze t;
ANALYZE
lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1')); 
 pg_size_pretty 
----------------
 2200 kB
 
lottu=# insert into t select generate_series(2000,100000),'lottu';
INSERT 0 98001

lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1')); 
 pg_size_pretty 
----------------
 4336 kB
(1 row)

lottu=# vacuum full t;
VACUUM

lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1')); 
 pg_size_pretty 
----------------
 2176 kB
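The sizes in this session can be sanity-checked with simple arithmetic. This rests on an assumption of ours, not a measurement: the index size is proportional to the number of entries it holds, live or dead. The DELETE leaves dead entries in the index, the INSERT adds new entries next to them, and VACUUM FULL rebuilds the index from live rows only. The row counts come from the DELETE and INSERT output above. The 1000 surviving rows are assumed.

```python
BASE_ENTRIES = 1000 + 99000  # assumed entries before the DELETE: 1000 kept + 99000 deleted
BASE_KB = 2200               # measured size of ind_t_id_1 at that point

def estimate_kb(entries):
    """Assumed model: index size grows linearly with its number of entries."""
    return BASE_KB * entries / BASE_ENTRIES

kept, deleted, inserted = 1000, 99000, 98001
before_vacuum = kept + deleted + inserted  # dead entries still occupy the index
after_vacuum_full = kept + inserted        # rebuilt from live rows only

print(round(estimate_kb(before_vacuum)))      # 4356  (measured: 4336 kB)
print(round(estimate_kb(after_vacuum_full)))  # 2178  (measured: 2176 kB)
```

Both estimates land close to the measured values, which is consistent with the dead entries being the cause of the growth.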
 
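Ways to rebuild:
1. REINDEX: cannot rebuild concurrently (there is no CONCURRENTLY option before PostgreSQL 12); it locks the table, so writes block until it finishes.
2. VACUUM FULL: rewrites the table and rebuilds all of its indexes; it takes an exclusive lock on the table, blocking reads as well as writes.
3. Create a new index under a different name (for example with CREATE INDEX CONCURRENTLY, so writes are not blocked), then drop the old index.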
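Method 3 can be written out as a short sequence of statements. The hypothetical Python helper below only builds the SQL text and does not connect to a database. The `_new` suffix and the final rename, which restores the original name, are our additions. Run each statement on its own, because CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY cannot run inside a transaction block. Pass unique=True for a unique index such as ind_t_id_1 so that the replacement keeps its guarantee.

```python
def swap_index_sql(old_name, table, column, unique=False, suffix="_new"):
    """Build the statements that replace an index without blocking writes.

    Identifiers are interpolated as-is, so they must be trusted, already-quoted names.
    """
    new_name = old_name + suffix
    create = "CREATE UNIQUE INDEX" if unique else "CREATE INDEX"
    return [
        f"{create} CONCURRENTLY {new_name} ON {table} ({column});",
        f"DROP INDEX CONCURRENTLY {old_name};",
        f"ALTER INDEX {new_name} RENAME TO {old_name};",
    ]

for stmt in swap_index_sql("ind_t_id_1", "t", "id", unique=True):
    print(stmt)
```

This does not work for an index that backs a PRIMARY KEY or UNIQUE constraint. DROP INDEX refuses to drop such an index because the constraint depends on it.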
