PostgreSQL: default sort order of NULL values, and guidelines for queries and index definitions - nulls first\last, asc\desc

Date: 2019-12-11


Background

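In a database, NULL represents an UNKNOWN value: no value is stored. When sorting, whether NULLs come before or after the rows that have a value is specified through syntax.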

For example:

-- NULLs sort before the rows that have a value
select * from tbl order by id nulls first;

-- NULLs sort after the rows that have a value
select * from tbl order by id nulls last;

For the rows that do have a value, you can also specify ascending or descending order.

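-- sort by the id column in ascending order (asc is the default and may be omitted)
select * from tbl order by id asc;

-- sort by the id column in descending order
select * from tbl order by id desc;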

The default sort rules are as follows:

desc nulls first : null large small    
  
asc nulls last : small large null

This is what happens when nulls [first|last] is combined with asc|desc.

The values are ordered as follows:

1. DEFAULT (NULL is treated as larger than any value):

desc nulls first : order: null large small

asc nulls last   : order: small large null

2. NON DEFAULT (NULL is treated as smaller than any value):

desc nulls last : order: large small null

asc nulls first : order: null small large
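The four combinations above can be checked on a small table that actually contains a NULL. This is a minimal sketch: the table name `null_order_demo` is hypothetical, chosen only for illustration, and does not appear in the original tests.

```sql
-- Hypothetical demo table; the column is nullable so that one NULL can be stored.
create temp table null_order_demo (id int);
insert into null_order_demo values (1), (2), (null);

-- Default pairings (NULL treated as larger than any value)
select id from null_order_demo order by id;        -- same as asc nulls last:   1, 2, NULL
select id from null_order_demo order by id desc;   -- same as desc nulls first: NULL, 2, 1

-- Non-default pairings (NULL treated as smaller than any value)
select id from null_order_demo order by id asc nulls first;   -- NULL, 1, 2
select id from null_order_demo order by id desc nulls last;   -- 2, 1, NULL
```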

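Because the order of an index is fixed, a query whose sort condition does not match the index's sort rule cannot take advantage of the index's ordered scan, which causes unnecessary trouble.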

Problems caused by a mismatch between the index definition and the scan order

1. Create a table and load test data

create table cc(id int not null);  
  
insert into cc select generate_series(1,1000000);

2. Create an index (using the non-default setting, where NULL is smaller than any value)

create index idx_cc on cc (id asc nulls first);

-- or

create index idx_cc on cc (id desc nulls last);
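To confirm which NULL placement an existing index was built with, its stored definition can be read back from the `pg_indexes` system view. A small sketch, assuming one of the `idx_cc` variants above has been created on table `cc` in the current database:

```sql
-- indexdef holds the reconstructed CREATE INDEX statement, including any
-- non-default DESC / NULLS FIRST / NULLS LAST option on each column.
select indexname, indexdef
from pg_indexes
where tablename = 'cc';
```

In psql, `\d cc` lists the same index definitions under "Indexes".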

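3. When a query's order does not match the index definition (in terms of the relative position of NULLs), an additional SORT is needed even if the index is used.

select * from cc order by id desc nulls first limit 1;
select * from cc order by id asc nulls last limit 1;   -- asc is optional

An extra SORT is used: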

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cc order by id limit 1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=27969.43..27969.43 rows=1 width=4) (actual time=263.972..263.972 rows=1 loops=1)  
   Output: id  
   Buffers: shared hit=7160  
   ->  Sort  (cost=27969.43..30469.43 rows=1000000 width=4) (actual time=263.970..263.970 rows=1 loops=1)  
         Output: id  
         Sort Key: cc.id  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=7160  
         ->  Bitmap Heap Scan on public.cc  (cost=8544.42..22969.42 rows=1000000 width=4) (actual time=29.927..148.733 rows=1000000 loops=1)  
               Output: id  
               Heap Blocks: exact=4425  
               Buffers: shared hit=7160  
               ->  Bitmap Index Scan on idx_cc  (cost=0.00..8294.42 rows=1000000 width=0) (actual time=29.380..29.380 rows=1000000 loops=1)  
                     Buffers: shared hit=2735  
 Planning time: 0.098 ms  
 Execution time: 264.009 ms  
(16 rows)

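4. When a query's order matches the index definition (in terms of the relative position of NULLs), the index is effective and no extra SORT is needed.

select * from cc order by id desc nulls last limit 1;
select * from cc order by id asc nulls first limit 1;   -- asc is optional

No extra SORT is needed: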

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cc order by id nulls first limit 1;  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.42..0.45 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)  
   Output: id  
   Buffers: shared hit=4  
   ->  Index Only Scan using idx_cc on public.cc  (cost=0.42..22719.62 rows=1000000 width=4) (actual time=0.013..0.013 rows=1 loops=1)  
         Output: id  
         Heap Fetches: 1  
         Buffers: shared hit=4  
 Planning time: 0.026 ms  
 Execution time: 0.022 ms  
(9 rows)

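Summary

In PostgreSQL, ascending and descending indexes are interchangeable, because a B-tree index can be scanned forward or backward. What differs is the relative position of NULLs.

Therefore, when creating an index, be sure to align it with the business requirement and use a consistent relative NULL order (the pairing of nulls first or nulls last with asc, desc), that is, whether NULL sits next to the large values or next to the small values. Whether the values themselves are asc or desc does not actually matter.

If the order the business requires differs from the index order (in terms of the relative position of NULLs), the index has to be scanned in full and the result sorted again.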
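One way to avoid the mismatch is to declare the index with exactly the pairing the frequently run query uses. This is a sketch under assumptions: the query shape and the index name `idx_cc_desc_nulls_last` are hypothetical and serve only to illustrate the pairing.

```sql
-- Assumed business query: largest ids first, NULLs at the end.
--   select * from cc order by id desc nulls last limit 10;

-- Index whose NULL placement matches that query. An index on
-- (id asc nulls first) would serve it as well, through a backward scan.
create index idx_cc_desc_nulls_last on cc (id desc nulls last);

-- Check the plan; the expectation is an index scan with no Sort node.
explain select * from cc order by id desc nulls last limit 10;
```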

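Possible kernel improvements

1. When the column has a NOT NULL constraint, the relative position of NULLs should not matter, because there are no NULL values at all. The optimizer should be able to choose a scan without a Sort, regardless of whether the NULL position in the requested SQL matches the index.

2. Improve the index scan method to support circular scans.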
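Until the optimizer handles the first case by itself, the application can do the equivalent rewrite by hand. This sketch relies on `cc.id` being declared `not null`, as in the test table above, and on `idx_cc` having been created as `(id asc nulls first)`. With no NULLs in the column, both queries return the same row, so the form that matches the index can be used instead.

```sql
-- Does not match idx_cc (asc nulls first); planned with an extra Sort in the test above.
select * from cc order by id limit 1;

-- Same result on a NOT NULL column, and it matches idx_cc, so no Sort is needed.
select * from cc order by id nulls first limit 1;
```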

Reference:
https://github.com/digoal/blog/blob/master/201711/20171111_02.md

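Note:

If the index was created without specifying a NULL placement, but the query's ORDER BY clause does specify one, then asc|desc must be paired correctly with last|first. The default pairings are: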

desc nulls first : null large small    
  
asc nulls last : small large null

For an index created without a NULL specification, following the pairings above is enough.
Here are a few tests:

swrd=# \d cc
       Table "swrd.cc"
 Column |  Type   | Modifiers 
--------+---------+-----------
 id     | integer | not null
Indexes:
    "cc_id_idx" btree (id)
swrd=# explain (analyze,verbose,timing,costs,buffers)  select * from cc order by id  desc nulls first;
                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan Backward using cc_id_idx on swrd.cc  (cost=0.42..30408.42 rows=1000000 width=4) (actual time=0.044..297.796 rows=1000000 loops=1)
   Output: id
   Heap Fetches: 1000000
   Buffers: shared hit=7159 read=1
 Planning time: 0.113 ms
 Execution time: 387.645 ms
(6 rows)

Time: 388.438 ms
swrd=# explain (analyze,verbose,timing,costs,buffers)  select * from cc order by id  desc nulls last;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=127757.34..130257.34 rows=1000000 width=4) (actual time=666.996..926.348 rows=1000000 loops=1)
   Output: id
   Sort Key: cc.id DESC NULLS LAST
   Sort Method: external merge  Disk: 13640kB
   Buffers: shared hit=4425, temp read=2334 written=2334
   ->  Seq Scan on swrd.cc  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.020..147.384 rows=1000000 loops=1)
         Output: id
         Buffers: shared hit=4425
 Planning time: 0.110 ms
 Execution time: 1027.649 ms
(10 rows)
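The two plans above suggest a rule of thumb for an index created without any NULL specification, such as `cc_id_idx`. The sketch below groups the ORDER BY forms by whether they line up with that index. It assumes `cc_id_idx` on `cc (id)` already exists, and the "expected" comments describe the likely plan shape, not measured results.

```sql
-- cc_id_idx uses the defaults, equivalent to (id asc nulls last).

-- Line up with the index read forward:
explain select * from cc order by id;
explain select * from cc order by id asc nulls last;

-- Line up with the index read backward:
explain select * from cc order by id desc;
explain select * from cc order by id desc nulls first;

-- Do not line up with the index; an explicit Sort is expected:
explain select * from cc order by id asc nulls first;
explain select * from cc order by id desc nulls last;
```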