騰訊雲TDSQL PostgreSQL版 - 最佳實踐 |優化實例

count(distinct xx) 優化
postgres=# CREATE TABLE t1(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
NOTICE: Replica identity is needed for shard table, please add to this table through "alter table" command.
CREATE TABLE
Time: 89.938 ms
postgres=# insert into t1 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t;
INSERT 0 1000000
Time: 14849.045 ms (00:14.849)
postgres=# analyze t1;
ANALYZE
Time: 1340.387 ms (00:01.340)
postgres=# explain (verbose) select count(distinct f2) from t1;網絡

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=103320.00..103320.01 rows=1 width=8)
Output: count(DISTINCT f2)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.00..100820.00 rows=1000000 width=33)ide

Output: f2
    ->  Seq Scan on public.t1  (cost=0.00..62720.00 rows=1000000 width=33)
          Output: f2

(6 rows)
Time: 0.748 ms
postgres=# select count(distinct f2) from t1;
count
---------
1000000
(1 row)
Time: 6274.684 ms (00:06.275)
postgres=# select count(distinct f2) from t1 where f1 <10;
count
-------oop

9

(1 row)
Time: 19.261 ms
如上 count(distinct f2) 發生在 cn 節點,對於 TP 類業務,須要操做的數據量少的狀況下,性能開銷沒有問題,且比下推執行的性能開銷還要小。
但對於一次要操做的數據量比較大的 AP 類業務,網絡傳輸就會瓶頸,以下是改寫後的執行計劃:post

postgres=# explain (verbose) select count(1) from (select f2 from t1 group by f2) as t ;性能

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=355600.70..355600.71 rows=1 width=8)
Output: count(1)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=355600.69..355600.70 rows=1 width=0)優化

Output: PARTIAL count(1)
    ->  Partial Aggregate  (cost=355500.69..355500.70 rows=1 width=8)
          Output: PARTIAL count(1)
          ->  Group  (cost=340500.69..345500.69 rows=1000000 width=33)
                Output: t1.f2
                Group Key: t1.f2
                ->  Sort  (cost=340500.69..343000.69 rows=1000000 width=0)
                      Output: t1.f2
                      Sort Key: t1.f2
                      ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=216192.84..226192.84 rows=1000000 width=0)
                            Output: t1.f2
                            Distribute results by S: f2
                            ->  Group  (cost=216092.84..221092.84 rows=1000000 width=33)
                                  Output: t1.f2
                                  Group Key: t1.f2
                                  ->  Sort  (cost=216092.84..218592.84 rows=1000000 width=33)
                                        Output: t1.f2
                                        Sort Key: t1.f2
                                        ->  Seq Scan on public.t1  (cost=0.00..62720.00 rows=1000000 width=33)
                                              Output: t1.f2

(23 rows)
改寫後,並行推到 dn 執行,此時查看執行的效果,能夠看到對於大量數據計算的 AP 類業務,性能提升了5倍。this

postgres=# select count(1) from (select f2 from t1 group by f2) as t ;
count
---------
1000000
(1 row)
Time: 1328.431 ms (00:01.328)
postgres=# select count(1) from (select f2 from t1 where f1<10 group by f2) as t ;
count
-------code

9

(1 row)
Time: 24.991 ms
postgres=#
增大 work_mem 減小 io 訪問
增大 work_mem 後,性能提升了40倍,由於 work_mem 足夠放下 filter 的數據,不須要再作 Materialize 物化,filter 由原來的 subplan 變成了 hash subplan,直接在內存 hash 表中 filter,性能提高。
注意,work_mem 默認不宜過大,建議在某個具體的查詢語句中再根據須要進行調整便可。排序

postgres=# CREATE TABLE t1(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
NOTICE: Replica identity is needed for shard table, please add to this table through "alter table" command.
CREATE TABLE
Time: 70.545 ms
postgres=# CREATE TABLE t2(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
NOTICE: Replica identity is needed for shard table, please add to this table through "alter table" command.
CREATE TABLE
Time: 61.913 ms
postgres=# insert into t1 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000) as t;
INSERT 0 1000
Time: 48.866 ms
postgres=# insert into t2 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,50000) as t;
INSERT 0 50000
Time: 792.858 ms
postgres=# analyze t1;
ANALYZE
Time: 175.946 ms
postgres=# analyze t2;
ANALYZE
Time: 318.802 ms
postgres=#
postgres=# explain select * from t1 where f2 not in (select f2 from t2);索引

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=0.00..2076712.50 rows=500 width=367)
-> Seq Scan on t1 (cost=0.00..2076712.50 rows=500 width=367)

Filter: (NOT (SubPlan 1))
    SubPlan 1
      ->  Materialize  (cost=0.00..4028.00 rows=50000 width=33)
            ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=0.00..3240.00 rows=50000 width=33)
                  ->  Seq Scan on t2  (cost=0.00..3240.00 rows=50000 width=33)

(7 rows)
Time: 0.916 ms
postgres=# select * from t1 where f2 not in (select f2 from t2);

f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12

(0 rows)
Time: 4226.825 ms (00:04.227)
postgres=# set work_mem to '8MB';
SET
Time: 0.289 ms
postgres=# explain select * from t1 where f2 not in (select f2 from t2);

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=3365.00..3577.50 rows=500 width=367)
-> Seq Scan on t1 (cost=3365.00..3577.50 rows=500 width=367)

Filter: (NOT (hashed SubPlan 1))
    SubPlan 1
      ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=0.00..3240.00 rows=50000 width=33)
            ->  Seq Scan on t2  (cost=0.00..3240.00 rows=50000 width=33)

(6 rows)
Time: 0.890 ms
postgres=# select * from t1 where f2 not in (select f2 from t2);

f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12

(0 rows)
Time: 105.249 ms
postgres=#
not in 改寫爲 anti join
上文經過增大計算內存提升性能,但內存不可能無限擴大,以下經過改寫語句來提升查詢的性能。

postgres=# explain select * from t1 left outer join t2 on t1.f2 = t2.f2 where t2.f2 is null;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=6405.00..9260.75 rows=1 width=734)
-> Hash Anti Join (cost=6405.00..9260.75 rows=1 width=734)

Hash Cond: (t1.f2 = t2.f2)
    ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.00..682.00 rows=1000 width=367)
          Distribute results by S: f2
          ->  Seq Scan on t1  (cost=0.00..210.00 rows=1000 width=367)
    ->  Hash  (cost=21940.00..21940.00 rows=50000 width=367)
          ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.00..21940.00 rows=50000 width=367)
                Distribute results by S: f2
                ->  Seq Scan on t2  (cost=0.00..3240.00 rows=50000 width=367)

(10 rows)
Time: 1.047 ms
postgres=# select * from t1 left outer join t2 on t1.f2 = t2.f2 where t2.f2 is null;

f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12

(0 rows)
Time: 107.233 ms
postgres=#
也能夠修改 not exists:

postgres=# explain select * from t1 where not exists( select 1 from t2 where t1.f2=t2.f2);

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=3865.00..4078.75 rows=1 width=367)
-> Hash Anti Join (cost=3865.00..4078.75 rows=1 width=367)

Hash Cond: (t1.f2 = t2.f2)
    ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.00..682.00 rows=1000 width=367)
          Distribute results by S: f2
          ->  Seq Scan on t1  (cost=0.00..210.00 rows=1000 width=367)
    ->  Hash  (cost=5240.00..5240.00 rows=50000 width=33)
          ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.00..5240.00 rows=50000 width=33)
                Distribute results by S: f2
                ->  Seq Scan on t2  (cost=0.00..3240.00 rows=50000 width=33)

(10 rows)
Time: 0.974 ms
postgres=# select * from t1 where not exists( select 1 from t2 where t1.f2=t2.f2);

f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12

(0 rows)
Time: 42.944 ms
postgres=#
分佈 key jon+limit 優化
數據準備:

postgres=# CREATE TABLE t1(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
NOTICE: Replica identity is needed for shard table, please add to this table through "alter table" command.
CREATE TABLE
postgres=# CREATE TABLE t2(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
NOTICE: Replica identity is needed for shard table, please add to this table through "alter table" command.
CREATE TABLE
postgres=# insert into t1 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t;
INSERT 0 1000000
postgres=# insert into t2 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t;
INSERT 0 1000000
postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE
postgres=#
postgres=# \timing
Timing is on.
postgres=# explain select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.25..1.65 rows=10 width=367)
-> Merge Join (cost=0.25..140446.26 rows=1000000 width=367)

Merge Cond: (t1.f1 = t2.f1)
    ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.12..434823.13 rows=1000000 width=367)
          ->  Index Scan using t1_f1_key on t1  (cost=0.12..62723.13 rows=1000000 width=367)
    ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.12..71823.13 rows=1000000 width=4)
          ->  Index Only Scan using t2_f1_key on t2  (cost=0.12..62723.13 rows=1000000 width=4)

(7 rows)
Time: 1.372 ms
postgres=# explain analyze select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.25..1.65 rows=10 width=367) (actual time=2675.437..2948.199 rows=10 loops=1)
-> Merge Join (cost=0.25..140446.26 rows=1000000 width=367) (actual time=2675.431..2675.508 rows=10 loops=1)

Merge Cond: (t1.f1 = t2.f1)
    ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.12..434823.13 rows=1000000 width=367) (actual time=1.661..1.704 rows=10 loops=1)
    ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.12..71823.13 rows=1000000 width=4) (actual time=2673.761..2673.783 rows=10 loops=1)

Planning time: 0.358 ms
Execution time: 2973.948 ms
(7 rows)
Time: 2976.008 ms (00:02.976)
postgres=#
以上執行計劃是在 cn 上執行,merge join 須要把要 join 的數據拉回 cn 再排序,而後再 join,這裏主切的開銷在於網絡,優化方法是讓語句推下去計算,以下所示,二者相差150倍的性能,通常狀況下,若是須要拉大量的數據回 cn 計算,則下推執行的效率會更好。

postgres=# set prefer_olap to on;
SET
Time: 0.291 ms
postgres=# explain select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Limit (cost=100.25..101.70 rows=10 width=367)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.25..101.70 rows=10 width=367)

->  Limit  (cost=0.25..1.65 rows=10 width=367)
          ->  Merge Join  (cost=0.25..140446.26 rows=1000000 width=367)
                Merge Cond: (t1.f1 = t2.f1)
                ->  Index Scan using t1_f1_key on t1  (cost=0.12..62723.13 rows=1000000 width=367)
                ->  Index Only Scan using t2_f1_key on t2  (cost=0.12..62723.13 rows=1000000 width=4)

(7 rows)
Time: 1.061 ms
postgres=# explain analyze select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=100.25..101.70 rows=10 width=367) (actual time=1.527..3.899 rows=10 loops=1)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.25..101.70 rows=10 width=367) (actual time=1.525..1.529 rows=10 loops=1)
Planning time: 0.360 ms
Execution time: 18.193 ms
(4 rows)
Time: 19.921 ms
非分佈 key join 使用 hash join 性能通常更好
爲提升 TP 類業務查詢的性能,常常須要對一些字段創建索引,使用有索引字段 join 時,系統每每也會使用 Merge Cond 和 nestloop。

mydb=# CREATE TABLE t1(f1 serial not null,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
NOTICE: Replica identity is needed for shard table, please add to this table through "alter table" command.
CREATE TABLE
Time: 481.042 ms
mydb=# create index t1_f1_idx on t1(f2);
CREATE INDEX
Time: 85.521 ms
mydb=# CREATE TABLE t2(f1 serial not null,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
NOTICE: Replica identity is needed for shard table, please add to this table through "alter table" command.
CREATE TABLE
Time: 75.973 ms
mydb=# create index t2_f1_idx on t2(f2);
CREATE INDEX
Time: 29.890 ms
mydb=# insert into t1 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t;
INSERT 0 1000000
Time: 16450.623 ms (00:16.451)
mydb=# insert into t2 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t;
INSERT 0 1000000
Time: 17218.738 ms (00:17.219)
mydb=# analyze t1;
ANALYZE
Time: 2219.341 ms (00:02.219)
mydb=# analyze t2;
ANALYZE
Time: 1649.506 ms (00:01.650)
mydb=#
--merge join
mydb=# explain select t1.* from t1,t2 where t1.f2=t2.f2 limit 10;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
Limit (cost=100.25..102.78 rows=10 width=367)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.25..102.78 rows=10 width=367)

->  Limit  (cost=0.25..2.73 rows=10 width=367)
          ->  Merge Join  (cost=0.25..248056.80 rows=1000000 width=367)
                Merge Cond: (t1.f2 = t2.f2)
                ->  Remote Subquery Scan on all (dn001,dn002)  (cost=100.12..487380.85 rows=1000000 width=367)
                      Distribute results by S: f2
                      ->  Index Scan using t1_f1_idx on t1  (cost=0.12..115280.85 rows=1000000 width=367)
                ->  Materialize  (cost=100.12..155875.95 rows=1000000 width=33)
                      ->  Remote Subquery Scan on all (dn001,dn002)  (cost=100.12..153375.95 rows=1000000 width=33)
                            Distribute results by S: f2
                            ->  Index Only Scan using t2_f1_idx on t2  (cost=0.12..115275.95 rows=1000000 width=33)

(12 rows)
Time: 4.183 ms
mydb=# explain analyze select t1.* from t1,t2 where t1.f2=t2.f2 limit 10;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=100.25..102.78 rows=10 width=367) (actual time=6555.346..6556.296 rows=10 loops=1)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.25..102.78 rows=10 width=367) (actual time=6555.343..6555.349 rows=10 loops=1)
Planning time: 0.473 ms
Execution time: 6569.828 ms
(4 rows)
Time: 6614.439 ms (00:06.614)
--nested loop
mydb=# set enable_mergejoin to off;
SET
Time: 0.422 ms
mydb=# explain select t1.* from t1,t2 where t1.f2=t2.f2 limit 10;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------
Limit (cost=100.12..103.57 rows=10 width=367)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.12..103.57 rows=10 width=367)

->  Limit  (cost=0.12..3.52 rows=10 width=367)
          ->  Nested Loop  (cost=0.12..339232.00 rows=1000000 width=367)
                ->  Remote Subquery Scan on all (dn001,dn002)  (cost=100.00..434740.00 rows=1000000 width=367)
                      Distribute results by S: f2
                      ->  Seq Scan on t1  (cost=0.00..62640.00 rows=1000000 width=367)
                ->  Materialize  (cost=100.12..100.31 rows=1 width=33)
                      ->  Remote Subquery Scan on all (dn001,dn002)  (cost=100.12..100.30 rows=1 width=33)
                            Distribute results by S: f2
                            ->  Index Only Scan using t2_f1_idx on t2  (cost=0.12..0.27 rows=1 width=33)
                                  Index Cond: (f2 = t1.f2)

(12 rows)
Time: 1.033 ms
mydb=# explain analyze select t1.* from t1,t2 where t1.f2=t2.f2 limit 10;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=100.12..103.57 rows=10 width=367) (actual time=5608.326..5609.571 rows=10 loops=1)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.12..103.57 rows=10 width=367) (actual time=5608.323..5608.349 rows=10 loops=1)
Planning time: 0.347 ms
Execution time: 5669.901 ms
(4 rows)
Time: 5672.584 ms (00:05.673)
mydb=# set enable_nestloop to off;
SET
Time: 0.436 ms
mydb=# explain select t1.* from t1,t2 where t1.f2=t2.f2 limit 10;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
Limit (cost=85983.00..85984.94 rows=10 width=367)
-> Remote Subquery Scan on all (dn001,dn002) (cost=85983.00..85984.94 rows=10 width=367)

->  Limit  (cost=85883.00..85884.89 rows=10 width=367)
          ->  Hash Join  (cost=85883.00..274580.00 rows=1000000 width=367)
                 Hash Cond: (t1.f2 = t2.f2)
                ->  Remote Subquery Scan on all (dn001,dn002)  (cost=100.00..434740.00 rows=1000000 width=367)
                      Distribute results by S: f2
                      ->  Seq Scan on t1  (cost=0.00..62640.00 rows=1000000 width=367)
                ->  Hash  (cost=100740.00..100740.00 rows=1000000 width=33)
                      ->  Remote Subquery Scan on all (dn001,dn002)  (cost=100.00..100740.00 rows=1000000 width=33)
                            Distribute results by S: f2
                            ->  Seq Scan on t2  (cost=0.00..62640.00 rows=1000000 width=33)

(12 rows)
Time: 1.141 ms
mydb=# explain analyze select t1.* from t1,t2 where t1.f2=t2.f2 limit 10;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=85983.00..85984.94 rows=10 width=367) (actual time=1083.691..1085.962 rows=10 loops=1)
-> Remote Subquery Scan on all (dn001,dn002) (cost=85983.00..85984.94 rows=10 width=367) (actual time=1083.688..1083.699 rows=10 loops=1)
Planning time: 0.530 ms
Execution time: 1108.830 ms
(4 rows)
Time: 1117.713 ms (00:01.118)
mydb=#
exists 優化
exists 在數據量比較大狀況下,通常使用的是 Semi Join ,在 work_mem 足夠大的狀況下使用的是 hash join,性能會更好。以下,性能提高大約一倍。

postgres=# show work_mem;
work_mem
----------
4MB
(1 row)
Time: 0.298 ms
postgres=# explain select count(1) from t1 where exists(select 1 from t2 where t2.t1_f1=t1.f1);

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=242218.32..242218.33 rows=1 width=8)
-> Remote Subquery Scan on all (dn001,dn002) (cost=242218.30..242218.32 rows=1 width=0)

->  Partial Aggregate  (cost=242118.30..242118.31 rows=1 width=8)
          ->  Hash Semi Join  (cost=110248.00..242118.30 rows=505421 width=0)
                Hash Cond: (t1.f1 = t2.t1_f1)
                ->  Seq Scan on t1  (cost=0.00..17420.00 rows=1000000 width=4)
                ->  Hash  (cost=79340.00..79340.00 rows=3000000 width=4)
                      ->  Remote Subquery Scan on all (dn001,dn002)  (cost=100.00..79340.00 rows=3000000 width=4)
                            Distribute results by S: t1_f1
                            ->  Seq Scan on t2  (cost=0.00..52240.00 rows=3000000 width=4)

(10 rows)
Time: 1.091 ms
postgres=# select count(1) from t1 where exists(select 1 from t2 where t2.t1_f1=t1.f1);
count
--------
500000
(1 row)
Time: 3779.401 ms (00:03.779)
postgres=# set work_mem to '128MB';
SET
Time: 0.368 ms
postgres=# explain select count(1) from t1 where exists(select 1 from t2 where t2.t1_f1=t1.f1);

QUERY PLAN

------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=101763.76..101763.77 rows=1 width=8)
-> Remote Subquery Scan on all (dn001,dn002) (cost=101763.75..101763.76 rows=1 width=0)

->  Partial Aggregate  (cost=101663.75..101663.76 rows=1 width=8)
          ->  Hash Join  (cost=89660.00..101663.75 rows=505421 width=0)
                Hash Cond: (t2.t1_f1 = t1.f1)
                ->  Remote Subquery Scan on all (dn001,dn002)  (cost=59840.00..69443.00 rows=505421 width=4)
                      Distribute results by S: t1_f1
                      ->  HashAggregate  (cost=59740.00..64794.21 rows=505421 width=4)
                            Group Key: t2.t1_f1
                            ->  Seq Scan on t2  (cost=0.00..52240.00 rows=3000000 width=4)
                ->  Hash  (cost=17420.00..17420.00 rows=1000000 width=4)
                      ->  Seq Scan on t1  (cost=0.00..17420.00 rows=1000000 width=4)

(12 rows) Time: 4.739 mspostgres=# select count(1) from t1 where exists(select 1 from t2 where t2.t1_f1=t1.f1); count --------500000(1 row) Time: 1942.037 ms (00:01.942)postgres=#

相關文章
相關標籤/搜索