Given a model Account, the SQLAlchemy query looks like this:
query = Account.query.filter(Account.id.in_(account_ids)).order_by(Account.date_created.desc())
If account_ids is empty here, running the query produces the following warning:
/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/default_comparator.py:35: SAWarning: The IN-predicate on "account.id" was invoked with an empty sequence. This results in a contradiction, which nonetheless can be expensive to evaluate. Consider alternative strategies for improved performance.
  return o[0](self, self.expr, op, *(other + o[1:]), **kwargs)
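The warning says that evaluating IN against an empty list can be expensive and that the query should be restructured for better performance.
Why does this warning appear? Why would an empty list affect performance?
First, printing query gives the following SQL statement: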
SELECT *  -- column list replaced with "*" for brevity
FROM account
WHERE account.id != account.id
ORDER BY account.date_created DESC
The filter condition in the generated statement is WHERE account.id != account.id. Analyzing the query cost with PostgreSQL's EXPLAIN ANALYZE command gives the following result:
postgres=> EXPLAIN ANALYZE SELECT * FROM account WHERE account.id != account.id ORDER BY account.date_created DESC;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=797159.14..808338.40 rows=4471702 width=29) (actual time=574.002..574.002 rows=0 loops=1)
   Sort Key: date_created DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Seq Scan on account  (cost=0.00..89223.16 rows=4471702 width=29) (actual time=573.991..573.991 rows=0 loops=1)
         Filter: (id <> id)
         Rows Removed by Filter: 4494173
 Planning time: 0.162 ms
 Execution time: 574.052 ms
(8 rows)
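Look at the execution plan PostgreSQL produced for this statement. Although the query returns no rows, it is still very expensive. The planner's estimated cost is dominated by the sort, because it expects 4471702 rows to reach it, but the measured time is spent almost entirely in the sequential scan (573.991 ms of the 574.052 ms total). Planning time (0.162 ms) is negligible compared with execution time. (PostgreSQL first scans the whole table, reading every row, and then discards all of them with Filter: (id <> id).)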
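Following this reasoning, there are two ways to handle the query:
1. If account_ids is empty, return an empty list directly and do not touch the database at all. The code becomes: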
if account_ids:
    query = Account.query.filter(Account.id.in_(account_ids)).order_by(Account.date_created.desc())
else:
    query = []  # nothing to look up, so skip the database entirely
2. If account_ids is empty, swap the IN filter for a condition that is always false. The code becomes:
query = Account.query
if account_ids:
    query = query.filter(Account.id.in_(account_ids))
else:
    query = query.filter(False)
query = query.order_by(Account.date_created.desc())
When account_ids is empty, the generated SQL statement is:
SELECT * FROM account WHERE 0 = 1 ORDER BY account.date_created DESC
The analysis result is:
postgres=> EXPLAIN ANALYZE SELECT * FROM account WHERE 0 = 1 ORDER BY account.date_created DESC;
                                    QUERY PLAN
---------------------------------------------------------------------------------------------------
 Sort  (cost=77987.74..77987.75 rows=1 width=29) (actual time=0.011..0.011 rows=0 loops=1)
   Sort Key: date_created DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Result  (cost=0.00..77987.73 rows=1 width=29) (actual time=0.001..0.001 rows=0 loops=1)
         One-Time Filter: false
         ->  Seq Scan on account  (cost=0.00..77987.73 rows=1 width=29) (never executed)
 Planning time: 0.197 ms
 Execution time: 0.061 ms
(8 rows)
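Both the estimated cost and the execution time improve substantially: the total estimated cost drops from 808338.40 to 77987.75, and execution time falls from 574.052 ms to 0.061 ms, because the sequential scan is never executed.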
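Option 2 can be wrapped in a small helper so that call sites do not repeat the emptiness check. This is only a sketch: filter_by_ids is a hypothetical name, not part of SQLAlchemy, and it assumes only that the query object has a filter method and the column has an in_ method, as in the snippets above.

```python
def filter_by_ids(query, column, ids):
    """Restrict `query` to rows whose `column` value is in `ids`.

    Hypothetical helper, not a SQLAlchemy API. `query` is assumed to
    expose .filter() and `column` to expose .in_(), like Account.query
    and Account.id in the article.
    """
    ids = list(ids)  # accept any iterable, including generators
    if not ids:
        # Empty input: use a constant-false filter instead of an empty IN,
        # which is what the article's option 2 does.
        return query.filter(False)
    return query.filter(column.in_(ids))
```

A call would then look like filter_by_ids(Account.query, Account.id, account_ids).order_by(Account.date_created.desc()). SQLAlchemy also provides sqlalchemy.false() as an explicit spelling of the constant-false expression, which could be used in place of the bare False.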
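What if we only remove the ORDER BY from the original query (the id != id form)? Analyzing the query cost with PostgreSQL's EXPLAIN ANALYZE command gives the following result: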
postgres=> EXPLAIN ANALYZE SELECT * FROM account WHERE account.id != account.id;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on account  (cost=0.00..89223.16 rows=4471702 width=29) (actual time=550.999..550.999 rows=0 loops=1)
   Filter: (id <> id)
   Rows Removed by Filter: 4494173
 Planning time: 0.134 ms
 Execution time: 551.041 ms
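The time is close to the sorted version (551.041 ms versus 574.052 ms), so the sort is not the bottleneck.
To see how the cost figure is computed, run EXPLAIN on a simple query: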
postgres=> explain select * from account where date_created ='2016-04-07 18:51:30.371495+08';
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Seq Scan on account  (cost=0.00..127716.33 rows=1 width=211)
   Filter: (date_created = '2016-04-07 18:51:30.371495+08'::timestamp with time zone)
(2 rows)
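The figures EXPLAIN reports are the estimated start-up cost, the estimated total cost, the estimated number of rows output by the plan node, and the estimated average row width in bytes.
Cost is measured in units of disk page fetches: 1.0 represents one sequential disk page read. The cost of an upper-level node includes the cost of all its child nodes. The output row count (rows) is not the number of rows the plan node processes or scans, and is usually smaller. In general, the estimated row count at the top level is closer to the number of rows the query actually returns.
In other words, assuming a single CPU core, the estimated cost here is 127716.33.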
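To make the fields in a plan line concrete, the parenthesised estimate can be split into its parts. The following is a minimal sketch: parse_estimate and ESTIMATE_RE are hypothetical names, and the pattern only handles the plain text format shown in this article.

```python
import re

# Matches the parenthesised planner estimate in an EXPLAIN plan line, e.g.
# "(cost=0.00..127716.33 rows=1 width=211)".
ESTIMATE_RE = re.compile(
    r"\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+) "
    r"rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)


def parse_estimate(plan_line):
    """Return the planner estimates found in one plan line, or None."""
    match = ESTIMATE_RE.search(plan_line)
    if match is None:
        return None
    return {
        "startup_cost": float(match.group("startup")),  # cost before the first row
        "total_cost": float(match.group("total")),      # cost to return all rows
        "rows": int(match.group("rows")),                # estimated output rows
        "width": int(match.group("width")),              # average row width, bytes
    }


line = "Seq Scan on account  (cost=0.00..127716.33 rows=1 width=211)"
print(parse_estimate(line))
```

Lines without an estimate, such as the Filter line, yield None.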
The size of the account table is:
postgres=> select pg_relation_size('account');
 pg_relation_size
------------------
        737673216
(1 row)
PostgreSQL adds cost points for every block it has to read. Use show block_size to check the block size:
postgres=> show block_size;
 block_size
------------
 8192
(1 row)
Each block is 8 kB, so the number of blocks read sequentially from the table is:
blocks = pg_relation_size / block_size = 737673216 / 8192 = 90048
90048 is the number of blocks the account table occupies.
postgres=> show seq_page_cost;
 seq_page_cost
---------------
 1
(1 row)
This means PostgreSQL assigns one cost point to each block, so the query above needs 90048 cost points for page reads alone.
postgres=> show cpu_operator_cost;
 cpu_operator_cost
-------------------
 0.0025
(1 row)

postgres=> show cpu_tuple_cost;
 cpu_tuple_cost
----------------
 0.01
(1 row)
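The cost formula is:
cost = number of disk blocks * seq_page_cost (1) + number of rows * cpu_tuple_cost (system parameter) + number of rows * cpu_operator_cost
Now plug in all the values to reproduce the figure reported by the explain statement: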
number_of_records = 3013466   # row count of the account table
block_size = 8192             # block size in bytes
pg_relation_size = 737673216
blocks = pg_relation_size / block_size = 90048
seq_page_cost = 1
cpu_tuple_cost = 0.01
cpu_operator_cost = 0.0025
cost = blocks * seq_page_cost + number_of_records * cpu_tuple_cost + number_of_records * cpu_operator_cost
     = 90048 + 30134.66 + 7533.665 = 127716.325, which rounds to the reported 127716.33
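The same arithmetic can be checked with a few lines of Python. This is a sketch of the formula as stated in this article, assuming one operator evaluation per row (the single WHERE comparison); seq_scan_cost is a hypothetical name, and the numbers are copied from the psql output shown earlier.

```python
def seq_scan_cost(relation_size, block_size, rows,
                  seq_page_cost, cpu_tuple_cost, cpu_operator_cost):
    """Estimate a filtered sequential scan using the article's formula."""
    blocks = relation_size // block_size          # pages read from disk
    io_cost = blocks * seq_page_cost              # one cost point per page
    cpu_cost = rows * (cpu_tuple_cost + cpu_operator_cost)  # per-row work
    return io_cost + cpu_cost


cost = seq_scan_cost(
    relation_size=737673216,   # select pg_relation_size('account')
    block_size=8192,           # show block_size
    rows=3013466,              # row count of the account table
    seq_page_cost=1.0,         # show seq_page_cost
    cpu_tuple_cost=0.01,       # show cpu_tuple_cost
    cpu_operator_cost=0.0025,  # show cpu_operator_cost
)
print(cost)
```

The result agrees with the 127716.33 that EXPLAIN reported to within a cent.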
As for how to lower the query cost, the short answer is: use an index.
postgres=> explain select * from account where id=20039;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Index Scan using account_pkey on account  (cost=0.43..8.45 rows=1 width=211)
   Index Cond: (id = 20039)
(2 rows)
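This query shows that filtering on an indexed column lowers the query cost dramatically (8.45 versus 127716.33 for the sequential scan).
Costing an index scan is somewhat more involved than costing a sequential scan. It consists of two phases.
PostgreSQL takes the random_page_cost and cpu_index_tuple_cost settings into account and returns a value based on the height of the index tree.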