Saiku Execution Speed Optimization (Part 3)

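After the first two rounds of optimization, saiku went from unusable to usable, but it still stutters noticeably when analyzing large volumes of log data. After looking further at the SQL executed behind the scenes, I decided to turn my attention to indexes.

The main use case for the logs is data analysis over a fixed date dimension; in other words, the WHERE clause always contains a date equal to one particular day. The dilemma is whether to create a separate index on every column, or to create composite indexes that pair each column with the date. In the end it comes down to comparing the efficiency of single-column indexes against composite indexes.

PostgreSQL table: saiku_search_detail

Table structure: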

CREATE TABLE test.saiku_search_detail
(
  rpt_date date,
  from_area_id bigint,
  from_value_id bigint,
  in_track_id bigint,
  gid character varying,
  current_city_id bigint,
  dist_city_id bigint,
  category_name_id bigint,
  page_id bigint,
  utmr_page_id bigint,
  num bigint,
  id bigint,
  partner smallint
)

Row count: 8,510,490, roughly 8.51 million log records.

Test steps:

1. Bare table

Query a single date:

1.1 Single condition

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'

Result: 1110 ms

"Aggregate  (cost=160934.85..160934.86 rows=1 width=0)"
"  ->  Seq Scan on saiku_search_detail  (cost=0.00..160816.78 rows=47230 width=0)"
"        Filter: (rpt_date = '2016-05-13'::date)"
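The plans and timings in this post can be reproduced with EXPLAIN. The following is a minimal sketch, assuming the same table and date as above. Plain EXPLAIN prints only the estimated plan, while EXPLAIN ANALYZE actually runs the query and reports real row counts and execution time. The BUFFERS option is my own addition and was not part of the original measurements.

```sql
-- Estimated plan only; the query itself is not executed.
EXPLAIN
SELECT count(1)
FROM test.saiku_search_detail
WHERE rpt_date = '2016-05-13';

-- Executes the query and adds actual row counts and timings.
-- BUFFERS reports how many pages were found in cache versus read,
-- which helps explain timing differences between runs.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(1)
FROM test.saiku_search_detail
WHERE rpt_date = '2016-05-13';
```

Running each statement more than once is worthwhile, since the first run may be slower while data is still being read from disk.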

1.2 Two conditions

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 1782 ms

"Aggregate  (cost=184432.32..184432.33 rows=1 width=0)"
"  ->  Seq Scan on saiku_search_detail  (cost=0.00..184431.73 rows=236 width=0)"
"        Filter: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))"

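No surprises here: there are no indexes, so both queries run as sequential scans.

2. Add a separate index on each of the two columns:

-- btree index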
CREATE INDEX saiku_search_detail_from_area_id_idx
  ON saiku_search_detail
  USING btree
  (from_area_id);
-- hash index
CREATE INDEX saiku_search_detail_rpt_date_idx
  ON saiku_search_detail
  USING hash
  (rpt_date);

2.1 Single condition

select
  count(1)
from saiku_search_detail
where rpt_date = '2016-05-13'

Result: 83 ms

"Aggregate  (cost=8.02..8.03 rows=1 width=0)"
"  ->  Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail  (cost=0.00..8.02 rows=1 width=0)"
"        Index Cond: (rpt_date = '2016-05-13'::date)"

The index is used.

2.2 Two conditions

select
  count(1)
from saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 149 ms

"Aggregate  (cost=8.02..8.03 rows=1 width=0)"
"  ->  Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail  (cost=0.00..8.02 rows=1 width=0)"
"        Index Cond: (rpt_date = '2016-05-13'::date)"
"        Filter: (from_area_id = 135)"

Only one index is used; the second one has no effect. Try swapping the order of the conditions in the SQL:

select
  count(1)
from saiku_search_detail
where from_area_id = 135
and rpt_date = '2016-05-13'

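The result is the same, so the order of the conditions makes no difference. In this test PostgreSQL used only one of the two single-column indexes. The planner is able to combine separate indexes with a bitmap AND when it estimates that to be cheaper, but it did not choose to do so here.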
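Whether the planner combines two single-column indexes depends on its cost estimates. The plans in step 2 estimate rows=1 for a date that was estimated at tens of thousands of rows in step 1, which hints that the table statistics may have been out of date. That is only my reading of the output, not something the test verified. A sketch for re-checking under that assumption:

```sql
-- Refresh planner statistics so that row estimates reflect the current data.
ANALYZE test.saiku_search_detail;

-- Re-run the two-condition query. If the planner judges it cheaper, the plan
-- may contain a BitmapAnd node that combines both single-column indexes;
-- otherwise it keeps scanning one index and filtering on the other column.
EXPLAIN ANALYZE
SELECT count(1)
FROM test.saiku_search_detail
WHERE rpt_date = '2016-05-13'
  AND from_area_id = 135;
```

Either outcome is a legitimate planner decision; the point is only to compare plans once the estimates are fresh.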

3. Create a composite index

-- composite index covering both columns
CREATE INDEX saiku_search_detail_rpt_date_from_area_idx
  ON test.saiku_search_detail
  USING btree
  (rpt_date, from_area_id);
  

3.1 Single-condition query on the first column of the index

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'

Result: 66 ms

"Aggregate  (cost=47843.00..47843.01 rows=1 width=0)"
"  ->  Bitmap Heap Scan on saiku_search_detail  (cost=2220.63..47362.94 rows=192025 width=0)"
"        Recheck Cond: (rpt_date = '2016-05-13'::date)"
"        ->  Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx  (cost=0.00..2172.62 rows=192025 width=0)"

As the plan shows, the query uses the composite index through its leading column, rpt_date, alone.
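The reverse case is worth checking too. A btree composite index works best when its leading column is constrained, so a filter on from_area_id alone is unlikely to benefit much from the (rpt_date, from_area_id) index. The query below is a sketch I added for illustration and was not part of the original test run:

```sql
-- Filter only on the second column of the composite index.
-- If the single-column index on from_area_id from step 2 still exists,
-- the planner will probably choose that one; without it, a sequential scan
-- or a full scan of the composite index is the likely outcome.
EXPLAIN
SELECT count(1)
FROM test.saiku_search_detail
WHERE from_area_id = 135;
```

Since the date is always present in the WHERE clause in this log analysis scenario, putting rpt_date first in each composite index fits the workload.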

3.2 Two-condition query

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 65 ms

"Aggregate  (cost=46124.99..46125.00 rows=1 width=0)"
"  ->  Bitmap Heap Scan on saiku_search_detail  (cost=1509.67..45857.37 rows=107047 width=0)"
"        Recheck Cond: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))"
"        ->  Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx  (cost=0.00..1482.90 rows=107047 width=0)"

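The index is used, and both conditions are applied as index conditions.

Summary

  • Stating the obvious: if two columns are used together as filter conditions, a composite index is the best choice.
  • Gain: in these log analysis tests, apart from the single-column index on the date, the other single-column indexes were never used and should be dropped.
  • Dilemma: create only a single-column index on the date, or several composite indexes that each include the date? Decide according to your own use case.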
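To act on the summary, it helps to confirm which indexes the real workload touches before dropping anything. The sketch below assumes the indexes live in the test schema alongside the table; the name saiku_search_detail_rpt_date_current_city_idx is a hypothetical example of one more date-leading composite index, not something created in the tests above.

```sql
-- Scan count and size of every index on the table. An idx_scan that stays
-- at 0 under normal workload suggests the index is not being used.
SELECT indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'test'
  AND relname = 'saiku_search_detail'
ORDER BY idx_scan;

-- Drop a single-column index that the workload never uses.
DROP INDEX IF EXISTS test.saiku_search_detail_from_area_id_idx;

-- Hypothetical: a composite index for another dimension that is always
-- filtered together with the date.
CREATE INDEX saiku_search_detail_rpt_date_current_city_idx
  ON test.saiku_search_detail
  USING btree
  (rpt_date, current_city_id);
```

Every extra index adds storage and slows down log loading, so it is probably best to create composite indexes only for the dimensions that analysts actually filter on.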