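After the previous two rounds of optimization, Saiku went from unusable to usable, but it still stuttered when analyzing large volumes of log data. Watching the SQL it runs behind the scenes, I decided to turn my attention to indexes.
The logs are mainly used for analysis on a fixed date dimension, which means the WHERE clause always contains the date equal to one specific day. The dilemma is whether to create an index on each column separately, or a composite index that pairs the column with the date. In the end it comes down to comparing the efficiency of single-column indexes against composite indexes.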
PostgreSQL table: saiku_search_detail
Table structure:
CREATE TABLE test.saiku_search_detail (
    rpt_date date,
    from_area_id bigint,
    from_value_id bigint,
    in_track_id bigint,
    gid character varying,
    current_city_id bigint,
    dist_city_id bigint,
    category_name_id bigint,
    page_id bigint,
    utmr_page_id bigint,
    num bigint,
    id bigint,
    partner smallint
);
Row count: 8,510,490, i.e. roughly 8.51 million log records.
Querying a single date, with no indexes on the table yet:
1.1 Single condition
select count(1) from test.saiku_search_detail where rpt_date = '2016-05-13'
Result: 1110 ms
Aggregate (cost=160934.85..160934.86 rows=1 width=0)
  -> Seq Scan on saiku_search_detail (cost=0.00..160816.78 rows=47230 width=0)
        Filter: (rpt_date = '2016-05-13'::date)
1.2 Two conditions
select count(1) from test.saiku_search_detail where rpt_date = '2016-05-13' and from_area_id = 135
Result: 1782 ms
Aggregate (cost=184432.32..184432.33 rows=1 width=0)
  -> Seq Scan on saiku_search_detail (cost=0.00..184431.73 rows=236 width=0)
        Filter: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))
As expected, with zero indexes both queries fall back to a sequential scan of the whole table.
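To make the cost of a "Seq Scan ... Filter" plan concrete, here is a toy Python model of it: every row is read and the predicate is evaluated on each one, so the work grows with the table size (about 8.5 million rows here), not with the number of matches. The simplified row layout (rpt_date, from_area_id) and the function name are hypothetical; real PostgreSQL reads 8 KB heap pages, not Python tuples.

```python
from datetime import date
from typing import Callable, Iterable, Tuple

# Hypothetical simplified row: (rpt_date, from_area_id)
Row = Tuple[date, int]


def seq_scan_count(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Tuple[int, int]:
    """Toy model of 'Aggregate -> Seq Scan ... Filter: ...'.

    Returns (matching_rows, rows_examined). Every row is examined,
    no matter how selective the predicate is.
    """
    matched = 0
    examined = 0
    for row in rows:
        examined += 1
        if predicate(row):
            matched += 1
    return matched, examined


if __name__ == "__main__":
    day = date(2016, 5, 13)
    sample = [(day, 135), (day, 7), (date(2016, 5, 12), 135), (date(2016, 5, 14), 9)]
    # 1.1: rpt_date = '2016-05-13'
    print(seq_scan_count(sample, lambda r: r[0] == day))                    # (2, 4)
    # 1.2: rpt_date = '2016-05-13' AND from_area_id = 135
    print(seq_scan_count(sample, lambda r: r[0] == day and r[1] == 135))  # (1, 4)
```

Adding a second condition does not reduce the rows examined; it only adds another comparison per row, which fits the two-condition query being slower than the one-condition query.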
Next, a separate index on each of the two columns:
-- B-tree index
CREATE INDEX saiku_search_detail_from_area_id_idx ON saiku_search_detail USING btree (from_area_id);
-- Hash index
CREATE INDEX saiku_search_detail_rpt_date_idx ON saiku_search_detail USING hash (rpt_date);
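The two index types differ mainly in which lookups they can answer. The sketch below is a rough in-memory analogy, not PostgreSQL's on-disk structures: a hash index behaves like a dict from key to row ids and answers only equality, while a B-tree keeps its keys sorted, so binary search can answer both equality and range conditions. The class names are hypothetical.

```python
import bisect
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Sequence, Tuple


class ToyHashIndex:
    """Hypothetical stand-in for USING hash: equality lookups only."""

    def __init__(self, keys: Sequence[Hashable]) -> None:
        self._buckets: Dict[Hashable, List[int]] = defaultdict(list)
        for row_id, key in enumerate(keys):
            self._buckets[key].append(row_id)

    def equal(self, key: Hashable) -> List[int]:
        # .get() does not insert into the defaultdict
        return list(self._buckets.get(key, []))


class ToyBTreeIndex:
    """Hypothetical stand-in for USING btree: sorted (key, row_id) entries."""

    def __init__(self, keys: Sequence[int]) -> None:
        self._entries: List[Tuple[int, int]] = sorted((k, i) for i, k in enumerate(keys))
        self._keys = [k for k, _ in self._entries]

    def range(self, low: int, high: int) -> List[int]:
        """Row ids whose key satisfies low <= key <= high (binary search)."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [row_id for _, row_id in self._entries[lo:hi]]

    def equal(self, key: int) -> List[int]:
        return self.range(key, key)
```

Since the queries in this article only use `rpt_date = ...`, the hash index is sufficient for them; a range such as `rpt_date BETWEEN ...` would need a B-tree.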
2.1 Single condition
select count(1) from saiku_search_detail where rpt_date = '2016-05-13'
Result: 83 ms
Aggregate (cost=8.02..8.03 rows=1 width=0)
  -> Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail (cost=0.00..8.02 rows=1 width=0)
        Index Cond: (rpt_date = '2016-05-13'::date)
The index on rpt_date was used.
2.2 Two conditions
select count(1) from saiku_search_detail where rpt_date = '2016-05-13' and from_area_id = 135
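Result: 149 ms
Aggregate (cost=8.02..8.03 rows=1 width=0)
  -> Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail (cost=0.00..8.02 rows=1 width=0)
        Index Cond: (rpt_date = '2016-05-13'::date)
        Filter: (from_area_id = 135)
Only one index was used: rpt_date went through its index, while from_area_id was merely applied as a Filter to the rows that came back, so the second index had no effect. Next I tried swapping the order of the conditions in the SQL: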
select count(1) from saiku_search_detail where from_area_id = 135 and rpt_date = '2016-05-13'
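Same result. So in this test, with a separate index on each column, PostgreSQL used only one of them, and the order of the conditions in the WHERE clause made no difference. PostgreSQL is capable of combining several indexes in one query (a BitmapAnd of bitmap index scans), but the planner did not choose that plan here, presumably because it estimated only one matching row (rows=1) from the rpt_date index, which makes a second index lookup look pointless.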
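The plan in 2.2 is "use one index, then filter the rest". The alternative PostgreSQL can use is to collect matches from each index separately and AND them together (BitmapAnd). The toy Python sketch below contrasts the two strategies, with sets of row ids standing in for real tuple-ID bitmaps; the dict-based indexes and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

# Hypothetical simplified row: (rpt_date, from_area_id)
Row = Tuple[date, int]
ToyIndex = Dict[object, Set[int]]


def build_index(rows: List[Row], column: int) -> ToyIndex:
    """Toy single-column index: column value -> set of row ids."""
    index: ToyIndex = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return index


def index_then_filter(rows: List[Row], date_idx: ToyIndex, day: date, area: int) -> int:
    """Like plan 2.2: Index Cond on rpt_date, then fetch each candidate
    row and check from_area_id (the 'Filter' line)."""
    candidates = date_idx.get(day, set())
    return sum(1 for row_id in candidates if rows[row_id][1] == area)


def bitmap_and(date_idx: ToyIndex, area_idx: ToyIndex, day: date, area: int) -> int:
    """BitmapAnd-style: intersect the matches of both indexes
    before touching any row."""
    return len(date_idx.get(day, set()) & area_idx.get(area, set()))
```

Both functions return the same count; what differs is the work. The filter variant must visit every row matching the date, while the intersection only leads to rows matching both conditions, which tends to pay off when each condition alone matches many rows but their combination matches few.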
-- Composite index: both columns in a single index
CREATE INDEX saiku_search_detail_rpt_date_from_area_idx ON test.saiku_search_detail USING btree (rpt_date, from_area_id);
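A B-tree on (rpt_date, from_area_id) keeps its entries sorted by rpt_date first and from_area_id second, so all entries for one day sit next to each other, and within that day the entries for one area are contiguous too. That is why one index can serve both a date-only condition and a date-plus-area condition with a single range search. Below is a toy model using a sorted Python list and `bisect`; the class name is hypothetical, and a real B-tree is a paged on-disk structure.

```python
import bisect
from datetime import date, timedelta
from typing import List, Optional, Tuple

Entry = Tuple[date, int, int]  # (rpt_date, from_area_id, row_id)


class ToyCompositeIndex:
    """Hypothetical stand-in for a btree index on (rpt_date, from_area_id)."""

    def __init__(self, rows: List[Tuple[date, int]]) -> None:
        # Lexicographic order: rpt_date, then from_area_id, then row id.
        self._entries: List[Entry] = sorted((d, a, i) for i, (d, a) in enumerate(rows))

    def count(self, day: date, area: Optional[int] = None) -> int:
        """Count entries with rpt_date = day [AND from_area_id = area]."""
        if area is None:
            # Leading column only: every entry for `day` forms one contiguous run.
            lo = bisect.bisect_left(self._entries, (day,))
            hi = bisect.bisect_left(self._entries, (day + timedelta(days=1),))
        else:
            # Both columns: a narrower contiguous run inside that day.
            lo = bisect.bisect_left(self._entries, (day, area))
            hi = bisect.bisect_left(self._entries, (day, area + 1))
        return hi - lo
```

The order of the columns matters: a condition on from_area_id alone has its matches scattered across every day, so this layout offers no single contiguous range for it.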
3.1 Single-condition query on the first column of the index
select count(1) from test.saiku_search_detail where rpt_date = '2016-05-13'
Result: 66 ms
Aggregate (cost=47843.00..47843.01 rows=1 width=0)
  -> Bitmap Heap Scan on saiku_search_detail (cost=2220.63..47362.94 rows=192025 width=0)
        Recheck Cond: (rpt_date = '2016-05-13'::date)
        -> Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx (cost=0.00..2172.62 rows=192025 width=0)
So the composite index was used through its leading column (rpt_date) alone.
3.2 Two-condition query
select count(1) from test.saiku_search_detail where rpt_date = '2016-05-13' and from_area_id = 135
Result: 65 ms
Aggregate (cost=46124.99..46125.00 rows=1 width=0)
  -> Bitmap Heap Scan on saiku_search_detail (cost=1509.67..45857.37 rows=107047 width=0)
        Recheck Cond: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))
        -> Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx (cost=0.00..1482.90 rows=107047 width=0)
The composite index was used, this time for both conditions.
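Putting the reported timings from the three rounds side by side, a few lines of Python give the speedup of each index strategy over the no-index baseline. The numbers are exactly the single-run timings reported in this article; the dict layout and labels are only for illustration.

```python
# Timings in milliseconds, as reported above for each test.
timings_ms = {
    "single condition": {"no index": 1110, "separate indexes": 83, "composite index": 66},
    "two conditions": {"no index": 1782, "separate indexes": 149, "composite index": 65},
}


def speedups(results: dict) -> dict:
    """Speedup of each indexed variant relative to the no-index baseline,
    rounded to one decimal place."""
    out = {}
    for query, runs in results.items():
        baseline = runs["no index"]
        out[query] = {
            name: round(baseline / ms, 1) for name, ms in runs.items() if name != "no index"
        }
    return out


if __name__ == "__main__":
    for query, ratios in speedups(timings_ms).items():
        print(query, ratios)
```

This yields roughly 13.4x and 16.8x for the single-condition query and 12.0x and 27.4x for the two-condition query. Since each figure is a single run, a gap like 66 ms versus 65 ms is likely within run-to-run noise; the clearer signals are the order-of-magnitude gain from any index and the composite index's lead on the two-condition query.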