Saiku Execution Speed Optimization (Part 3)

Date: 2019-11-06

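After the first two rounds of optimization, Saiku went from unusable to usable, but it still feels sluggish when analyzing large volumes of log data. After looking further at the SQL executed behind the scenes, I decided to focus on indexes.

The main use case for the logs is data analysis over a fixed date dimension, which means the WHERE clause always contains a date equal to a specific day. The open question is whether to create a separate index on every field, or to create composite indexes that pair each field with the date. It comes down to comparing the efficiency of single-column indexes with that of a composite index.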

PostgreSQL table: saiku_search_detail

Table structure:

CREATE TABLE test.saiku_search_detail
(
  rpt_date date,
  from_area_id bigint,
  from_value_id bigint,
  in_track_id bigint,
  gid character varying,
  current_city_id bigint,
  dist_city_id bigint,
  category_name_id bigint,
  page_id bigint,
  utmr_page_id bigint,
  num bigint,
  id bigint,
  partner smallint
)

Row count: 8,510,490, roughly 8.51 million log records.

Test steps:

1. Bare table (no indexes)

Query a single date:

1.1 Single condition

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'

Result: 1110 ms

"Aggregate  (cost=160934.85..160934.86 rows=1 width=0)"
"  ->  Seq Scan on saiku_search_detail  (cost=0.00..160816.78 rows=47230 width=0)"
"        Filter: (rpt_date = '2016-05-13'::date)"

1.2 Two conditions

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 1782 ms

"Aggregate  (cost=184432.32..184432.33 rows=1 width=0)"
"  ->  Seq Scan on saiku_search_detail  (cost=0.00..184431.73 rows=236 width=0)"
"        Filter: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))"

As expected, neither query uses an index: both plans are sequential scans.
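The original does not say how the timings and plans were collected. One way to get both in a single step, as a minimal sketch, is EXPLAIN with the ANALYZE option. Note that this actually runs the query, and that the reported times will not necessarily match the client-side timings quoted above.

```sql
-- Sketch: show the chosen plan together with actual run time and buffer usage.
-- EXPLAIN (ANALYZE) executes the statement, so use it only on queries that are safe to run.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(1)
FROM test.saiku_search_detail
WHERE rpt_date = '2016-05-13';
```

Running the same statement before and after each index change gives a like-for-like comparison of scan type and execution time.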

2. Add a separate index on each of the two fields:

-- B-tree index
CREATE INDEX saiku_search_detail_from_area_id_idx
  ON saiku_search_detail
  USING btree
  (from_area_id);
-- Hash index
CREATE INDEX saiku_search_detail_rpt_date_idx
  ON saiku_search_detail
  USING hash
  (rpt_date);

2.1 Single condition

select
  count(1)
from saiku_search_detail
where rpt_date = '2016-05-13'

Result: 83 ms

"Aggregate  (cost=8.02..8.03 rows=1 width=0)"
"  ->  Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail  (cost=0.00..8.02 rows=1 width=0)"
"        Index Cond: (rpt_date = '2016-05-13'::date)"

The index is used.

2.2 Two conditions

select
  count(1)
from saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 149 ms

"Aggregate  (cost=8.02..8.03 rows=1 width=0)"
"  ->  Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail  (cost=0.00..8.02 rows=1 width=0)"
"        Index Cond: (rpt_date = '2016-05-13'::date)"
"        Filter: (from_area_id = 135)"

Only one index is used; the second index has no effect. Try changing the order of the conditions in the SQL:

select
  count(1)
from saiku_search_detail
where from_area_id = 135
and rpt_date = '2016-05-13'

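The result is the same, so the order of the conditions in the WHERE clause does not matter. In this test, with a separate index on each field, the planner used only one of them and applied the other condition as a filter. (PostgreSQL can combine several indexes with a bitmap AND, but it did not choose to here, where it estimated that the date index alone would return a single row.)

3. Create a composite index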
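PostgreSQL's planner can combine separate single-column indexes through a BitmapAnd node when it estimates that doing so is cheaper. The rows=1 estimate in the plans above suggests the table statistics may have been stale at the time; that is an assumption, not something the original states. A minimal sketch for re-checking the two-condition query after refreshing statistics:

```sql
-- Refresh planner statistics (assumption: they were stale, given the rows=1 estimate).
ANALYZE test.saiku_search_detail;

-- Re-check the plan for the two-condition query.
-- If the planner decides to combine both indexes, the plan contains a BitmapAnd node
-- over two Bitmap Index Scans; otherwise it shows a single index plus a Filter.
EXPLAIN
SELECT count(1)
FROM test.saiku_search_detail
WHERE rpt_date = '2016-05-13'
  AND from_area_id = 135;
```

Whether BitmapAnd is actually chosen depends on the data distribution and cost estimates, so no expected plan is shown here.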

-- Composite index covering both fields
CREATE INDEX saiku_search_detail_rpt_date_from_area_idx
  ON test.saiku_search_detail
  USING btree
  (rpt_date, from_area_id);

3.1 Single-condition query on the first field of the index

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'

Result: 66 ms

"Aggregate  (cost=47843.00..47843.01 rows=1 width=0)"
"  ->  Bitmap Heap Scan on saiku_search_detail  (cost=2220.63..47362.94 rows=192025 width=0)"
"        Recheck Cond: (rpt_date = '2016-05-13'::date)"
"        ->  Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx  (cost=0.00..2172.62 rows=192025 width=0)"

The plan shows that the query used the composite index through its leading column, rpt_date.

3.2 Two-condition query
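A multicolumn B-tree index is most effective when the query constrains its leading column. As a hedged illustration (this query is not part of the original test), the counterpart case filters only on the second column, from_area_id:

```sql
-- Sketch: filter only on the second column of (rpt_date, from_area_id).
-- The composite index cannot narrow its search by rpt_date here, so it is usually
-- a poor fit for this query compared with an index that leads with from_area_id.
EXPLAIN
SELECT count(1)
FROM test.saiku_search_detail
WHERE from_area_id = 135;
```

Which index the planner picks, if any, depends on which indexes exist at that point and on the table statistics, so no expected plan is shown.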

select
  count(1)
from test.saiku_search_detail
where rpt_date = '2016-05-13'
and from_area_id = 135

Result: 65 ms

"Aggregate  (cost=46124.99..46125.00 rows=1 width=0)"
"  ->  Bitmap Heap Scan on saiku_search_detail  (cost=1509.67..45857.37 rows=107047 width=0)"
"        Recheck Cond: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))"
"        ->  Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx  (cost=0.00..1482.90 rows=107047 width=0)"
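Query time is only one side of the comparison; index size and actual usage also matter when choosing between per-field and composite indexes. As a sketch that goes beyond what the original measured, the statistics view pg_stat_user_indexes can list each index on the table with its on-disk size and scan count:

```sql
-- Sketch: list the indexes on the test table with their size and scan counts.
-- idx_scan counts index scans since the statistics were last reset.
SELECT
  s.indexrelname                                 AS index_name,
  pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
  s.idx_scan                                     AS scans_since_stats_reset
FROM pg_stat_user_indexes AS s
WHERE s.schemaname = 'test'
  AND s.relname = 'saiku_search_detail'
ORDER BY pg_relation_size(s.indexrelid) DESC;
```

An index that stays at zero scans under the real Saiku workload is a candidate for removal, since every index adds write and storage cost.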