Elasticsearch Java API - Aggregation Queries

Taking basketball player data as an example, the player type in the player index contains five fields: name, age, salary, team, and court position.

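The index mapping is shown below. Note that the mapping as posted defines a quote type with stock-quote fields rather than the five player fields. For the player type, string fields such as team and position should presumably be declared not_analyzed, like symbol here, so that terms aggregations group on whole values rather than analyzed tokens.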

"mappings": {  
    "quote": {  
        "properties": {  
            "adj_close": {  
                "type": "long"  
            },  
            "open": {  
                "type": "long"  
            },  
            "symbol": {  
                "index": "not_analyzed",  
                "type": "string"  
            },  
            "volume": {  
                "type": "long"  
            },  
            "high": {  
                "type": "long"  
            },  
            "low": {  
                "type": "long"  
            },  
            "date": {  
                "format": "strict_date_optional_time||epoch_millis",  
                "type": "date"  
            },  
            "close": {  
                "type": "long"  
            }  
        },  
        "_all": {  
            "enabled": false  
        }  
    }  
}

 


All the data in the index:

name      age  salary  team  position
james     33   3000    cav   sf
irving    25   2000    cav   pg
curry     29   1000    war   pg
thompson  26   2000    war   sg
green     26   2000    war   pf
garnett   40   1000    tim   pf
towns     21   500     tim   c
lavin     21   300     tim   sg
wigins    20   500     tim   sf

First, initialize the builder:

SearchRequestBuilder sbuilder = client.prepareSearch("player").setTypes("player");

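The following examples show how to implement the various aggregation operations. In the ES API, aggregating over multiple fields requires sub-aggregations (subAggregation), and beginners may not find the method easily (there is little material online; I struggled with this for two days and only fully understood it after reading the source code T_T), so multi-field aggregation is explained specifically later on. Sorting after aggregation is also covered separately.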

1. group by/count

For example, to count the players on each team, the SQL statement would be:

select team, count(*) as player_count from player group by team;

With the ES Java API:

// Terms aggregation named "player_count": one bucket per distinct team value
TermsBuilder teamAgg = AggregationBuilders.terms("player_count").field("team");
sbuilder.addAggregation(teamAgg);
SearchResponse response = sbuilder.execute().actionGet();
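
To make clear what this aggregation should return for the sample data, here is a minimal sketch in plain Java that performs the same grouping in memory. It does not use the Elasticsearch client at all; the class name TeamCountSketch and the ROWS array are illustrative assumptions, with the rows copied from the data table above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TeamCountSketch {
    // Sample rows copied from the article: name, age, salary, team, position.
    static final String[][] ROWS = {
        {"james", "33", "3000", "cav", "sf"},
        {"irving", "25", "2000", "cav", "pg"},
        {"curry", "29", "1000", "war", "pg"},
        {"thompson", "26", "2000", "war", "sg"},
        {"green", "26", "2000", "war", "pf"},
        {"garnett", "40", "1000", "tim", "pf"},
        {"towns", "21", "500", "tim", "c"},
        {"lavin", "21", "300", "tim", "sg"},
        {"wigins", "20", "500", "tim", "sf"},
    };

    // In-memory equivalent of: select team, count(*) from player group by team.
    // A TreeMap is used only to get a stable, alphabetical key order.
    static Map<String, Long> countByTeam(String[][] rows) {
        return Arrays.stream(rows)
                .collect(Collectors.groupingBy(r -> r[3], TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByTeam(ROWS)); // prints {cav=2, tim=4, war=3}
    }
}
```

Each map entry corresponds to one bucket of the terms aggregation: the key is the team and the value is the bucket's document count.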

2. group by multiple fields

For example, to count the players at each position on each team, the SQL statement would be:

select team, position, count(*) as pos_count from player group by team, position;

With the ES Java API:

TermsBuilder teamAgg = AggregationBuilders.terms("player_count").field("team");
TermsBuilder posAgg = AggregationBuilders.terms("pos_count").field("position");
// Nest the position buckets inside each team bucket
sbuilder.addAggregation(teamAgg.subAggregation(posAgg));
SearchResponse response = sbuilder.execute().actionGet();
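
The nesting of position buckets inside team buckets can be pictured as a map of maps. The following is a rough sketch in plain Java, not Elasticsearch code; NestedGroupSketch and ROWS are hypothetical names, and only the team and position columns of the sample data are kept.

```java
import java.util.Map;
import java.util.TreeMap;

public class NestedGroupSketch {
    // Team and position columns of the article's sample data.
    static final String[][] ROWS = {
        {"cav", "sf"}, {"cav", "pg"},
        {"war", "pg"}, {"war", "sg"}, {"war", "pf"},
        {"tim", "pf"}, {"tim", "c"}, {"tim", "sg"}, {"tim", "sf"},
    };

    // Outer map = team buckets, inner map = position buckets with their counts,
    // mirroring a terms aggregation that carries a terms sub-aggregation.
    static Map<String, Map<String, Long>> countByTeamAndPosition(String[][] rows) {
        Map<String, Map<String, Long>> result = new TreeMap<>();
        for (String[] r : rows) {
            result.computeIfAbsent(r[0], k -> new TreeMap<>())
                  .merge(r[1], 1L, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {cav={pg=1, sf=1}, tim={c=1, pf=1, sf=1, sg=1}, war={pf=1, pg=1, sg=1}}
        System.out.println(countByTeamAndPosition(ROWS));
    }
}
```

Unlike the flat rows that the SQL query returns, the result is hierarchical: every team bucket holds its own list of position buckets.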

3. max/min/sum/avg

For example, to compute the maximum, minimum, total, or average player age for each team. Taking the maximum as the example, the SQL statement would be:

select team, max(age) as max_age from player group by team;

With the ES Java API:

TermsBuilder teamAgg = AggregationBuilders.terms("player_count").field("team");
// min, sum and avg work the same way via AggregationBuilders.min/sum/avg
MaxBuilder ageAgg = AggregationBuilders.max("max_age").field("age");
sbuilder.addAggregation(teamAgg.subAggregation(ageAgg));
SearchResponse response = sbuilder.execute().actionGet();

4. max/min/sum/avg over multiple fields

For example, to compute the average player age and the total salary for each team at the same time, the SQL statement would be:

select team, avg(age) as avg_age, sum(salary) as total_salary from player group by team;

With the ES Java API:

TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team");
AvgBuilder ageAgg = AggregationBuilders.avg("avg_age").field("age");
SumBuilder salaryAgg = AggregationBuilders.sum("total_salary").field("salary");
// Both metrics are attached as siblings under each team bucket
sbuilder.addAggregation(teamAgg.subAggregation(ageAgg).subAggregation(salaryAgg));
SearchResponse response = sbuilder.execute().actionGet();
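
As a sanity check on the numbers this query should produce, here is a small in-memory sketch in plain Java. It is not Elasticsearch code: TeamStatsSketch, TeamStats and ROWS are names made up for illustration, and the rows hold the age, salary and team columns of the sample data.

```java
import java.util.Map;
import java.util.TreeMap;

public class TeamStatsSketch {
    // Age, salary and team columns of the article's sample data.
    static final String[][] ROWS = {
        {"33", "3000", "cav"}, {"25", "2000", "cav"},
        {"29", "1000", "war"}, {"26", "2000", "war"}, {"26", "2000", "war"},
        {"40", "1000", "tim"}, {"21", "500", "tim"}, {"21", "300", "tim"}, {"20", "500", "tim"},
    };

    // One instance per team bucket, holding two metrics side by side.
    static final class TeamStats {
        int players;
        long ageSum;
        long salarySum;

        double avgAge() {
            return (double) ageSum / players;
        }
    }

    // In-memory equivalent of:
    // select team, avg(age), sum(salary) from player group by team
    static Map<String, TeamStats> statsByTeam(String[][] rows) {
        Map<String, TeamStats> stats = new TreeMap<>();
        for (String[] r : rows) {
            TeamStats s = stats.computeIfAbsent(r[2], k -> new TeamStats());
            s.players++;
            s.ageSum += Integer.parseInt(r[0]);
            s.salarySum += Integer.parseInt(r[1]);
        }
        return stats;
    }

    public static void main(String[] args) {
        // prints one line per team: cav 29.0 5000, tim 25.5 2300, war 27.0 5000
        statsByTeam(ROWS).forEach((team, s) ->
                System.out.println(team + " " + s.avgAge() + " " + s.salarySum));
    }
}
```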

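5. Sorting aggregation results

For example, to compute the total salary of each team and sort by it in descending order, the SQL statement would be:

select team, sum(salary) as total_salary from player group by team order by total_salary desc;

With the ES Java API:

// Order refers to Terms.Order; the name passed to it must match the sub-aggregation's name exactly
TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team").order(Order.aggregation("total_salary", false));
SumBuilder salaryAgg = AggregationBuilders.sum("total_salary").field("salary");
sbuilder.addAggregation(teamAgg.subAggregation(salaryAgg));
SearchResponse response = sbuilder.execute().actionGet();

Note in particular that the ordering is specified on the terms aggregation. The first argument of Order.aggregation is the name of the aggregation to sort by, and the second is a boolean: true means ascending, false means descending.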
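
The effect of the boolean flag can be illustrated with a short in-memory sketch in plain Java. This is not the Elasticsearch implementation; SalaryOrderSketch and its asc parameter are hypothetical, and the alphabetical tie-break between teams with equal totals is an assumption added here only to make the output deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SalaryOrderSketch {
    // Salary and team columns of the article's sample data.
    static final String[][] ROWS = {
        {"3000", "cav"}, {"2000", "cav"},
        {"1000", "war"}, {"2000", "war"}, {"2000", "war"},
        {"1000", "tim"}, {"500", "tim"}, {"300", "tim"}, {"500", "tim"},
    };

    // Sums salary per team, then orders the teams by that sum.
    // asc = true gives ascending order, asc = false descending.
    static List<String> teamsByTotalSalary(String[][] rows, boolean asc) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String[] r : rows) {
            totals.merge(r[1], Integer.parseInt(r[0]), Integer::sum);
        }
        List<String> teams = new ArrayList<>(totals.keySet());
        teams.sort((a, b) -> {
            int c = Integer.compare(totals.get(a), totals.get(b));
            if (!asc) {
                c = -c;
            }
            // Assumed tie-break: team name, alphabetical
            return c != 0 ? c : a.compareTo(b);
        });
        return teams;
    }

    public static void main(String[] args) {
        System.out.println(teamsByTotalSalary(ROWS, false)); // prints [cav, war, tim]
        System.out.println(teamsByTotalSalary(ROWS, true));  // prints [tim, cav, war]
    }
}
```

In the sample data cav and war both total 5000, so their relative order depends entirely on the tie-break rule.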

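6. Number of aggregation results

By default, a terms aggregation returns only 10 buckets after the search executes. To return more, specify size when building the TermsBuilder:

TermsBuilder teamAgg = AggregationBuilders.terms("team").field("team").size(15);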
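
Conceptually, size truncates the ordered bucket list. The sketch below shows this in plain Java on the sample teams; it is not Elasticsearch code, TopBucketsSketch is a made-up name, and it assumes the default ordering of a terms aggregation (document count, descending) plus an alphabetical tie-break added for determinism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TopBucketsSketch {
    // Team column of the article's sample data, one entry per player.
    static final String[] TEAMS = {
        "cav", "cav", "war", "war", "war", "tim", "tim", "tim", "tim",
    };

    // Counts documents per team, orders buckets by count descending,
    // and keeps at most `size` of them.
    static List<String> topTeams(String[] teams, int size) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : teams) {
            counts.merge(t, 1, Integer::sum);
        }
        List<String> ordered = new ArrayList<>(counts.keySet());
        ordered.sort((a, b) -> {
            int c = Integer.compare(counts.get(b), counts.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return new ArrayList<>(ordered.subList(0, Math.min(size, ordered.size())));
    }

    public static void main(String[] args) {
        System.out.println(topTeams(TEAMS, 2));  // prints [tim, war]
        System.out.println(topTeams(TEAMS, 15)); // prints [tim, war, cav]
    }
}
```

With only three teams in the sample data the default of 10 is never reached, so size matters only once there are more distinct values than the limit.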

7. Parsing/outputting aggregation results

Once you have the response:

Map<String, Aggregation> aggMap = response.getAggregations().asMap();
// Look up the terms aggregation by the name it was built with ("team" in sections 4 to 6)
StringTerms teamAgg = (StringTerms) aggMap.get("team");
Iterator<Bucket> teamBucketIt = teamAgg.getBuckets().iterator();
while (teamBucketIt.hasNext()) {
    Bucket buck = teamBucketIt.next();
    // Team name (getKey() returns an Object in ES 2.x)
    String team = buck.getKeyAsString();
    // Number of documents in the bucket
    long count = buck.getDocCount();
    // All sub-aggregations of this bucket
    Map<String, Aggregation> subaggmap = buck.getAggregations().asMap();
    // Reading an avg value
    double avg_age = ((InternalAvg) subaggmap.get("avg_age")).getValue();
    // Reading a sum value
    double total_salary = ((InternalSum) subaggmap.get("total_salary")).getValue();
    // ...
    // max/min work the same way
}

 

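8. Summary

To sum up, aggregations are driven mainly by the addAggregation method of SearchRequestBuilder, which is usually passed a TermsBuilder. Sub-aggregations are added through TermsBuilder's subAggregation method; the sub-aggregations that can be added include TermsBuilder, SumBuilder, AvgBuilder, MaxBuilder, MinBuilder and other common aggregations.

In terms of implementation, SearchRequestBuilder keeps a private SearchSourceBuilder instance internally, and SearchSourceBuilder holds a List<AbstractAggregationBuilder>. Each call to addAggregation is passed on to the SearchSourceBuilder instance, which adds one AggregationBuilder to that list.

Likewise, TermsBuilder keeps its own List<AbstractAggregationBuilder> internally, and calling subAggregation (a method inherited from its parent class) adds one AggregationBuilder to it. Interested readers can also read the source code.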
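
The list-of-children structure described above can be sketched with a tiny stand-in class. Everything here is hypothetical and written for illustration (AggTreeSketch, Agg and toJson are not Elasticsearch classes or methods); it only mimics the idea that each builder holds a list of sub-aggregations and that subAggregation returns the builder itself so calls can be chained.

```java
import java.util.ArrayList;
import java.util.List;

public class AggTreeSketch {
    // Hypothetical stand-in for an aggregation builder.
    static final class Agg {
        final String name;
        final String type;
        final String field;
        final List<Agg> subs = new ArrayList<>();

        Agg(String name, String type, String field) {
            this.name = name;
            this.type = type;
            this.field = field;
        }

        // Appends a child and returns this, so chained calls add siblings.
        Agg subAggregation(Agg child) {
            subs.add(child);
            return this;
        }

        // Renders this node in the shape of the aggregation request body.
        String toJson() {
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(name).append("\":{\"").append(type)
              .append("\":{\"field\":\"").append(field).append("\"}");
            if (!subs.isEmpty()) {
                sb.append(",\"aggs\":{");
                for (int i = 0; i < subs.size(); i++) {
                    if (i > 0) {
                        sb.append(',');
                    }
                    sb.append(subs.get(i).toJson());
                }
                sb.append('}');
            }
            return sb.append('}').toString();
        }
    }

    // Builds the tree from section 4: two metrics under one terms aggregation.
    static Agg teamWithMetrics() {
        return new Agg("team", "terms", "team")
                .subAggregation(new Agg("avg_age", "avg", "age"))
                .subAggregation(new Agg("total_salary", "sum", "salary"));
    }

    public static void main(String[] args) {
        System.out.println("{" + teamWithMetrics().toJson() + "}");
    }
}
```

Because subAggregation returns the parent, a.subAggregation(b).subAggregation(c) makes b and c siblings under a, whereas a.subAggregation(b.subAggregation(c)) nests c inside b, which is the pattern used for grouping by multiple fields.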

 

If you have any questions, feel free to discuss them, and if you find any errors in this article, corrections are welcome.
