Hive_Rank

Date: 2019-11-08

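1. Function descriptions

RANK():

    Rows that tie receive the same rank, and the ranks that follow are skipped, so the last rank still equals the row count.

DENSE_RANK():

    Rows that tie receive the same rank, and no ranks are skipped, so the last rank can be smaller than the row count.

ROW_NUMBER():

    Rows are numbered consecutively in sort order, so tied rows still receive different numbers.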
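The difference between the three functions is easiest to see on a tie. The following is a minimal sketch on hypothetical inline data (the names a, b, c and the alias t are made up for illustration, and it assumes Hive 0.13 or later, where a select without a from clause is allowed):

```sql
-- Three hypothetical rows; a and b tie at 90.
select name,
       score,
       rank()       over (order by score desc) as rk,   -- 1, 1, 3: rank 2 is skipped
       dense_rank() over (order by score desc) as drk,  -- 1, 1, 2: no gap after the tie
       row_number() over (order by score desc) as rn    -- 1, 2, 3: tied rows still differ
from (
  select 'a' as name, 90 as score
  union all
  select 'b' as name, 90 as score
  union all
  select 'c' as name, 80 as score
) t;
```

Which of the two tied rows receives row number 1 is not determined by this ordering alone.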

2. Data preparation

name	subject	score
孫悟空	語文	87
孫悟空	數學	95
孫悟空	英語	68
大海	語文	94
大海	數學	56
大海	英語	84
宋宋	語文	64
宋宋	數學	86
宋宋	英語	84
婷婷	語文	65
婷婷	數學	85
婷婷	英語	78

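3. Requirement

Compute the score ranking within each subject.

4. Create a local score.txt and enter the twelve tab-separated data rows above (without the header line)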

[hadoop@hadoop102 datas]$ vi score.txt

5. Create the Hive table and load the data

create table score(
name string,
subject string, 
score int) 
row format delimited fields terminated by "\t";
load data local inpath '/opt/module/datas/score.txt' into table score;
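
Before ranking, it can help to confirm that the file was parsed as intended. A quick check, assuming the score table and the twelve sample rows above (the alias cnt is illustrative):

```sql
-- Each of the three subjects should report 4 rows for the sample data.
select subject, count(*) as cnt
from score
group by subject;

-- Rows that did not split on the tab delimiter show up with a NULL score.
select * from score where score is null;
```

If the second query returns rows, the usual cause is a delimiter mismatch, such as spaces instead of tabs in score.txt, or a header line left in the file.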

6. Query the data as required

select name,
subject,
score,
rank() over(partition by subject order by score desc) rp,
dense_rank() over(partition by subject order by score desc) drp,
row_number() over(partition by subject order by score desc) rmp
from score;

name    subject score   rp      drp     rmp
孫悟空  數學    95      1       1       1
宋宋    數學    86      2       2       2
婷婷    數學    85      3       3       3
大海    數學    56      4       4       4
宋宋    英語    84      1       1       1
大海    英語    84      1       1       2
婷婷    英語    78      3       2       3
孫悟空  英語    68      4       3       4
大海    語文    94      1       1       1
孫悟空  語文    87      2       2       2
婷婷    語文    65      3       3       3
宋宋    語文    64      4       4       4
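
In the output above, 宋宋 and 大海 tie at 84 in 英語 but receive row numbers 1 and 2. With only score in the order by, which of them comes first is not guaranteed and may change between runs. A sketch that makes the numbering repeatable by adding name as a tiebreaker, assuming names are unique within a subject:

```sql
select name,
       subject,
       score,
       -- name breaks ties, so equal scores are always numbered in the same order
       row_number() over (partition by subject order by score desc, name) as rmp
from score;
```

rank() and dense_rank() are unaffected by this concern as long as they order by score alone, because tied rows share the same value.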

Extension: find the top three students in each subject.

Compute the ranks in a subquery and filter on rp in the outer query:

select name,
subject,
score,
rp,
drp,
rmp
from (
select name,
subject,
score,
rank() over(partition by subject order by score desc) rp,
dense_rank() over(partition by subject order by score desc) drp,
row_number() over(partition by subject order by score desc) rmp
from score
) t
where rp <= 3;

name    subject score   rp      drp     rmp
孫悟空  數學    95      1       1       1
宋宋    數學    86      2       2       2
婷婷    數學    85      3       3       3
宋宋    英語    84      1       1       1
大海    英語    84      1       1       2
婷婷    英語    78      3       2       3
大海    語文    94      1       1       1
孫悟空  語文    87      2       2       2
婷婷    語文    65      3       3       3
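
What "top three" returns depends on which ranking function the filter uses. The sketch below assumes the score table above and reads "top three" as the three highest distinct scores per subject, so it filters on dense_rank(). The filter sits in an outer query because a window function result cannot be referenced in the where clause of the same select. The alias t is illustrative.

```sql
select name, subject, score, drp
from (
  select name,
         subject,
         score,
         dense_rank() over (partition by subject order by score desc) as drp
  from score
) t
where drp <= 3;  -- keeps every student whose score is among the three highest values
```

On the sample data this returns four rows for 英語, because two students share 84 and the student with 68 still has dense rank 3. Filtering on row_number() instead would return exactly three rows per subject, at the cost of cutting ties arbitrarily.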
