Data preparation
Data format
cookie3.txt
d1,user1,1000
d1,user2,2000
d1,user3,3000
d2,user4,4000
d2,user5,5000
Create the table
use cookie;
drop table if exists cookie3;
create table cookie3(dept string, userid string, sal int)
row format delimited fields terminated by ',';
load data local inpath "/home/hadoop/cookie3.txt" into table cookie3;
select * from cookie3;
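Playing with CUME_DIST
Description
-- CUME_DIST: (number of rows whose value is less than or equal to the current value) / (total number of rows in the group), for an ascending ORDER BY
Query
For example, compute what fraction of all employees earn a salary less than or equal to the current row's salary: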
select
  dept,
  userid,
  sal,
  cume_dist() over (order by sal) as rn1,
  cume_dist() over (partition by dept order by sal) as rn2
from cookie.cookie3;
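Query results
Explanation of the results
rn1: there is no PARTITION BY, so all rows form a single group with 5 rows in total.
Row 1: 1 row has sal <= 1000, so 1/5 = 0.2.
Row 3: 3 rows have sal <= 3000, so 3/5 = 0.6.
rn2: rows are grouped by dept, and the group dept=d1 has 3 rows.
Row 2: 2 rows have sal <= 2000, so 2/3 = 0.6666666666666666.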
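To check the arithmetic above outside Hive, here is a minimal Python sketch that applies the CUME_DIST definition to the five sample rows. It assumes an ascending ORDER BY. The names ROWS, cume_dist and cume_dist_by are hypothetical helpers, not part of Hive. The rows are listed in file order, while Hive may return them in a different order.

```python
from collections import defaultdict

# Sample rows from cookie3.txt: (dept, userid, sal)
ROWS = [
    ("d1", "user1", 1000),
    ("d1", "user2", 2000),
    ("d1", "user3", 3000),
    ("d2", "user4", 4000),
    ("d2", "user5", 5000),
]


def cume_dist(values):
    """CUME_DIST for ascending order: (# values <= x) / (# values)."""
    n = len(values)
    return [sum(1 for v in values if v <= x) / n for x in values]


def cume_dist_by(rows, key):
    """Apply cume_dist to the sal values of each partition, keeping original row order."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[key(row)].append(i)
    result = [0.0] * len(rows)
    for idxs in groups.values():
        for i, value in zip(idxs, cume_dist([rows[j][2] for j in idxs])):
            result[i] = value
    return result


rn1 = cume_dist([sal for _, _, sal in ROWS])    # no PARTITION BY: one group
rn2 = cume_dist_by(ROWS, key=lambda r: r[0])    # PARTITION BY dept

for row, a, b in zip(ROWS, rn1, rn2):
    print(*row, a, b)
```

The count uses <=, so tied salaries share a value. For example, cume_dist([1000, 1000, 2000]) gives [2/3, 2/3, 1.0].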
Playing with PERCENT_RANK
Description
-- PERCENT_RANK: (RANK value of the current row within the group - 1) / (total number of rows in the group - 1)
Query
select
  dept,
  userid,
  sal,
  percent_rank() over (order by sal) as rn1,                    -- PERCENT_RANK within the group (whole table)
  rank() over (order by sal) as rn11,                           -- RANK value within the group
  sum(1) over (partition by null) as rn12,                      -- total number of rows in the group
  percent_rank() over (partition by dept order by sal) as rn2,
  rank() over (partition by dept order by sal) as rn21,
  sum(1) over (partition by dept) as rn22
from cookie.cookie3;
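Query results
Explanation of the results
-- PERCENT_RANK: (RANK value of the current row within the group - 1) / (total number of rows in the group - 1), so:
rn1 == (rn11-1) / (rn12-1)
rn2 == (rn21-1) / (rn22-1)
rn1: the whole table is one group with 5 rows.
Row 1: (1-1)/(5-1) = 0/4 = 0
Row 2: (2-1)/(5-1) = 1/4 = 0.25
Row 4: (4-1)/(5-1) = 3/4 = 0.75
rn2: rows are grouped by dept, and the group dept=d1 has 3 rows.
Row 1: (1-1)/(3-1) = 0
Row 3: (3-1)/(3-1) = 1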
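A similar Python sketch reproduces the PERCENT_RANK columns from the sample rows. It assumes an ascending ORDER BY and RANK() semantics, where tied values share a rank and the next rank skips ahead. The helpers rank, percent_rank and by_partition are hypothetical. Returning 0.0 for a one-row group follows the SQL-standard convention and is an assumption here, not something the query above exercises.

```python
from collections import defaultdict

# Sample rows from cookie3.txt: (dept, userid, sal)
ROWS = [
    ("d1", "user1", 1000),
    ("d1", "user2", 2000),
    ("d1", "user3", 3000),
    ("d2", "user4", 4000),
    ("d2", "user5", 5000),
]


def rank(values):
    """RANK() for ascending order: 1 + number of strictly smaller values.
    Ties share a rank and the following rank skips ahead (1, 1, 3)."""
    return [1 + sum(1 for v in values if v < x) for x in values]


def percent_rank(values):
    """(rank - 1) / (n - 1); a one-row group yields 0.0 (assumed convention)."""
    n = len(values)
    if n == 1:
        return [0.0]
    return [(r - 1) / (n - 1) for r in rank(values)]


def by_partition(rows, key, fn):
    """Apply fn to the sal values of each partition, keeping original row order."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[key(row)].append(i)
    result = [None] * len(rows)
    for idxs in groups.values():
        for i, value in zip(idxs, fn([rows[j][2] for j in idxs])):
            result[i] = value
    return result


sals = [sal for _, _, sal in ROWS]
rn1 = percent_rank(sals)                                   # whole table, one group
rn11 = rank(sals)                                          # rank within that group
rn12 = len(sals)                                           # rows in that group
rn2 = by_partition(ROWS, lambda r: r[0], percent_rank)     # PARTITION BY dept
rn21 = by_partition(ROWS, lambda r: r[0], rank)
rn22 = by_partition(ROWS, lambda r: r[0], lambda vs: [len(vs)] * len(vs))

for row, a, a1, b, b1, b2 in zip(ROWS, rn1, rn11, rn2, rn21, rn22):
    print(*row, a, a1, rn12, b, b1, b2)
```

Because RANK() gives ties the same rank, rank([1000, 1000, 2000]) is [1, 1, 3]. The corresponding percent_rank values are [0.0, 0.0, 1.0].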