(11) Hive Analytic Window Functions (3): CUME_DIST and PERCENT_RANK

Data preparation

 

Data format

cookie3.txt

d1,user1,1000
d1,user2,2000
d1,user3,3000
d2,user4,4000
d2,user5,5000

 

Create the table

use cookie;
drop table if exists cookie3;
create table cookie3(dept string, userid string, sal int) 
row format delimited fields terminated by ',';
load data local inpath "/home/hadoop/cookie3.txt" into table cookie3;
select * from cookie3;

 

Trying out CUME_DIST

 

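Description

CUME_DIST: (number of rows whose value is less than or equal to the current row's value) / (total number of rows in the partition)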

 

Query

For example, compute the share of all people whose salary is less than or equal to the current row's salary.

select 
  dept,
  userid,
  sal,
  cume_dist() over (order by sal) as rn1,
  cume_dist() over (partition by dept order by sal) as rn2
from cookie.cookie3;

 

Query result

 

 

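Explanation of the result

rn1: no PARTITION BY, so all rows form a single partition with 5 rows in total,
     Row 1: 1 row has sal <= 1000, so 1/5=0.2
     Row 3: 3 rows have sal <= 3000, so 3/5=0.6
rn2: partitioned by dept; dept=d1 has 3 rows,
     Row 2: 2 rows have sal <= 2000, so 2/3=0.6666666666666666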
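The arithmetic above can be checked outside Hive. The following is a minimal Python sketch, not Hive code: it applies the CUME_DIST definition directly to the five sample rows. The names `ROWS` and `cume_dist` are hypothetical helpers introduced only for this illustration.

```python
from collections import defaultdict

# Sample rows from cookie3.txt: (dept, userid, sal)
ROWS = [
    ("d1", "user1", 1000),
    ("d1", "user2", 2000),
    ("d1", "user3", 3000),
    ("d2", "user4", 4000),
    ("d2", "user5", 5000),
]


def cume_dist(rows, key=None):
    """Return {userid: cume_dist}. Rows are partitioned by key(row);
    with key=None all rows fall into one partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row) if key else None].append(row)

    result = {}
    for members in groups.values():
        total = len(members)
        for _dept, userid, sal in members:
            # Rows in the partition with sal <= the current row's sal
            le_count = sum(1 for m in members if m[2] <= sal)
            result[userid] = le_count / total
    return result


rn1 = cume_dist(ROWS)                      # like: over (order by sal)
rn2 = cume_dist(ROWS, key=lambda r: r[0])  # like: over (partition by dept order by sal)

for dept, userid, sal in ROWS:
    print(dept, userid, sal, rn1[userid], rn2[userid])
```

Because the count uses `<=`, rows with equal salaries would share the same value, which matches the "less than or equal to" wording of the definition.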

 

Trying out PERCENT_RANK

 

Description

PERCENT_RANK: (RANK of the current row within the partition - 1) / (total number of rows in the partition - 1)

 

Query

select 
  dept,
  userid,
  sal,
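  percent_rank() over (order by sal) as rn1, -- within the partition (no PARTITION BY, so the whole table)
  rank() over (order by sal) as rn11, -- RANK value within the partition
  sum(1) over (partition by null) as rn12, -- total number of rows in the partition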
  percent_rank() over (partition by dept order by sal) as rn2,
  rank() over (partition by dept order by sal) as rn21,
  sum(1) over (partition by dept) as rn22 
from cookie.cookie3;
 

 

Query result

 

Explanation of the result

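PERCENT_RANK: (RANK of the current row within the partition - 1) / (total number of rows in the partition - 1)

rn1 == (rn11-1) / (rn12-1)

rn2 == (rn21-1) / (rn22-1)

rn1: no PARTITION BY, so all rows form a single partition with 5 rows in total,
       Row 1: (1-1)/(5-1)=0/4=0
       Row 2: (2-1)/(5-1)=1/4=0.25
       Row 4: (4-1)/(5-1)=3/4=0.75
rn2: partitioned by dept,
     dept=d1 has 3 rows in total
     Row 1: (1-1)/(3-1)=0
     Row 3: (3-1)/(3-1)=1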
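The relation between PERCENT_RANK, RANK, and the partition row count can also be reproduced with a short Python sketch, given here as an illustration rather than as Hive code. `ROWS` and `percent_rank` are hypothetical names. Returning 0.0 for a single-row partition is an assumption made to avoid dividing by zero; the sample data never exercises that case.

```python
from collections import defaultdict

# Sample rows from cookie3.txt: (dept, userid, sal)
ROWS = [
    ("d1", "user1", 1000),
    ("d1", "user2", 2000),
    ("d1", "user3", 3000),
    ("d2", "user4", 4000),
    ("d2", "user5", 5000),
]


def percent_rank(rows, key=None):
    """Return {userid: (percent_rank, rank, partition_rows)}.
    Rows are partitioned by key(row); key=None means one partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row) if key else None].append(row)

    result = {}
    for members in groups.values():
        n = len(members)
        for _dept, userid, sal in members:
            # RANK: 1 + number of rows with a strictly smaller sal
            rank = 1 + sum(1 for m in members if m[2] < sal)
            # Assumption: a single-row partition yields 0.0
            pr = (rank - 1) / (n - 1) if n > 1 else 0.0
            result[userid] = (pr, rank, n)
    return result


whole = percent_rank(ROWS)                       # rn1, rn11, rn12
by_dept = percent_rank(ROWS, key=lambda r: r[0])  # rn2, rn21, rn22

for dept, userid, sal in ROWS:
    print(dept, userid, sal, *whole[userid], *by_dept[userid])
```

Each tuple lines up with the query's columns, so `whole["user2"]` corresponds to (rn1, rn11, rn12) for the second row.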