Function (arg1,..., argn) OVER ([PARTITION BY <...>] [ORDER BY <....>] [<window_expression>]) Function (arg1,..., argn) 能夠是下面的函數: Aggregate Functions: 聚合函數,好比:sum(...)、 max(...)、min(...)、avg(...)等. Sort Functions: 數據排序函數, 好比 :rank(...)、row_number(...)等. Analytics Functions: 統計和比較函數, 好比:lead(...)、lag(...)、 first_value(...)等.
CREATE TABLE IF NOT EXISTS temp.test ( `name` string COMMENT '姓名', `dept_num` int COMMENT '編號', `employee_id` int COMMENT 'id', `salary` int COMMENT '工資', `type` string COMMENT '崗位類型', `start_date` date COMMENT '入職時間' ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED as TEXTFILE;
name dept_num employee_id salary type start_date Michael 1000 100 5000 full 2014-01-29 Will 1000 101 4000 full 2013-10-02 Wendy 1000 101 4000 part 2014-10-02 Steven 1000 102 6400 part 2012-11-03 Lucy 1000 103 5500 full 2010-01-03 Lily 1001 104 5000 part 2014-11-29 Jess 1001 105 6000 part 2014-12-02 Mike 1001 106 6400 part 2013-11-03 Wei 1002 107 7000 part 2010-04-03 Yun 1002 108 5500 full 2014-01-29 Richard 1002 109 8000 full 2013-09-01
load data local inpath '/root/test' into table temp.test;
select `name`,`dept_num`,`salary`, count(*) over (partition by dept_num) as cnt from employee;
3.1.一、輸出結果爲:sql
Total MapReduce CPU Time Spent: 1 seconds 780 msec OK name dept_num salary cnt Lucy 1000 5500 5 Steven 1000 6400 5 Wendy 1000 4000 5 Will 1000 4000 5 Michael 1000 5000 5 Mike 1001 6400 3 Jess 1001 6000 3 Lily 1001 5000 3 Richard 1002 8000 3 Yun 1002 5500 3 Wei 1002 7000 3 Time taken: 22.624 seconds, Fetched: 12 row(s)
select name,dept_num,salary, sum(salary) over (partition by dept_num order by dept_num) as sum_dept_salary from employee order by sum_dept_salary desc;
3.2.一、輸出結果爲:express
Total MapReduce CPU Time Spent: 3 seconds 470 msec OK name dept_num salary sum_dept_salary Michael 1000 5000 24900 Will 1000 4000 24900 Wendy 1000 4000 24900 Steven 1000 6400 24900 Lucy 1000 5500 24900 Wei 1002 7000 20500 Yun 1002 5500 20500 Richard 1002 8000 20500 Lily 1001 5000 17400 Jess 1001 6000 17400 Mike 1001 6400 17400 Time taken: 47.313 seconds, Fetched: 12 row(s)
select `name`,`dept_num`,`salary`, row_number() over (order by salary desc ) rnum from employee;
4.1.一、輸出結果爲:ide
Total MapReduce CPU Time Spent: 1 seconds 890 msec OK name dept_num salary rnum Richard 1002 8000 1 Wei 1002 7000 2 Mike 1001 6400 3 Steven 1000 6400 4 Jess 1001 6000 5 Yun 1002 5500 6 Lucy 1000 5500 7 Lily 1001 5000 8 Michael 1000 5000 9 Wendy 1000 4000 10 Will 1000 4000 11 Time taken: 22.453 seconds, Fetched: 12 row(s)
select name,dept_num,salary from ( select `name`,`dept_num`,`salary`, row_number() over (partition by dept_num order by salary desc ) rnum from employee) t1 where rnum <= 2;
4.2.一、輸出結果爲:函數
Total MapReduce CPU Time Spent: 2 seconds 680 msec OK name dept_num salary Steven 1000 6400 Lucy 1000 5500 Mike 1001 6400 Jess 1001 6000 Richard 1002 8000 Wei 1002 7000 Time taken: 24.083 seconds, Fetched: 7 row(s)
select `name`,`dept_num`,`salary`, row_number() over (partition by dept_num order by salary desc ) rnum from employee;
4.3.一、輸出結果爲:code
Total MapReduce CPU Time Spent: 1 seconds 860 msec OK name dept_num salary rnum Steven 1000 6400 1 Lucy 1000 5500 2 Michael 1000 5000 3 Wendy 1000 4000 4 Will 1000 4000 5 Mike 1001 6400 1 Jess 1001 6000 2 Lily 1001 5000 3 Richard 1002 8000 1 Wei 1002 7000 2 Yun 1002 5500 3 Time taken: 23.202 seconds, Fetched: 12 row(s)
select `name`,`dept_num`,`salary`, rank() over (order by salary desc) rank from employee;
4.4.一、輸出結果爲:排序
Total MapReduce CPU Time Spent: 1 seconds 830 msec OK name dept_num salary rank Richard 1002 8000 1 Wei 1002 7000 2 Mike 1001 6400 3 Steven 1000 6400 3 Jess 1001 6000 5 Yun 1002 5500 6 Lucy 1000 5500 6 Lily 1001 5000 8 Michael 1000 5000 8 Wendy 1000 4000 10 Will 1000 4000 10 Time taken: 21.547 seconds, Fetched: 12 row(s)
select `name`,`dept_num`,`salary`, dense_rank() over (order by salary desc) rank from employee;
4.5.一、輸出結果爲:string
Total MapReduce CPU Time Spent: 1 seconds 710 msec OK name dept_num salary rank Richard 1002 8000 1 Wei 1002 7000 2 Mike 1001 6400 3 Steven 1000 6400 3 Jess 1001 6000 4 Yun 1002 5500 5 Lucy 1000 5500 5 Lily 1001 5000 6 Michael 1000 5000 6 Wendy 1000 4000 7 Will 1000 4000 7 Time taken: 21.879 seconds, Fetched: 12 row(s)
select name,dept_num,salary, percent_rank() over (order by salary desc) rank from employee;
4.6.一、輸出結果爲:it
Total MapReduce CPU Time Spent: 1 seconds 940 msec OK name dept_num salary rank Richard 1002 8000 0.0 Wei 1002 7000 0.09090909090909091 Mike 1001 6400 0.18181818181818182 Steven 1000 6400 0.18181818181818182 Jess 1001 6000 0.36363636363636365 Yun 1002 5500 0.45454545454545453 Lucy 1000 5500 0.45454545454545453 Lily 1001 5000 0.6363636363636364 Michael 1000 5000 0.6363636363636364 Wendy 1000 4000 0.8181818181818182 Will 1000 4000 0.8181818181818182 Time taken: 22.401 seconds, Fetched: 12 row(s)
SELECT name,dept_num as deptno,salary, ntile(4) OVER(ORDER BY salary desc) as ntile FROM employee;
4.7.一、輸出結果爲:io
Total MapReduce CPU Time Spent: 1 seconds 940 msec OK name deptno salary ntile Richard 1002 8000 1 Wei 1002 7000 1 Mike 1001 6400 1 Steven 1000 6400 2 Jess 1001 6000 2 Yun 1002 5500 2 Lucy 1000 5500 3 Lily 1001 5000 3 Michael 1000 5000 3 Wendy 1000 4000 4 Will 1000 4000 4 Time taken: 28.829 seconds, Fetched: 12 row(s)
SELECT name,dept_num,salary, cume_dist() OVER (ORDER BY salary) as cume FROM employee;
5.1.一、輸出結果爲:table
name deptno salary cume Wendy 1000 4000 0.25 Will 1000 4000 0.25 Lily 1001 5000 0.4166666666666667 Michael 1000 5000 0.4166666666666667 Yun 1002 5500 0.5833333333333334 Lucy 1000 5500 0.5833333333333334 Jess 1001 6000 0.6666666666666666 Mike 1001 6400 0.8333333333333334 Steven 1000 6400 0.8333333333333334 Wei 1002 7000 0.9166666666666666 Richard 1002 8000 1.0 Time taken: 20.869 seconds, Fetched: 12 row(s)
SELECT name,dept_num,salary, cume_dist() OVER (ORDER BY salary desc) as cume FROM employee;
5.2.一、輸出結果爲:
Total MapReduce CPU Time Spent: 1 seconds 790 msec OK name dept_num salary cume Richard 1002 8000 0.08333333333333333 Wei 1002 7000 0.16666666666666666 Mike 1001 6400 0.3333333333333333 Steven 1000 6400 0.3333333333333333 Jess 1001 6000 0.4166666666666667 Yun 1002 5500 0.5833333333333334 Lucy 1000 5500 0.5833333333333334 Lily 1001 5000 0.75 Michael 1000 5000 0.75 Wendy 1000 4000 0.9166666666666666 Will 1000 4000 0.9166666666666666 Time taken: 21.672 seconds, Fetched: 12 row(s)
SELECT name,dept_num,salary, cume_dist() OVER (PARTITION BY dept_num ORDER BY salary) as cume FROM employee;
5.3.一、輸出結果爲:
Total MapReduce CPU Time Spent: 2 seconds 130 msec OK name dept_num salary cume Wendy 1000 4000 0.4 Will 1000 4000 0.4 Michael 1000 5000 0.6 Lucy 1000 5500 0.8 Steven 1000 6400 1.0 Lily 1001 5000 0.3333333333333333 Jess 1001 6000 0.6666666666666666 Mike 1001 6400 1.0 Yun 1002 5500 0.3333333333333333 Wei 1002 7000 0.6666666666666666 Richard 1002 8000 1.0 Time taken: 22.055 seconds, Fetched: 12 row(s)
SELECT name,dept_num,salary, lead(salary,1) OVER (PARTITION BY dept_num ORDER BY salary) as lead FROM employee;
5.4.一、輸出結果爲:
Total MapReduce CPU Time Spent: 1 seconds 880 msec OK name dept_num salary lead Wendy 1000 4000 4000 Will 1000 4000 5000 Michael 1000 5000 5500 Lucy 1000 5500 6400 Steven 1000 6400 NULL Lily 1001 5000 6000 Jess 1001 6000 6400 Mike 1001 6400 NULL Yun 1002 5500 7000 Wei 1002 7000 8000 Richard 1002 8000 NULL Time taken: 21.57 seconds, Fetched: 12 row(s)
SELECT name,dept_num,salary, lag(salary,1) OVER (PARTITION BY dept_num ORDER BY salary) as lead FROM employee;
5.5.一、輸出結果爲:
Total MapReduce CPU Time Spent: 1 seconds 700 msec OK name dept_num salary lead Wendy 1000 4000 NULL Will 1000 4000 4000 Michael 1000 5000 4000 Lucy 1000 5500 5000 Steven 1000 6400 5500 Lily 1001 5000 NULL Jess 1001 6000 5000 Mike 1001 6400 6000 Yun 1002 5500 NULL Wei 1002 7000 5500 Richard 1002 8000 7000 Time taken: 21.423 seconds, Fetched: 12 row(s)
SELECT name,dept_num,salary, first_value(salary) OVER (PARTITION BY dept_num ORDER BY salary) as fval FROM employee;
5.6.一、輸出結果爲:
Total MapReduce CPU Time Spent: 1 seconds 720 msec OK name dept_num salary fval Wendy 1000 4000 4000 Will 1000 4000 4000 Michael 1000 5000 4000 Lucy 1000 5500 4000 Steven 1000 6400 4000 Lily 1001 5000 5000 Jess 1001 6000 5000 Mike 1001 6400 5000 Yun 1002 5500 5500 Wei 1002 7000 5500 Richard 1002 8000 5500 Time taken: 20.379 seconds, Fetched: 12 row(s)
SELECT name,dept_num,salary, last_value(salary) OVER (PARTITION BY dept_num ORDER BY salary RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lval FROM employee;
5.7.一、輸出結果爲:
Total MapReduce CPU Time Spent: 1 seconds 770 msec OK name dept_num salary lval Wendy 1000 4000 6400 Will 1000 4000 6400 Michael 1000 5000 6400 Lucy 1000 5500 6400 Steven 1000 6400 6400 Lily 1001 5000 6400 Jess 1001 6000 6400 Mike 1001 6400 6400 Yun 1002 5500 8000 Wei 1002 7000 8000 Richard 1002 8000 8000 Time taken: 21.649 seconds, Fetched: 12 row(s)