Hive Window Functions


Official documentation

Hive official website (click to open)
Oracle and SQL Server both provide window functions, but MySQL 5.5 and 5.6 do not.

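A window function is a window plus a function:

  • Window: the range of the data set over which the function is computed
  • Function: the function that is run over that window
    Only the following functions are supported: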

Windowing functions

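  • LEAD (scalar_expression [,offset] [,default]): returns the value of the given column from the row N rows after the current row (N is the offset, 1 if omitted). If there is no such row, the default value is returned (NULL if omitted).
  • LAG (scalar_expression [,offset] [,default]): returns the value of the given column from the row N rows before the current row (N is the offset, 1 if omitted). If there is no such row, the default value is returned (NULL if omitted).
  • FIRST_VALUE(column [, false (default)]): returns the first value of the given column in the current window. If the second argument is true, nulls are skipped: when the first value is null, the search continues to the next non-null value.
  • LAST_VALUE(column [, false (default)]): returns the last value of the given column in the current window. If the second argument is true, nulls are skipped: when the last value is null, the search continues to the nearest non-null value before it.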
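The behaviour of LAG and LEAD with a default value can be tried without a Hive cluster. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Hive (assuming the bundled SQLite is 3.25 or newer, which added window functions), and the three sample rows are made up. The skip-nulls flag of FIRST_VALUE and LAST_VALUE is Hive-specific and is not shown here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, orderdate TEXT, cost INTEGER)")
# Hypothetical sample data, not taken from the original article.
conn.executemany(
    "INSERT INTO business VALUES (?, ?, ?)",
    [
        ("jack", "2017-01-01", 10),
        ("jack", "2017-01-05", 46),
        ("jack", "2017-02-03", 23),
    ],
)

# For each order, look one row back and one row ahead within the same customer.
rows = conn.execute(
    """
    SELECT orderdate,
           LAG(orderdate, 1, 'none')  OVER (PARTITION BY name ORDER BY orderdate) AS prev_date,
           LEAD(orderdate, 1, 'none') OVER (PARTITION BY name ORDER BY orderdate) AS next_date
    FROM business
    ORDER BY orderdate
    """
).fetchall()

for row in rows:
    print(row)
# ('2017-01-01', 'none', '2017-01-05')
# ('2017-01-05', '2017-01-01', '2017-02-03')
# ('2017-02-03', '2017-01-05', 'none')
```

The first row has no predecessor and the last row has no successor, so those positions fall back to the default value 'none'.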

Aggregate functions (generally used together with over): min, max, avg, sum, count

Ranking and analytics functions:

  • RANK
  • ROW_NUMBER
  • DENSE_RANK
  • CUME_DIST
  • PERCENT_RANK
  • NTILE

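Note: not every function lets you control the range of rows it computes over by changing the window size. All ranking functions, as well as LAG and LEAD, support over(), but a window_clause cannot be defined inside their over().

Syntax: function over( partition by column order by column window_clause )

The window size is specified with window_clause: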

(rows | range) between (unbounded | [num]) preceding and ([num] preceding | current row | (unbounded | [num]) following)
(rows | range) between current row and (current row | (unbounded | [num]) following)
(rows | range) between [num] following and (unbounded | [num]) following

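Special cases:

  • ①If over() contains neither a window_clause nor an order by, the window defaults to rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
  • ②If over() contains an order by but no window_clause, the window defaults to range between UNBOUNDED PRECEDING and CURRENT ROW, so rows that tie on the order by key enter the window together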
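The two defaults can be seen side by side with a small experiment. This is a sketch under assumptions: Python's sqlite3 (SQLite 3.25 or newer) stands in for Hive, and the sample rows are hypothetical. The order dates are all distinct, so there are no ties to complicate the picture.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, orderdate TEXT, cost INTEGER)")
# Hypothetical sample data with distinct order dates.
conn.executemany(
    "INSERT INTO business VALUES (?, ?, ?)",
    [
        ("jack", "2017-01-01", 10),
        ("jack", "2017-01-05", 46),
        ("jack", "2017-02-03", 23),
    ],
)

rows = conn.execute(
    """
    SELECT orderdate, cost,
           SUM(cost) OVER ()                   AS whole_set,      -- no order by, no window clause
           SUM(cost) OVER (ORDER BY orderdate) AS up_to_current   -- order by only
    FROM business
    ORDER BY orderdate
    """
).fetchall()

for row in rows:
    print(row)
# ('2017-01-01', 10, 79, 10)
# ('2017-01-05', 46, 79, 56)
# ('2017-02-03', 23, 79, 79)
```

With an empty over() every row sees the whole data set (79), while adding order by turns the same sum into a running total.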

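How do window functions differ from grouping?

  • ①With a group by, the select list can only contain the grouping columns and aggregate expressions
  • ②A window function runs once for every record, over that record's window, so every input row is kept
  • ③A group by collapses each group into a single row (a de-duplicating effect), whereas partition by does not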
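The difference in row counts is easiest to see on a tiny table. The sketch below again assumes Python's sqlite3 (SQLite 3.25 or newer) as a stand-in for Hive, with made-up sample rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, orderdate TEXT, cost INTEGER)")
# Hypothetical sample data: two orders for jack, one for tony.
conn.executemany(
    "INSERT INTO business VALUES (?, ?, ?)",
    [
        ("jack", "2017-01-01", 10),
        ("jack", "2017-01-05", 46),
        ("tony", "2017-01-02", 15),
    ],
)

# GROUP BY: one output row per customer.
grouped = conn.execute(
    "SELECT name, SUM(cost) FROM business GROUP BY name ORDER BY name"
).fetchall()

# PARTITION BY: every input row survives and carries its customer's total.
windowed = conn.execute(
    """
    SELECT name, cost, SUM(cost) OVER (PARTITION BY name) AS total
    FROM business
    ORDER BY name, cost
    """
).fetchall()

print(grouped)   # [('jack', 56), ('tony', 15)]
print(windowed)  # [('jack', 10, 56), ('jack', 46, 56), ('tony', 15, 15)]
```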

Exercises

(9) Query the orders that fall in the earliest 20% by order date
Exact approach:

select *
 from
 (select name,orderdate,cost,cume_dist() over(order by orderdate ) cdnum
 from  business) tmp
 where cdnum<=0.2

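Approximate approach (ntile(5) splits the rows into five nearly equal buckets, so bucket 1 only approximates the first 20%):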

select *
 from
 (select name,orderdate,cost,ntile(5) over(order by orderdate ) cdnum
 from  business) tmp
 where cdnum=1
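The two approaches can return different rows when the row count is not a multiple of five. The sketch below illustrates this with seven hypothetical orders, using Python's sqlite3 (SQLite 3.25 or newer) as a stand-in for Hive. How the leftover rows are spread across buckets is what SQLite does; it is assumed rather than verified here that Hive distributes them the same way.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, orderdate TEXT, cost INTEGER)")
# Seven hypothetical orders on seven distinct days.
orders = [("c%d" % i, "2017-01-0%d" % i, 10 * i) for i in range(1, 8)]
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", orders)

# Exact: keep rows whose cumulative distribution is at most 0.2.
exact = [r[0] for r in conn.execute(
    """
    SELECT orderdate
    FROM (SELECT orderdate, CUME_DIST() OVER (ORDER BY orderdate) AS cdnum
          FROM business) tmp
    WHERE cdnum <= 0.2
    ORDER BY orderdate
    """
)]

# Approximate: keep the first of five buckets.
approx = [r[0] for r in conn.execute(
    """
    SELECT orderdate
    FROM (SELECT orderdate, NTILE(5) OVER (ORDER BY orderdate) AS cdnum
          FROM business) tmp
    WHERE cdnum = 1
    ORDER BY orderdate
    """
)]

print(exact)   # ['2017-01-01']
print(approx)  # ['2017-01-01', '2017-01-02']
```

With seven rows, cume_dist is 1/7 for the first row and 2/7 for the second, so only one row passes the 0.2 cutoff, while the first ntile bucket holds two rows.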

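(8) Query each customer's purchase details and the total cost of their three most recent purchases

"Three most recent" can be read in two ways: the current purchase plus the two before it, or the current purchase plus the one before and the one after.

Current purchase plus the two before it: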

select name,orderdate,cost,sum(cost) over(partition by name order by orderdate rows between 2 PRECEDING and CURRENT  row) 
 from business

Current purchase plus the one before and the one after (two equivalent queries):

select name,orderdate,cost,sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and 1  FOLLOWING) 
 from business

select name,orderdate,cost,cost+
 lag(cost,1,0) over(partition by name order by orderdate )+
 lead(cost,1,0) over(partition by name order by orderdate )
 from business
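To check how these frames behave at the edges of a partition, the queries above can be run on a few rows. This sketch assumes Python's sqlite3 (SQLite 3.25 or newer) as a stand-in for Hive, and the four orders are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, orderdate TEXT, cost INTEGER)")
# Hypothetical sample data: four orders from one customer.
conn.executemany(
    "INSERT INTO business VALUES (?, ?, ?)",
    [
        ("jack", "2017-01-01", 10),
        ("jack", "2017-01-05", 46),
        ("jack", "2017-02-03", 23),
        ("jack", "2017-04-06", 55),
    ],
)

rows = conn.execute(
    """
    SELECT orderdate, cost,
           SUM(cost) OVER (PARTITION BY name ORDER BY orderdate
                           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS last3,
           SUM(cost) OVER (PARTITION BY name ORDER BY orderdate
                           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS around,
           cost
             + LAG(cost, 1, 0)  OVER (PARTITION BY name ORDER BY orderdate)
             + LEAD(cost, 1, 0) OVER (PARTITION BY name ORDER BY orderdate) AS around_lag_lead
    FROM business
    ORDER BY orderdate
    """
).fetchall()

for row in rows:
    print(row)
# ('2017-01-01', 10, 10, 56, 56)
# ('2017-01-05', 46, 56, 79, 79)
# ('2017-02-03', 23, 79, 124, 124)
# ('2017-04-06', 55, 124, 78, 78)
```

At the start and end of a partition the frame simply contains fewer rows, which is why the LAG/LEAD version needs 0 as its default to produce the same totals.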

(7) Query each customer's purchase details and the date of their last purchase in the same month

select name,orderdate,cost,LAST_VALUE(orderdate,true) over(partition by name,substring(orderdate,1,7) order by orderdate rows between CURRENT  row and UNBOUNDED  FOLLOWING) 
 from business

(6) Query each customer's purchase details and the date of their first purchase in the same month

select name,orderdate,cost,FIRST_VALUE(orderdate,true) over(partition by name,substring(orderdate,1,7) order by orderdate ) 
 from business

(5) Query each customer's purchase details and the date of their next purchase

select name,orderdate,cost,lead(orderdate,1,'no data') over(partition by name order by orderdate )
 from business

(4) Query each customer's purchase details and the date of their previous purchase

select name,orderdate,cost,lag(orderdate,1,'no data') over(partition by name order by orderdate )
 from business

(3) Query each customer's purchase details with cost accumulated in date order

select name,orderdate,cost,sum(cost) over(partition by name order by orderdate ) 
 from business

(2) Query each customer's purchase details and their monthly purchase total

select name,orderdate,cost,sum(cost) over(partition by name,substring(orderdate,1,7) ) 
 from business

(1) Query the customers who made a purchase in April 2017 and the total number of such customers

select name,count(*) over(rows between UNBOUNDED  PRECEDING and UNBOUNDED  FOLLOWING)
from business
where substring(orderdate,1,7)='2017-04'
group by name

Equivalent to:

select name,count(*) over()
from business
where substring(orderdate,1,7)='2017-04'
group by name
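The reason count(*) over() returns the number of customers rather than the number of orders is that the window is evaluated over the result of the group by, which has one row per customer. The sketch below illustrates this with hypothetical data, using Python's sqlite3 (SQLite 3.25 or newer) as a stand-in for Hive. It uses substr, which both engines accept.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT, orderdate TEXT, cost INTEGER)")
# Hypothetical sample data: four April orders from two customers, one May order.
conn.executemany(
    "INSERT INTO business VALUES (?, ?, ?)",
    [
        ("jack", "2017-04-06", 42),
        ("mart", "2017-04-08", 62),
        ("mart", "2017-04-09", 68),
        ("mart", "2017-04-11", 75),
        ("neil", "2017-05-10", 12),
    ],
)

rows = conn.execute(
    """
    SELECT name, COUNT(*) OVER () AS customers
    FROM business
    WHERE substr(orderdate, 1, 7) = '2017-04'
    GROUP BY name
    ORDER BY name
    """
).fetchall()

print(rows)  # [('jack', 2), ('mart', 2)]
```

Four orders match the April filter, but after grouping only two rows remain, so the window count is 2.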