Hive window analytic functions: row_number and related functions

 

1. Ranking and deduplication

row_number() over(partition by col1 order by col2) as rn

Result: 1,2,3,4

rank() over(partition by col1 order by col2) as rk

Result: 1,2,2,4,5

dense_rank() over(partition by col1 order by col2) as ds_rk

Result: 1,2,2,3,4
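To make the difference between the three functions concrete, here is a small self-contained sketch. The inline rows (col1 = 'a', col2 = 10, 20, 20, 30, 40) are made-up illustration data, and the subquery alias t is arbitrary.

```sql
-- Illustration data only: one partition ('a') with a tie on col2 = 20
select col1,
       col2,
       row_number() over(partition by col1 order by col2) as rn,    -- always unique within the partition
       rank()       over(partition by col1 order by col2) as rk,    -- ties share a rank, the next rank is skipped
       dense_rank() over(partition by col1 order by col2) as ds_rk  -- ties share a rank, no gap afterwards
  from (
        select 'a' as col1, 10 as col2
        union all
        select 'a' as col1, 20 as col2
        union all
        select 'a' as col1, 20 as col2
        union all
        select 'a' as col1, 30 as col2
        union all
        select 'a' as col1, 40 as col2
       ) t
;
-- Listed by col2:
-- col2  rn  rk  ds_rk
-- 10    1   1   1
-- 20    2   2   2
-- 20    3   2   2
-- 30    4   4   3
-- 40    5   5   4
```

Which of the two tied rows receives rn = 2 and which receives rn = 3 is not determined by the order by clause; only rank() and dense_rank() treat them as equal.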

select 
        order_id,
        departure_date,
        row_number() over(partition by order_id order by departure_date) as rn,  -- sequential numbering, ties still get distinct numbers
        rank() over(partition by order_id order by departure_date) as rk,        -- ties share a rank, the next number is skipped
        dense_rank() over(partition by order_id order by departure_date) as d_rk -- ties share a rank, the next number is not skipped
  from ord_test 
 where order_id=410341346
;
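The deduplication use case mentioned in the section title usually comes down to filtering on rn = 1. The following is a minimal sketch, assuming the goal is to keep only the row with the earliest departure_date for each order_id in ord_test; the subquery alias t is arbitrary.

```sql
-- Keep one row per order_id: the one with the earliest departure_date
select order_id,
       departure_date
  from (
        select order_id,
               departure_date,
               row_number() over(partition by order_id order by departure_date) as rn
          from ord_test
       ) t
 where rn = 1   -- the window function cannot be filtered in the same query level, hence the subquery
;
```

row_number() is used here rather than rank() because it guarantees exactly one row per partition even when several rows share the same departure_date.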


 

2. Accessing values from other rows

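lag(col1,n,DEFAULT) over(partition by col1 order by col2) as up
Returns the value from the n-th row above the current row within the window. The first argument is the column name, the second is the offset n (optional, defaults to 1), and the third is the default value, returned when there is no n-th row above (the offset reaches past the start of the partition). If no default is given, NULL is returned.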

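lead(col1,n,DEFAULT) over(partition by col1 order by col2) as down
Returns the value from the n-th row below the current row within the window. The first argument is the column name, the second is the offset n (optional, defaults to 1), and the third is the default value, returned when there is no n-th row below (the offset reaches past the end of the partition). If no default is given, NULL is returned.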

first_value(col1) over(partition by col1 order by col2) as fv
Returns the first value within the partition after sorting, considering rows up to the current row.

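last_value(col1) over(partition by col1 order by col2) as lv
Returns the last value within the partition after sorting, considering rows up to the current row. Because the window ends at the current row, this is the current row's value (or that of its last peer in the ordering), not the last value of the whole partition.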

select 
       order_id,
       departure_date,
       first_value(departure_date) over(partition by order_id order by add_time) as fv,  -- first value in the partition
       last_value(departure_date) over(partition by order_id order by add_time) as lv    -- last value up to the current row, not necessarily the last row of the partition
  from ord_test
 where order_id=410341346
;
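When only an order by clause is given, the default window frame runs from the start of the partition to the current row, so last_value does not see later rows. If the intent is the last value of the whole partition, the frame can be widened explicitly. A minimal sketch against the same ord_test table; the alias lv_all is made up for illustration.

```sql
select 
       order_id,
       departure_date,
       last_value(departure_date) over(
           partition by order_id
           order by add_time
           rows between unbounded preceding and unbounded following  -- frame covers the entire partition
       ) as lv_all   -- same value on every row of the partition
  from ord_test
 where order_id=410341346
;
```

An alternative with the same effect is first_value over the reverse ordering (order by add_time desc), which works with the default frame.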

select 
       order_id,
       departure_date,
       lead(departure_date,1) over(partition by order_id order by departure_date) as down_1, -- value from the next row
       lag(departure_date,1) over(partition by order_id order by departure_date) as up_1     -- value from the previous row
  from ord_test
 where order_id=410341346
;
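The optional second and third arguments of lag and lead can be sketched as follows. This assumes departure_date is stored as a string, so that a string default is type-compatible; the default value 'none' and the aliases down_2 and up_1_dflt are made up for illustration.

```sql
select 
       order_id,
       departure_date,
       lead(departure_date,2,'none') over(partition by order_id order by departure_date) as down_2,    -- two rows below, 'none' on the last two rows
       lag(departure_date,1,'none') over(partition by order_id order by departure_date) as up_1_dflt   -- one row above, 'none' on the first row
  from ord_test
 where order_id=410341346
;
```

If departure_date is a date or timestamp column instead, the default should be a value of that type.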
