The pandas qcut function splits a set of numbers into bins based on quantiles of their values. For example:
import pandas as pd
data = pd.Series([0,8,1,5,3,7,2,6,10,4,9])
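Say I want to split this data into two parts, a larger half and a smaller half. Values in the smaller half become 'small number' and values in the larger half become 'large number':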
print(pd.qcut(data,[0,0.5,1],labels=['small number','large number']))
0     small number
1     large number
2     small number
3     small number
4     small number
5     large number
6     small number
7     large number
8     large number
9     small number
10    large number
dtype: category
Categories (2, object): [small number < large number]
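The first argument of qcut() is the data, and the second defines how the range is cut, as a list of quantiles. To split the numbers into two halves, as here, use [0, 0.5, 1]. To split them into 4 equal parts, use [0, 0.25, 0.5, 0.75, 1]. The cuts do not have to be equal: [0, 0.1, 0.2, 0.3, 1] distributes the values in a 1:1:1:7 ratio, for example: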
data = pd.Series([0,8,1,5,3,7,2,6,10,4,9])
print(pd.qcut(data,[0, 0.1, 0.2, 0.3, 1],labels=['first 10%','second 10%','third 10%','70%']))
0      first 10%
1            70%
2      first 10%
3            70%
4      third 10%
5            70%
6     second 10%
7            70%
8            70%
9            70%
10           70%
dtype: category
Categories (4, object): [first 10% < second 10% < third 10% < 70%]
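Of course, because the data contains 11 numbers, it cannot be split exactly 1:1:1:7. The 10% quantile of these values is exactly 1, and the first bin includes its upper edge, so both 0 and 1 end up in the 'first 10%' category.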
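To see where the cut points actually fall, one option is to ask qcut() for the bin edges as well. The sketch below reuses the same data and quantiles and passes retbins=True, a standard qcut() parameter. The variable names binned, edges and counts are only illustrative.

```python
import pandas as pd

data = pd.Series([0, 8, 1, 5, 3, 7, 2, 6, 10, 4, 9])

# retbins=True makes qcut return the computed bin edges along with the binned values
binned, edges = pd.qcut(data, [0, 0.1, 0.2, 0.3, 1], retbins=True)

# Edges sit at the 0%, 10%, 20%, 30% and 100% quantiles of the data
print(edges)

# Count how many values landed in each bin, listed in bin order
counts = binned.value_counts().sort_index()
print(counts)
```

For this data the edges are 0, 1, 2, 3 and 10, so the first bin covers both 0 and 1 while the second and third bins hold one value each.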
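The third argument of qcut(), labels, holds the replacement values, that is, the value each interval should be replaced with. Keep the labels in the same order as the intervals, and give exactly one label per interval, no more and no fewer.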
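As a rough illustration of the label-count rule, the sketch below passes one label for two intervals and catches the resulting ValueError. It also shows labels=False, which skips names and returns the integer position of each interval. The names mismatch_error and codes are only illustrative.

```python
import pandas as pd

data = pd.Series([0, 8, 1, 5, 3, 7, 2, 6, 10, 4, 9])

# Two intervals but only one label: pandas rejects the mismatch
mismatch_error = None
try:
    pd.qcut(data, [0, 0.5, 1], labels=['small number'])
except ValueError as err:
    mismatch_error = err
    print('ValueError:', err)

# labels=False returns the integer position of each interval instead of a name
codes = pd.qcut(data, [0, 0.5, 1], labels=False)
print(codes.tolist())
```

With labels=False, values in the smaller half get 0 and values in the larger half get 1, which can be handy when the bins feed into further numeric processing.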