The Python apply function

Date: 2019-11-06



1. Introduction

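apply is the most flexible of all the functions in pandas. Its signature is shown below. It comes from an older pandas release; later releases removed the broadcast and reduce parameters in favor of result_type.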

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

The most useful parameter is the first one, func. It is itself a function, playing a role similar to a function pointer in C/C++.

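You write this function yourself, and what it receives depends on axis. With axis=1, for example, each row is passed to your function as a Series. Inside the function you compute whatever you need from the different fields of that Series and return a result. apply iterates over every row of the DataFrame automatically and, when each call returns a single value, combines the results into one Series and returns it. With the default axis=0, the function is called once per column instead.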
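To make the role of axis concrete, here is a minimal sketch that only reports what each call receives. The frame, its labels, and the helper name describe are made up for illustration and are not part of the original article.

```python
import pandas as pd

# Small hand-written frame (hypothetical data) so the behaviour is easy to follow.
df = pd.DataFrame({"x": [1, 2], "y": [10, 20]}, index=["r1", "r2"])


def describe(s):
    # s is a pandas Series: s.name is the column (or row) label,
    # and s.index holds the labels of the values inside it.
    return f"{s.name}: {list(s.index)}"


by_column = df.apply(describe)        # axis=0: one call per column
by_row = df.apply(describe, axis=1)   # axis=1: one call per row

print(by_column)
print(by_row)
```

With axis=0 each Series is a column indexed by the row labels; with axis=1 each Series is a row indexed by the column names, which is why a row function can read fields such as row['a'].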

2. Example

import numpy as np
import pandas as pd


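# Range (max minus min) of whatever Series is passed in
f = lambda x: x.max() - x.min()

# 4x3 frame of random values, so the numbers differ on every run
df = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['utah', 'ohio', 'texas', 'oregon'])
print(df)

# Default axis=0: f is called once per column
t1 = df.apply(f)
print(t1)

# axis=1: f is called once per row
t2 = df.apply(f, axis=1)
print(t2)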

The output looks like this:

               b         d         e
utah    1.106486  0.101113 -0.494279
ohio    0.955676 -1.889499  0.522151
texas   1.891144 -0.670588  0.106530
oregon -0.062372  0.991231  0.294464

b    1.953516
d    2.880730
e    1.016430
dtype: float64

utah      1.600766
ohio      2.845175
texas     2.561732
oregon    1.053603
dtype: float64
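Because the example above uses random data, its numbers cannot be reproduced. As a rough sketch, the same range computation can be checked on a small fixed frame (values invented here for illustration) and compared against pandas' built-in column and row reductions:

```python
import pandas as pd

# Fixed, hypothetical values so the result can be checked by hand.
df = pd.DataFrame(
    {"b": [1.0, 4.0, 2.0], "d": [0.5, -1.5, 3.0], "e": [2.0, 2.0, -1.0]},
    index=["utah", "ohio", "texas"],
)

f = lambda x: x.max() - x.min()

col_range = df.apply(f)           # range of each column
row_range = df.apply(f, axis=1)   # range of each row

# The same quantities computed with built-in reductions, without apply.
col_range_vec = df.max() - df.min()
row_range_vec = df.max(axis=1) - df.min(axis=1)

print(col_range)
print(row_range)
print(col_range.equals(col_range_vec), row_range.equals(row_range_vec))
```

For a reduction this simple the built-in form gives the same answer; apply earns its keep when the per-row or per-column logic has no ready-made vectorized equivalent.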

3. Performance comparison

df = pd.DataFrame({'a': np.random.randn(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': np.random.randn(6)})


def my_test(a, b):
    return a + b


print(df)


df['Value'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)  # Method 1: row-wise apply
print(df)

df['Value2'] = df['a'] + df['c']  # Method 2: vectorized column arithmetic
print(df)

The output is as follows:

          a    b         c
0 -1.194841  foo  1.648214
1 -0.377554  bar  0.496678
2  1.524940  foo -1.245333
3 -0.248150  bar  1.526515
4  0.283395  foo  1.282233
5  0.117674  bar -0.094462

          a    b         c     Value
0 -1.194841  foo  1.648214  0.453374
1 -0.377554  bar  0.496678  0.119124
2  1.524940  foo -1.245333  0.279607
3 -0.248150  bar  1.526515  1.278365
4  0.283395  foo  1.282233  1.565628
5  0.117674  bar -0.094462  0.023212

          a    b         c     Value    Value2
0 -1.194841  foo  1.648214  0.453374  0.453374
1 -0.377554  bar  0.496678  0.119124  0.119124
2  1.524940  foo -1.245333  0.279607  0.279607
3 -0.248150  bar  1.526515  1.278365  1.278365
4  0.283395  foo  1.282233  1.565628  1.565628
5  0.117674  bar -0.094462  0.023212  0.023212

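Note: when the dataset is large and the logic is simple, Method 2 is recommended. When I processed a dataset of a few hundred MB, Method 1 took about 200 s while Method 2 took about 10 s.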
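The gap can be measured on your own machine with a small timing sketch like the one below. The row count n, the random seed, and the helper names method1 and method2 are assumptions chosen for illustration; the timings will differ from the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # assumed row count; increase it to see the gap widen
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(n), "c": rng.standard_normal(n)})


def method1():
    # Row-wise apply: one Python-level function call per row.
    return df.apply(lambda row: row["a"] + row["c"], axis=1)


def method2():
    # Vectorized: a single column-wise addition.
    return df["a"] + df["c"]


t1 = timeit.timeit(method1, number=1)
t2 = timeit.timeit(method2, number=1)

# Both methods should produce the same values.
same = np.allclose(method1(), method2())

print(f"apply: {t1:.4f} s, vectorized: {t2:.4f} s, same result: {same}")
```

The row-wise version pays for building a Series and calling a Python function on every row, which is the likely reason it falls behind on large inputs.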
