Pandas Basics in 10 Minutes

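Background

pandas plays a central role in data analysis, and the best way to learn it is to read the official documentation. What follows are my study notes on the official guide, 10 Minutes to pandas. (The title says 10 minutes; it feels more like half an hour at least.)

pandas has two main data types, which can be understood simply as:

  • Series: a one-dimensional array
  • DataFrame: a two-dimensional array (matrix)

With that rough picture in mind, let's get to know pandas properly:

First, import the required packages: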

import numpy as np
import pandas as pd

Object Creation

  • Series

    A Series can be created by passing in a list; missing values are written as np.nan:

    s = pd.Series([1,3,4,np.nan,7,9])
    s
    Out[5]: 
    0    1.0
    1    3.0
    2    4.0
    3    NaN
    4    7.0
    5    9.0
    dtype: float64

    pandas creates a default integer index (0-5 above). An index can also be specified at creation time:

    pd.Series([1,3,4,np.nan,7,9], index=[1,1,2,2,'a',4])
    Out[9]: 
    1    1.0
    1    3.0
    2    4.0
    2    NaN
    a    7.0
    4    9.0
    dtype: float64

    Note that index labels may repeat and may also be strings.

  • DataFrame

    A DataFrame can be created in several ways:

    • By passing a NumPy array together with a datetime index and a list of column names.

      dates = pd.date_range('20190101', periods=6)
      dates
      Out[11]: 
      DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
                     '2019-01-05', '2019-01-06'],
                    dtype='datetime64[ns]', freq='D')
      df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
      df
      Out[18]: 
                         A         B         C         D
      2019-01-01  0.671622  0.785726  0.392435  0.874692
      2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
      2019-01-03  1.364425 -0.947641  2.386880  0.585372
      2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
      2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
      2019-01-06  0.221597 -0.753038 -1.741256  0.287280
    • By passing a dict

      df2 = pd.DataFrame({'A':1.,
                          'B':pd.Timestamp('20190101'),
                          'C':pd.Series(1, index=list(range(4)), dtype='float32'),
                          'D':np.array([3]*4, dtype='int32'),
                          'E':pd.Categorical(["test", "tain", "test", "train"]),
                          'F':'foo'})
      df2
      Out[27]: 
           A          B    C  D      E    F
      0  1.0 2019-01-01  1.0  3   test  foo
      1  1.0 2019-01-01  1.0  3   tain  foo
      2  1.0 2019-01-01  1.0  3   test  foo
      3  1.0 2019-01-01  1.0  3  train  foo

      Each column was given a different type here, which can be checked as follows:

      df2.dtypes
      Out[28]: 
      A           float64
      B    datetime64[ns]
      C           float32
      D             int32
      E          category
      F            object
      dtype: object

As can be seen, a DataFrame, like a Series, automatically gets an integer index when none is specified; this matters a great deal in later operations.
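
The two points about indexes made above (labels may repeat, and an integer index is generated when none is given) can be checked with a small sketch. The string labels and the column name x are my own choices for illustration, not values from the original notes.

```python
import numpy as np
import pandas as pd

# A Series whose index contains repeated labels (illustrative labels).
s = pd.Series([1, 3, 4, np.nan, 7, 9], index=list('aabbcd'))

# Selecting a repeated label returns every matching entry as a Series.
repeated = s.loc['a']

# Selecting a unique label returns a single scalar value.
single = s.loc['c']

# With no explicit index, pandas generates the integer labels 0..n-1.
auto = pd.DataFrame({'x': [10, 20, 30]})

print(repeated.tolist())
print(single)
print(list(auto.index))
```

Whether a label is unique changes the type of the result, which is one reason a repeated index can cause surprises later on.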

Viewing Data

  • View the first or last few rows:

    df.head()
    Out[30]: 
                       A         B         C         D
    2019-01-01  0.671622  0.785726  0.392435  0.874692
    2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
    2019-01-03  1.364425 -0.947641  2.386880  0.585372
    2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
    2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
    df.tail(3)
    Out[31]: 
                       A         B         C         D
    2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
    2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
    2019-01-06  0.221597 -0.753038 -1.741256  0.287280

    The number of rows can be passed as an argument; the default is 5.

  • View the index and the column names

    df.index
    Out[32]: 
    DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
                   '2019-01-05', '2019-01-06'],
                  dtype='datetime64[ns]', freq='D')
    df.columns
    Out[33]: Index(['A', 'B', 'C', 'D'], dtype='object')
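  • Use DataFrame.to_numpy() to convert the data to a NumPy array. Note that a NumPy array has a single dtype for the whole array, whereas a DataFrame has one dtype per column. During the conversion pandas therefore looks for a NumPy dtype that can hold every dtype in the DataFrame, which may end up being object. This conversion can be computationally expensive.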

    df.to_numpy()
    Out[35]: 
    array([[ 0.67162219,  0.78572584,  0.39243527,  0.87469243],
           [-2.42070338, -1.11620768, -0.34607048,  0.78594081],
           [ 1.36442543, -0.94764138,  2.38688005,  0.58537186],
           [-0.48597971, -1.28145415,  0.35406263, -1.41885798],
           [-1.12271697, -2.78904135, -0.79181242, -0.17434484],
           [ 0.22159737, -0.75303807, -1.74125564,  0.28728004]])
    df2.to_numpy()
    Out[36]: 
    array([[1.0, Timestamp('2019-01-01 00:00:00'), 1.0, 3, 'test', 'foo'],
           [1.0, Timestamp('2019-01-01 00:00:00'), 1.0, 3, 'tain', 'foo'],
           [1.0, Timestamp('2019-01-01 00:00:00'), 1.0, 3, 'test', 'foo'],
           [1.0, Timestamp('2019-01-01 00:00:00'), 1.0, 3, 'train', 'foo']],
          dtype=object)

    df above is entirely float, so the conversion is fast; df2 mixes several types, so every element ends up as object.

  • View a quick statistical summary of the data

    df.describe()
    Out[37]: 
                  A         B         C         D
    count  6.000000  6.000000  6.000000  6.000000
    mean  -0.295293 -1.016943  0.042373  0.156680
    std    1.356107  1.144047  1.396030  0.860725
    min   -2.420703 -2.789041 -1.741256 -1.418858
    25%   -0.963533 -1.240143 -0.680377 -0.058939
    50%   -0.132191 -1.031925  0.003996  0.436326
    75%    0.559116 -0.801689  0.382842  0.735799
    max    1.364425  0.785726  2.386880  0.874692
  • Transpose

    df.T
    Out[38]: 
       2019-01-01  2019-01-02  2019-01-03  2019-01-04  2019-01-05  2019-01-06
    A    0.671622   -2.420703    1.364425   -0.485980   -1.122717    0.221597
    B    0.785726   -1.116208   -0.947641   -1.281454   -2.789041   -0.753038
    C    0.392435   -0.346070    2.386880    0.354063   -0.791812   -1.741256
    D    0.874692    0.785941    0.585372   -1.418858   -0.174345    0.287280
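  • Sort by axis labels. The axis parameter selects the axis: axis=0 (the default) sorts by the row labels, and axis=1 sorts by the column labels. The ascending parameter defaults to True (ascending order); ascending=False sorts in descending order: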

    df.sort_index(axis=1)
    Out[44]: 
                       A         B         C         D
    2019-01-01  0.671622  0.785726  0.392435  0.874692
    2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
    2019-01-03  1.364425 -0.947641  2.386880  0.585372
    2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
    2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
    2019-01-06  0.221597 -0.753038 -1.741256  0.287280
    df.sort_index(axis=1, ascending=False)
    Out[45]: 
                       D         C         B         A
    2019-01-01  0.874692  0.392435  0.785726  0.671622
    2019-01-02  0.785941 -0.346070 -1.116208 -2.420703
    2019-01-03  0.585372  2.386880 -0.947641  1.364425
    2019-01-04 -1.418858  0.354063 -1.281454 -0.485980
    2019-01-05 -0.174345 -0.791812 -2.789041 -1.122717
    2019-01-06  0.287280 -1.741256 -0.753038  0.221597

    df.sort_index(axis=1) sorts the column names in ascending order, so nothing appears to change; with ascending=False the column order becomes D, C, B, A.

  • Sort by values:

    df.sort_values(by='B')
    Out[46]: 
                       A         B         C         D
    2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
    2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
    2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
    2019-01-03  1.364425 -0.947641  2.386880  0.585372
    2019-01-06  0.221597 -0.753038 -1.741256  0.287280
    2019-01-01  0.671622  0.785726  0.392435  0.874692
    df.sort_values(by='B', ascending=False)
    Out[47]: 
                       A         B         C         D
    2019-01-01  0.671622  0.785726  0.392435  0.874692
    2019-01-06  0.221597 -0.753038 -1.741256  0.287280
    2019-01-03  1.364425 -0.947641  2.386880  0.585372
    2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
    2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
    2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
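
As a compact recap of this section, the sketch below checks the dtype that to_numpy() produces for single-type and mixed-type frames, and the difference between sorting by labels and sorting by values. It uses small fixed numbers of my own rather than the random data above, so the results are reproducible.

```python
import pandas as pd

dates = pd.date_range('20190101', periods=3)
# Fixed illustrative values; column B is deliberately listed before A.
df = pd.DataFrame({'B': [3.0, 1.0, 2.0], 'A': [0.5, -0.5, 1.5]}, index=dates)

homogeneous = df.to_numpy()            # every column is float64
mixed = df.assign(C='foo').to_numpy()  # float and string columns together

by_column_label = df.sort_index(axis=1)                     # columns become A, B
by_row_label_desc = df.sort_index(axis=0, ascending=False)  # latest date first
by_value = df.sort_values(by='B')                           # rows ordered by B

print(homogeneous.dtype, mixed.dtype)
print(list(by_column_label.columns))
print(by_row_label_desc.index[0])
print(by_value['B'].tolist())
```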

Selection

  • Get a column

    df['A']
    Out[49]: 
    2019-01-01    0.671622
    2019-01-02   -2.420703
    2019-01-03    1.364425
    2019-01-04   -0.485980
    2019-01-05   -1.122717
    2019-01-06    0.221597
    Freq: D, Name: A, dtype: float64
    type(df.A)
    Out[52]: pandas.core.series.Series

    df.A works as well; note that column names are case-sensitive. The result is a Series.

  • Select multiple rows

    df[0:3]
    Out[53]: 
                       A         B         C         D
    2019-01-01  0.671622  0.785726  0.392435  0.874692
    2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
    2019-01-03  1.364425 -0.947641  2.386880  0.585372
    df['20190102':'20190104']
    Out[54]: 
                       A         B         C         D
    2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
    2019-01-03  1.364425 -0.947641  2.386880  0.585372
    2019-01-04 -0.485980 -1.281454  0.354063 -1.418858

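    A slice inside a single [] selects rows. An integer slice goes by position; because the index was set to dates earlier, a date range can also be used directly, and then both endpoints are included.

  • Selection by label

    • Select a row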

      df.loc[dates[0]]
      Out[57]: 
      A    0.671622
      B    0.785726
      C    0.392435
      D    0.874692
      Name: 2019-01-01 00:00:00, dtype: float64
    • Select data for the given rows and columns

      df.loc[:, ['A', 'C']]
      Out[58]: 
                         A         C
      2019-01-01  0.671622  0.392435
      2019-01-02 -2.420703 -0.346070
      2019-01-03  1.364425  2.386880
      2019-01-04 -0.485980  0.354063
      2019-01-05 -1.122717 -0.791812
      2019-01-06  0.221597 -1.741256
      
      df.loc['20190102':'20190105', ['A', 'C']]
      Out[62]: 
                         A         C
      2019-01-02 -2.420703 -0.346070
      2019-01-03  1.364425  2.386880
      2019-01-04 -0.485980  0.354063
      2019-01-05 -1.122717 -0.791812

      The first argument is the range of row labels and the second is the column labels; : means all.

    • Select a single value

      df.loc['20190102', 'A']
      Out[69]: -2.420703380445092
      df.at[dates[1], 'A']
      Out[70]: -2.420703380445092

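      A single value can be fetched with either loc[] or at[]. Note, however, that because the row index is of datetime type, loc[] accepts the string '20190102' in place of a datetime, whereas at[] (in the pandas version used here) must be given a datetime value, otherwise it raises an error: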

      df.at['20190102', 'A']
      
        File "pandas/_libs/index.pyx", line 81, in pandas._libs.index.IndexEngine.get_value
        File "pandas/_libs/index.pyx", line 89, in pandas._libs.index.IndexEngine.get_value
        File "pandas/_libs/index.pyx", line 449, in pandas._libs.index.DatetimeEngine.get_loc
        File "pandas/_libs/index.pyx", line 455, in pandas._libs.index.DatetimeEngine._date_check_type
      KeyError: '20190102'
  • Selection by position

    • Select a row

      df.iloc[3]
      Out[71]: 
      A   -0.485980
      B   -1.281454
      C    0.354063
      D   -1.418858
      Name: 2019-01-04 00:00:00, dtype: float64

      iloc[] takes integer positions, not labels.

    • Select data for the given rows and columns

      df.iloc[3:5, 0:2]
      Out[72]: 
                         A         B
      2019-01-04 -0.485980 -1.281454
      2019-01-05 -1.122717 -2.789041
      df.iloc[:,:]
      Out[73]: 
                         A         B         C         D
      2019-01-01  0.671622  0.785726  0.392435  0.874692
      2019-01-02 -2.420703 -1.116208 -0.346070  0.785941
      2019-01-03  1.364425 -0.947641  2.386880  0.585372
      2019-01-04 -0.485980 -1.281454  0.354063 -1.418858
      2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345
      2019-01-06  0.221597 -0.753038 -1.741256  0.287280
      
      df.iloc[[1, 2, 4], [0, 2]]
      Out[74]: 
                         A         C
      2019-01-02 -2.420703 -0.346070
      2019-01-03  1.364425  2.386880
      2019-01-05 -1.122717 -0.791812

      As with loc[], : means all.

    • Select a single value

      df.iloc[1, 1]
      Out[75]: -1.1162076820700824
      df.iat[1, 1]
      Out[76]: -1.1162076820700824

      A single value can be fetched with either iloc[] or iat[].

  • Selection by condition

    • Filter rows by the values of one column

      df[df.A > 0]
      Out[77]: 
                         A         B         C         D
      2019-01-01  0.671622  0.785726  0.392435  0.874692
      2019-01-03  1.364425 -0.947641  2.386880  0.585372
      2019-01-06  0.221597 -0.753038 -1.741256  0.287280
    • Keep only the values that satisfy the condition

      df[df > 0]
      Out[78]: 
                         A         B         C         D
      2019-01-01  0.671622  0.785726  0.392435  0.874692
      2019-01-02       NaN       NaN       NaN  0.785941
      2019-01-03  1.364425       NaN  2.386880  0.585372
      2019-01-04       NaN       NaN  0.354063       NaN
      2019-01-05       NaN       NaN       NaN       NaN
      2019-01-06  0.221597       NaN       NaN  0.287280

      Values that do not satisfy the condition are replaced with NaN.

    • Filter with the isin() method

      df2 = df.copy()
      df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
      df2
      Out[88]: 
                         A         B         C         D      E
      2019-01-01  0.671622  0.785726  0.392435  0.874692    one
      2019-01-02 -2.420703 -1.116208 -0.346070  0.785941    one
      2019-01-03  1.364425 -0.947641  2.386880  0.585372    two
      2019-01-04 -0.485980 -1.281454  0.354063 -1.418858  three
      2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345   four
      2019-01-06  0.221597 -0.753038 -1.741256  0.287280  three
      df2['E'].isin(['two', 'four'])
      Out[89]: 
      2019-01-01    False
      2019-01-02    False
      2019-01-03     True
      2019-01-04    False
      2019-01-05     True
      2019-01-06    False
      Freq: D, Name: E, dtype: bool
      df2[df2['E'].isin(['two', 'four'])]
      Out[90]: 
                         A         B         C         D     E
      2019-01-03  1.364425 -0.947641  2.386880  0.585372   two
      2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345  four

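      Note that isin() requires an exact match. The float values in df carry many more decimal places than the 6 that are displayed, so column E was added to make the demonstration easier. Using a raw value directly gives the following; the cell at position [1, 1] matches.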

      df.isin([-1.1162076820700824])
      Out[95]: 
                      A      B      C      D
      2019-01-01  False  False  False  False
      2019-01-02  False   True  False  False
      2019-01-03  False  False  False  False
      2019-01-04  False  False  False  False
      2019-01-05  False  False  False  False
      2019-01-06  False  False  False  False
  • Setting values

    • Set a column from a Series, aligned on the index

      s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20190102', periods=6))
      s1
      Out[98]: 
      2019-01-02    1
      2019-01-03    2
      2019-01-04    3
      2019-01-05    4
      2019-01-06    5
      2019-01-07    6
      Freq: D, dtype: int64
      df['F']=s1
      df
      Out[101]: 
                         A         B         C         D    F
      2019-01-01  0.671622  0.785726  0.392435  0.874692  NaN
      2019-01-02 -2.420703 -1.116208 -0.346070  0.785941  1.0
      2019-01-03  1.364425 -0.947641  2.386880  0.585372  2.0
      2019-01-04 -0.485980 -1.281454  0.354063 -1.418858  3.0
      2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345  4.0
      2019-01-06  0.221597 -0.753038 -1.741256  0.287280  5.0

      Index labels with no matching value are automatically filled with NaN.

    • Set a value by label

      df.at[dates[0], 'A'] = 0
      df
      Out[103]: 
                         A         B         C         D    F
      2019-01-01  0.000000  0.785726  0.392435  0.874692  NaN
      2019-01-02 -2.420703 -1.116208 -0.346070  0.785941  1.0
      2019-01-03  1.364425 -0.947641  2.386880  0.585372  2.0
      2019-01-04 -0.485980 -1.281454  0.354063 -1.418858  3.0
      2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345  4.0
      2019-01-06  0.221597 -0.753038 -1.741256  0.287280  5.0
    • Set a value by position

      df.iat[0, 1] = 0
      df
      Out[105]: 
                         A         B         C         D    F
      2019-01-01  0.000000  0.000000  0.392435  0.874692  NaN
      2019-01-02 -2.420703 -1.116208 -0.346070  0.785941  1.0
      2019-01-03  1.364425 -0.947641  2.386880  0.585372  2.0
      2019-01-04 -0.485980 -1.281454  0.354063 -1.418858  3.0
      2019-01-05 -1.122717 -2.789041 -0.791812 -0.174345  4.0
      2019-01-06  0.221597 -0.753038 -1.741256  0.287280  5.0
    • Set values with a NumPy array

      df.loc[:, 'D'] = np.array([5] * len(df))
      df
      Out[109]: 
                         A         B         C  D    F
      2019-01-01  0.000000  0.000000  0.392435  5  NaN
      2019-01-02 -2.420703 -1.116208 -0.346070  5  1.0
      2019-01-03  1.364425 -0.947641  2.386880  5  2.0
      2019-01-04 -0.485980 -1.281454  0.354063  5  3.0
      2019-01-05 -1.122717 -2.789041 -0.791812  5  4.0
      2019-01-06  0.221597 -0.753038 -1.741256  5  5.0
    • Set values by condition

      df2 = df.copy()
      df2[df2 > 0] = -df2
      df2
      Out[112]: 
                         A         B         C  D    F
      2019-01-01  0.000000  0.000000 -0.392435 -5  NaN
      2019-01-02 -2.420703 -1.116208 -0.346070 -5 -1.0
      2019-01-03 -1.364425 -0.947641 -2.386880 -5 -2.0
      2019-01-04 -0.485980 -1.281454 -0.354063 -5 -3.0
      2019-01-05 -1.122717 -2.789041 -0.791812 -5 -4.0
      2019-01-06 -0.221597 -0.753038 -1.741256 -5 -5.0
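
One detail in this section is easy to miss: label slices with loc[] include both endpoints, while position slices with iloc[] exclude the end. The sketch below makes that visible, along with the two forms of boolean selection. The integer data is my own, chosen only so the results are easy to read.

```python
import pandas as pd

dates = pd.date_range('20190101', periods=6)
# Illustrative integer data (not the random values used above).
df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5],
                   'B': [5, 4, 3, 2, 1, 0]}, index=dates)

by_label = df.loc['20190102':'20190104', 'A']  # both endpoints included
by_position = df.iloc[1:3, 0]                  # end position excluded

scalar_by_label = df.at[dates[1], 'A']         # at[] with a Timestamp label
scalar_by_position = df.iat[1, 0]              # iat[] with integer positions

rows_kept = df[df['A'] > 3]  # Boolean Series: keeps only the matching rows
masked = df[df > 3]          # Boolean DataFrame: same shape, NaN elsewhere

print(by_label.tolist())
print(by_position.tolist())
print(scalar_by_label, scalar_by_position)
print(len(rows_kept), masked.shape)
```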

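Missing Data

pandas uses np.nan by default to represent missing values, which are skipped in statistical computations.

The reindex() method can add, change, or delete the index labels on an axis (rows or columns); it returns a copy of the data: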

df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
df1.loc[dates[0]:dates[1], 'E'] = 1
df1
Out[115]: 
                   A         B         C  D    F    E
2019-01-01  0.000000  0.000000  0.392435  5  NaN  1.0
2019-01-02 -2.420703 -1.116208 -0.346070  5  1.0  1.0
2019-01-03  1.364425 -0.947641  2.386880  5  2.0  NaN
2019-01-04 -0.485980 -1.281454  0.354063  5  3.0  NaN
  • Drop rows that contain missing values

    df1.dropna(how='any')
    Out[116]: 
                       A         B        C  D    F    E
    2019-01-02 -2.420703 -1.116208 -0.34607  5  1.0  1.0
  • Fill missing values

    df1.fillna(value=5)
    Out[117]: 
                       A         B         C  D    F    E
    2019-01-01  0.000000  0.000000  0.392435  5  5.0  1.0
    2019-01-02 -2.420703 -1.116208 -0.346070  5  1.0  1.0
    2019-01-03  1.364425 -0.947641  2.386880  5  2.0  5.0
    2019-01-04 -0.485980 -1.281454  0.354063  5  3.0  5.0
  • Test for missing values

    pd.isna(df1)
    Out[118]: 
                    A      B      C      D      F      E
    2019-01-01  False  False  False  False   True  False
    2019-01-02  False  False  False  False  False  False
    2019-01-03  False  False  False  False  False   True
    2019-01-04  False  False  False  False  False   True
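
A short sketch of the methods above on a tiny frame of my own. It also shows how='all', which does not appear in the notes above; it is included only for contrast with how='any'.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a different number of NaN values per column.
df1 = pd.DataFrame({'A': [1.0, 2.0, np.nan],
                    'E': [1.0, np.nan, np.nan]})

any_dropped = df1.dropna(how='any')  # drop rows with at least one NaN
all_dropped = df1.dropna(how='all')  # drop rows only if every value is NaN
filled = df1.fillna(value=5)         # replace each NaN with 5
na_counts = df1.isna().sum()         # number of NaN values per column
mean_a = df1['A'].mean()             # NaN is skipped: (1 + 2) / 2

print(len(any_dropped), len(all_dropped))
print(filled['E'].tolist())
print(na_counts.to_dict())
print(mean_a)
```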

Operations

  • Statistics

    Note: all statistics exclude missing values by default.

    • Mean

      By default the mean is computed per column:

      df.mean()
      Out[119]: 
      A   -0.407230
      B   -1.147897
      C    0.042373
      D    5.000000
      F    3.000000
      dtype: float64

      To compute the mean per row, pass the axis argument:

      df.mean(1)
      Out[120]: 
      2019-01-01    1.348109
      2019-01-02    0.423404
      2019-01-03    1.960733
      2019-01-04    1.317326
      2019-01-05    0.859286
      2019-01-06    1.545461
      Freq: D, dtype: float64
    • Shifting values

      s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates)
      s
      Out[122]: 
      2019-01-01    1.0
      2019-01-02    3.0
      2019-01-03    5.0
      2019-01-04    NaN
      2019-01-05    6.0
      2019-01-06    8.0
      Freq: D, dtype: float64
      s = s.shift(2)
      s
      Out[125]: 
      2019-01-01    NaN
      2019-01-02    NaN
      2019-01-03    1.0
      2019-01-04    3.0
      2019-01-05    5.0
      2019-01-06    NaN
      Freq: D, dtype: float64

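      Here the values of s are shifted down by two positions; the vacated positions are automatically filled with NaN.

    • Operations between objects of different dimensions: pandas broadcasts automatically: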

      df.sub(s, axis='index')
      Out[128]: 
                         A         B         C    D    F
      2019-01-01       NaN       NaN       NaN  NaN  NaN
      2019-01-02       NaN       NaN       NaN  NaN  NaN
      2019-01-03  0.364425 -1.947641  1.386880  4.0  1.0
      2019-01-04 -3.485980 -4.281454 -2.645937  2.0  0.0
      2019-01-05 -6.122717 -7.789041 -5.791812  0.0 -1.0
      2019-01-06       NaN       NaN       NaN  NaN  NaN
  • Apply

    The apply() method applies a function along an axis, by default to each column in turn:

    • Cumulative sum

      df.apply(np.cumsum)
      Out[130]: 
                         A         B         C   D     F
      2019-01-01  0.000000  0.000000  0.392435   5   NaN
      2019-01-02 -2.420703 -1.116208  0.046365  10   1.0
      2019-01-03 -1.056278 -2.063849  2.433245  15   3.0
      2019-01-04 -1.542258 -3.345303  2.787307  20   6.0
      2019-01-05 -2.664975 -6.134345  1.995495  25  10.0
      2019-01-06 -2.443377 -6.887383  0.254239  30  15.0

      Here apply() calls np.cumsum; df.cumsum() can be used directly as well:

      df.cumsum()
      Out[133]: 
                         A         B         C     D     F
      2019-01-01  0.000000  0.000000  0.392435   5.0   NaN
      2019-01-02 -2.420703 -1.116208  0.046365  10.0   1.0
      2019-01-03 -1.056278 -2.063849  2.433245  15.0   3.0
      2019-01-04 -1.542258 -3.345303  2.787307  20.0   6.0
      2019-01-05 -2.664975 -6.134345  1.995495  25.0  10.0
      2019-01-06 -2.443377 -6.887383  0.254239  30.0  15.0
    • Custom functions

      Combining a custom function with apply() allows more kinds of processing:

      df.apply(lambda x: x.max() - x.min())
      Out[134]: 
      A    3.785129
      B    2.789041
      C    4.128136
      D    0.000000
      F    4.000000
      dtype: float64
  • Histogramming

    Count how often each value occurs in a Series:

    s = pd.Series(np.random.randint(0, 7, size=10))
    s
    Out[136]: 
    0    2
    1    0
    2    4
    3    0
    4    3
    5    3
    6    6
    7    4
    8    6
    9    5
    dtype: int64
    s.value_counts()
    Out[137]: 
    6    2
    4    2
    3    2
    0    2
    5    1
    2    1
    dtype: int64
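  • String methods

    A Series of strings exposes string methods through its str attribute; each method operates on every element.

    • For example, converting to upper case: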

      s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
      s.str.upper()
      Out[139]: 
      0       A
      1       B
      2       C
      3    AABA
      4    BACA
      5     NaN
      6    CABA
      7     DOG
      8     CAT
      dtype: object
    • Splitting into columns:

      s = pd.Series(['A,b', 'c,d'])
      s
      Out[142]: 
      0    A,b
      1    c,d
      dtype: object
      s.str.split(',', expand=True)
      Out[143]: 
         0  1
      0  A  b
      1  c  d
    • Other methods:

      dir(str)
      Out[140]: 
      ['capitalize',
       'casefold',
       'center',
       'count',
       'encode',
       'endswith',
       'expandtabs',
       'find',
       'format',
       'format_map',
       'index',
       'isalnum',
       'isalpha',
       'isascii',
       'isdecimal',
       'isdigit',
       'isidentifier',
       'islower',
       'isnumeric',
       'isprintable',
       'isspace',
       'istitle',
       'isupper',
       'join',
       'ljust',
       'lower',
       'lstrip',
       'maketrans',
       'partition',
       'replace',
       'rfind',
       'rindex',
       'rjust',
       'rpartition',
       'rsplit',
       'rstrip',
       'split',
       'splitlines',
       'startswith',
       'strip',
       'swapcase',
       'title',
       'translate',
       'upper',
       'zfill']
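
The sketch below revisits three ideas from this section on fixed data of my own: subtraction aligned on the row index after a shift, apply() along each axis, and the str accessor passing NaN through unchanged.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20190101', periods=4)
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'B': [10.0, 20.0, 30.0, 40.0]}, index=dates)

# shift(2) moves the values down two rows: NaN, NaN, 1.0, 2.0
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates).shift(2)

# Subtract s from every column, matching on the row index.
diff = df.sub(s, axis='index')

# apply() calls the function once per column by default,
# and once per row when axis=1 is given.
col_range = df.apply(lambda col: col.max() - col.min())
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

# String methods skip missing values instead of failing on them.
upper = pd.Series(['dog', np.nan, 'Cat']).str.upper()

print(diff)
print(col_range.tolist(), row_range.tolist())
print(upper.tolist())
```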

Merge

pandas provides many methods for quickly combining Series and DataFrame objects (and Panel objects, in the older versions that still had them).

  • Concat

    df = pd.DataFrame(np.random.randn(10, 4))
    df
    Out[145]: 
              0         1         2         3
    0 -0.227408 -0.185674 -0.187919  0.185685
    1  1.132517 -0.539992  1.156631 -0.022468
    2  0.214134 -1.283055 -0.862972  0.518942
    3  0.785903  1.033915 -0.471496 -1.403762
    4 -0.676717 -0.529971 -1.161988 -1.265071
    5  0.670126  1.320960 -0.128098  0.718631
    6  0.589902  0.349386  0.221955  1.749188
    7 -0.328885  0.607929 -0.973610 -0.928472
    8  1.724243 -0.661503 -0.374254  0.409250
    9  1.346625  0.618285  0.528776 -0.628470
    # break it into pieces
    pieces = [df[:3], df[3:7], df[7:]]
    pieces
    Out[147]: 
    [          0         1         2         3
     0 -0.227408 -0.185674 -0.187919  0.185685
     1  1.132517 -0.539992  1.156631 -0.022468
     2  0.214134 -1.283055 -0.862972  0.518942,
               0         1         2         3
     3  0.785903  1.033915 -0.471496 -1.403762
     4 -0.676717 -0.529971 -1.161988 -1.265071
     5  0.670126  1.320960 -0.128098  0.718631
     6  0.589902  0.349386  0.221955  1.749188,
               0         1         2         3
     7 -0.328885  0.607929 -0.973610 -0.928472
     8  1.724243 -0.661503 -0.374254  0.409250
     9  1.346625  0.618285  0.528776 -0.628470]
    pd.concat(pieces)
    Out[148]: 
              0         1         2         3
    0 -0.227408 -0.185674 -0.187919  0.185685
    1  1.132517 -0.539992  1.156631 -0.022468
    2  0.214134 -1.283055 -0.862972  0.518942
    3  0.785903  1.033915 -0.471496 -1.403762
    4 -0.676717 -0.529971 -1.161988 -1.265071
    5  0.670126  1.320960 -0.128098  0.718631
    6  0.589902  0.349386  0.221955  1.749188
    7 -0.328885  0.607929 -0.973610 -0.928472
    8  1.724243 -0.661503 -0.374254  0.409250
    9  1.346625  0.618285  0.528776 -0.628470
  • Merge

    This is a SQL-style join:

    left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
    right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
    left
    Out[151]: 
       key  lval
    0  foo     1
    1  foo     2
    right
    Out[152]: 
       key  rval
    0  foo     4
    1  foo     5
    pd.merge(left, right, on='key')
    Out[153]: 
       key  lval  rval
    0  foo     1     4
    1  foo     1     5
    2  foo     2     4
    3  foo     2     5

    Another example:

    left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
    right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
    left
    Out[156]: 
       key  lval
    0  foo     1
    1  bar     2
    right
    Out[157]: 
       key  rval
    0  foo     4
    1  bar     5
    pd.merge(left, right, on='key')
    Out[158]: 
       key  lval  rval
    0  foo     1     4
    1  bar     2     5
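  • Append

    Add rows to a DataFrame. (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; the output below comes from an older version.)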

    df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
    df
    Out[160]: 
              A         B         C         D
    0 -0.496709  0.573449  0.076059  0.685285
    1  0.479253  0.587376 -1.240070 -0.907910
    2 -0.052609 -0.287786 -1.949402  1.163323
    3 -0.659489  0.525583  0.820922 -1.368544
    4  1.270453 -1.813249  0.059915  0.586703
    5  1.859657  0.564274 -0.198763 -1.794173
    6 -0.649153 -3.129258  0.063418 -0.727936
    7  0.862402 -0.800031 -1.954784 -0.028607
    s = df.iloc[3]
    s
    Out[162]: 
    A   -0.659489
    B    0.525583
    C    0.820922
    D   -1.368544
    Name: 3, dtype: float64
    df.append(s, ignore_index=True)
    Out[163]: 
              A         B         C         D
    0 -0.496709  0.573449  0.076059  0.685285
    1  0.479253  0.587376 -1.240070 -0.907910
    2 -0.052609 -0.287786 -1.949402  1.163323
    3 -0.659489  0.525583  0.820922 -1.368544
    4  1.270453 -1.813249  0.059915  0.586703
    5  1.859657  0.564274 -0.198763 -1.794173
    6 -0.649153 -3.129258  0.063418 -0.727936
    7  0.862402 -0.800031 -1.954784 -0.028607
    8 -0.659489  0.525583  0.820922 -1.368544

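    Note the ignore_index=True argument here. Without it, the new row keeps its original index label 3, which may cause problems in later processing; whether that matters depends on the situation.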

    df.append(s)
    Out[164]: 
              A         B         C         D
    0 -0.496709  0.573449  0.076059  0.685285
    1  0.479253  0.587376 -1.240070 -0.907910
    2 -0.052609 -0.287786 -1.949402  1.163323
    3 -0.659489  0.525583  0.820922 -1.368544
    4  1.270453 -1.813249  0.059915  0.586703
    5  1.859657  0.564274 -0.198763 -1.794173
    6 -0.649153 -3.129258  0.063418 -0.727936
    7  0.862402 -0.800031 -1.954784 -0.028607
    3 -0.659489  0.525583  0.820922 -1.368544
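
Since DataFrame.append no longer exists in pandas 2.0 and later, the same effect can be had with pd.concat. This is a minimal sketch on data of my own; selecting the row with df.iloc[[1]] (double brackets) keeps it as a one-row DataFrame, which differs slightly from the Series used above. The merge example with repeated keys is repeated at the end to show where the four rows come from.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Double brackets return a one-row DataFrame rather than a Series.
row = df.iloc[[1]]

kept = pd.concat([df, row])                           # the new row keeps label 1
renumbered = pd.concat([df, row], ignore_index=True)  # labels rebuilt as 0..3

# Repeated keys on both sides: every left row pairs with every right row.
left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
joined = pd.merge(left, right, on='key')              # 2 x 2 = 4 rows

print(kept.index.tolist())
print(renumbered.index.tolist())
print(len(joined))
```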

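Grouping

Group-by analysis generally involves three steps:

  • Split: divide the data into groups
  • Apply: run a computation on each group
  • Combine: merge the per-group results into a single data structure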
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                    'foo', 'bar', 'foo', 'foo'],
                    'B': ['one', 'one', 'two', 'three',
                    'two', 'two', 'one', 'three'],
                    'C': np.random.randn(8),
                    'D': np.random.randn(8)})
df
Out[166]: 
     A      B         C         D
0  foo    one -1.252153  0.172863
1  bar    one  0.238547 -0.648980
2  foo    two  0.756975  0.195766
3  bar  three -0.933405 -0.320043
4  foo    two -0.310650 -1.388255
5  bar    two  1.568550 -1.911817
6  foo    one -0.340290 -2.141259
7  foo  three -0.670909 -2.673075

Group by column A and apply sum:

df.groupby('A').sum()
Out[167]: 
            C         D
A                      
bar  0.873692 -2.880840
foo -1.817027 -5.833961

Column B holds strings, so the pandas version used here left it out of the result.

Group by columns A and B and apply sum:

df.groupby(['A', 'B']).sum()
Out[168]: 
                  C         D
A   B                        
bar one    0.238547 -0.648980
    three -0.933405 -0.320043
    two    1.568550 -1.911817
foo one   -1.592443 -1.968396
    three -0.670909 -2.673075
    two    0.446325 -1.192490

This produces a result with a multi-level index.
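
The three steps can be written out by hand and compared with groupby(). This is a sketch on fixed data of my own. Selecting the numeric columns explicitly with [['C', 'D']] is my choice here: as far as I know, newer pandas versions do not silently drop string columns such as B, so being explicit avoids depending on the version.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'two'],
                   'C': [1.0, 2.0, 3.0, 4.0],
                   'D': [10.0, 20.0, 30.0, 40.0]})

# Split, apply, combine written out by hand for column C.
manual = {name: group['C'].sum() for name, group in df.groupby('A')}

# The same idea in one call, restricted to the numeric columns.
by_a = df.groupby('A')[['C', 'D']].sum()

# Grouping by two columns gives a two-level index.
by_ab = df.groupby(['A', 'B'])[['C', 'D']].sum()

print(manual)
print(by_a)
print(by_ab)
```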

Reshaping

  • Stack

    Python's zip function packs the corresponding elements of its arguments into tuples:

    tuples = list(zip(['bar', 'bar', 'baz', 'baz',
    'foo', 'foo', 'qux', 'qux'],
    ['one', 'two', 'one', 'two',
    'one', 'two', 'one', 'two']))
    tuples
    Out[172]: 
    [('bar', 'one'),
     ('bar', 'two'),
     ('baz', 'one'),
     ('baz', 'two'),
     ('foo', 'one'),
     ('foo', 'two'),
     ('qux', 'one'),
     ('qux', 'two')]
    ## Build a two-level index
    index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    index
    Out[174]: 
    MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
               codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
               names=['first', 'second'])
    ## Create the DataFrame
    df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
    df
    Out[176]: 
                         A         B
    first second                    
    bar   one    -0.501215 -0.947993
          two    -0.828914  0.232167
    baz   one     1.245419  1.006092
          two     1.016656 -0.441073
    foo   one     0.479037 -0.500034
          two    -1.113097  0.591696
    qux   one    -0.014760 -0.320735
          two    -0.648743  1.499899
    ## Take the first four rows
    df2 = df[:4]
    df2
    Out[179]: 
                         A         B
    first second                    
    bar   one    -0.501215 -0.947993
          two    -0.828914  0.232167
    baz   one     1.245419  1.006092
          two     1.016656 -0.441073

    The stack() method moves the columns into the index, turning the two-dimensional data into a one-dimensional Series:

    stacked = df2.stack()
    stacked
    Out[181]: 
    first  second   
    bar    one     A   -0.501215
                   B   -0.947993
           two     A   -0.828914
                   B    0.232167
    baz    one     A    1.245419
                   B    1.006092
           two     A    1.016656
                   B   -0.441073
    dtype: float64

    The inverse operation is unstack():

    stacked.unstack()
    Out[182]: 
                         A         B
    first second                    
    bar   one    -0.501215 -0.947993
          two    -0.828914  0.232167
    baz   one     1.245419  1.006092
          two     1.016656 -0.441073
    stacked.unstack(1)
    Out[183]: 
    second        one       two
    first                      
    bar   A -0.501215 -0.828914
          B -0.947993  0.232167
    baz   A  1.245419  1.016656
          B  1.006092 -0.441073
    stacked.unstack(0)
    Out[184]: 
    first          bar       baz
    second                      
    one    A -0.501215  1.245419
           B -0.947993  1.006092
    two    A -0.828914  1.016656
           B  0.232167 -0.441073

    unstack() operates on the last level by default; another level can be chosen by passing an argument.

  • Pivot tables

    df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
    'B': ['A', 'B', 'C'] * 4,
    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
    'D': np.random.randn(12),
    'E': np.random.randn(12)})
    df
    Out[190]: 
            A  B    C         D         E
    0     one  A  foo -0.933264 -2.387490
    1     one  B  foo -0.288101  0.023214
    2     two  C  foo  0.594490  0.418505
    3   three  A  bar  0.450683  1.939623
    4     one  B  bar  0.243897 -0.965783
    5     one  C  bar -0.705494 -0.078283
    6     two  A  foo  1.560352  0.419907
    7   three  B  foo  0.199453  0.998711
    8     one  C  foo  1.426861 -1.108297
    9     one  A  bar -0.570951 -0.022560
    10    two  B  bar -0.350937 -1.767804
    11  three  C  bar  0.983465  0.065792

    The pivot_table() method makes it easy to turn the values of a column into column headers:

    pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
    Out[191]: 
    C             bar       foo
    A     B                    
    one   A -0.570951 -0.933264
          B  0.243897 -0.288101
          C -0.705494  1.426861
    three A  0.450683       NaN
          B       NaN  0.199453
          C  0.983465       NaN
    two   A       NaN  1.560352
          B -0.350937       NaN
          C       NaN  0.594490

    Combinations with no data are automatically filled with NaN.
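
To make the round trip concrete, here is a sketch with fixed values of my own: stack() followed by unstack() brings back the original layout, unstack(0) moves the first index level to the columns instead, and pivot_table() leaves NaN where a combination has no data. The name records is only an illustrative variable name.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')],
    names=['first', 'second'])
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                   'B': [5.0, 6.0, 7.0, 8.0]}, index=index)

stacked = df.stack()           # Series with a three-level index
restored = stacked.unstack()   # last level goes back to the columns
by_first = stacked.unstack(0)  # level 'first' becomes the columns

records = pd.DataFrame({'A': ['one', 'one', 'two'],
                        'C': ['foo', 'bar', 'foo'],
                        'D': [1.0, 2.0, 3.0]})
# There is no ('two', 'bar') record, so that cell becomes NaN.
wide = pd.pivot_table(records, values='D', index='A', columns='C')

print(stacked)
print(restored)
print(by_first)
print(wide)
```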

Time Series

pandas is very powerful at time-series conversion and makes many kinds of conversion easy.

  • Changing the frequency (resampling)

    rng = pd.date_range('1/1/2019', periods=100, freq='S')
    rng[:5]
    Out[214]: 
    DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 00:00:01',
                   '2019-01-01 00:00:02', '2019-01-01 00:00:03',
                   '2019-01-01 00:00:04'],
                  dtype='datetime64[ns]', freq='S')
    ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
    ts.head(5)
    Out[216]: 
    2019-01-01 00:00:00    245
    2019-01-01 00:00:01    347
    2019-01-01 00:00:02    113
    2019-01-01 00:00:03    196
    2019-01-01 00:00:04    131
    Freq: S, dtype: int64
    ## Resample into 10-second bins
    ts1 = ts.resample('10S')
    ts1
    Out[209]: DatetimeIndexResampler [freq=<10 * Seconds>, axis=0, closed=left, label=left, convention=start, base=0]
    ## Aggregate each bin by taking the mean
    ts1.mean()
    Out[218]: 
    2019-01-01 00:00:00    174.0
    2019-01-01 00:00:10    278.5
    2019-01-01 00:00:20    281.8
    2019-01-01 00:00:30    337.2
    2019-01-01 00:00:40    221.0
    2019-01-01 00:00:50    277.1
    2019-01-01 00:01:00    171.0
    2019-01-01 00:01:10    321.0
    2019-01-01 00:01:20    318.6
    2019-01-01 00:01:30    302.6
    Freq: 10S, dtype: float64
    ## Aggregate each bin by taking the sum
    ts1.sum()
    Out[219]: 
    2019-01-01 00:00:00    1740
    2019-01-01 00:00:10    2785
    2019-01-01 00:00:20    2818
    2019-01-01 00:00:30    3372
    2019-01-01 00:00:40    2210
    2019-01-01 00:00:50    2771
    2019-01-01 00:01:00    1710
    2019-01-01 00:01:10    3210
    2019-01-01 00:01:20    3186
    2019-01-01 00:01:30    3026
    Freq: 10S, dtype: int64

    Here resample() first groups the data into bins, and a following call such as sum() or mean() specifies how each bin is aggregated.

  • Attaching a time zone:

    rng = pd.date_range('1/1/2019 00:00', periods=5, freq='D')
    rng
    Out[221]: 
    DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
                   '2019-01-05'],
                  dtype='datetime64[ns]', freq='D')
    ts = pd.Series(np.random.randn(len(rng)), rng)
    ts
    Out[223]: 
    2019-01-01   -2.327686
    2019-01-02    1.527872
    2019-01-03    0.063982
    2019-01-04   -0.213572
    2019-01-05   -0.014856
    Freq: D, dtype: float64
    ts_utc = ts.tz_localize('UTC')
    ts_utc
    Out[225]: 
    2019-01-01 00:00:00+00:00   -2.327686
    2019-01-02 00:00:00+00:00    1.527872
    2019-01-03 00:00:00+00:00    0.063982
    2019-01-04 00:00:00+00:00   -0.213572
    2019-01-05 00:00:00+00:00   -0.014856
    Freq: D, dtype: float64
  • Converting to another time zone:

    ts_utc.tz_convert('US/Eastern')
    Out[226]: 
    2018-12-31 19:00:00-05:00   -2.327686
    2019-01-01 19:00:00-05:00    1.527872
    2019-01-02 19:00:00-05:00    0.063982
    2019-01-03 19:00:00-05:00   -0.213572
    2019-01-04 19:00:00-05:00   -0.014856
    Freq: D, dtype: float64
  • Converting between timestamps and periods

    rng = pd.date_range('1/1/2019', periods=5, freq='M')
    ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ts
    Out[230]: 
    2019-01-31    0.197134
    2019-02-28    0.569082
    2019-03-31   -0.322141
    2019-04-30    0.005778
    2019-05-31   -0.082306
    Freq: M, dtype: float64
    ps = ts.to_period()
    ps
    Out[232]: 
    2019-01    0.197134
    2019-02    0.569082
    2019-03   -0.322141
    2019-04    0.005778
    2019-05   -0.082306
    Freq: M, dtype: float64
    ps.to_timestamp()
    Out[233]: 
    2019-01-01    0.197134
    2019-02-01    0.569082
    2019-03-01   -0.322141
    2019-04-01    0.005778
    2019-05-01   -0.082306
    Freq: MS, dtype: float64

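    Converting between periods and timestamps makes some convenient arithmetic available. For example, we convert between the following two frequencies:

    1. Quarterly, with the year ending in November.

    2. The same quarters, each mapped to 9 a.m. on the first day of the month that follows the quarter end in frequency 1.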

    prng = pd.period_range('2018Q1', '2019Q4', freq='Q-NOV')
    prng
    Out[243]: 
    PeriodIndex(['2018Q1', '2018Q2', '2018Q3', '2018Q4', '2019Q1', '2019Q2',
                 '2019Q3', '2019Q4'],
                dtype='period[Q-NOV]', freq='Q-NOV')
    ts = pd.Series(np.random.randn(len(prng)), prng)
    ts
    Out[245]: 
    2018Q1   -0.112692
    2018Q2   -0.507304
    2018Q3   -0.324846
    2018Q4    0.549671
    2019Q1   -0.897732
    2019Q2    1.130070
    2019Q3   -0.399814
    2019Q4    0.830488
    Freq: Q-NOV, dtype: float64
    ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
    ts
    Out[247]: 
    2018-03-01 09:00   -0.112692
    2018-06-01 09:00   -0.507304
    2018-09-01 09:00   -0.324846
    2018-12-01 09:00    0.549671
    2019-03-01 09:00   -0.897732
    2019-06-01 09:00    1.130070
    2019-09-01 09:00   -0.399814
    2019-12-01 09:00    0.830488
    Freq: H, dtype: float64

    Note: this example is a little odd. One way to read it is to first convert prng directly to an hourly frequency:

    prng.asfreq('H', 'end') 
    Out[253]: 
    PeriodIndex(['2018-02-28 23:00', '2018-05-31 23:00', '2018-08-31 23:00',
                 '2018-11-30 23:00', '2019-02-28 23:00', '2019-05-31 23:00',
                 '2019-08-31 23:00', '2019-11-30 23:00'],
                dtype='period[H]', freq='H')

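    We want 9 a.m. in the following month, so we first convert to a monthly frequency and add 1 (the next month), then convert to an hourly frequency and add 9 (9 a.m.).

    In the example, the s argument is short for start and the e argument is short for end; Q-NOV means quarterly, with November as the last month of the year.

    For more freq aliases, see: http://pandas.pydata.org/pand...

    For details on the asfreq() method, see: http://pandas.pydata.org/pand...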
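
Two ideas from this section are easy to misread: resample() followed by an aggregation, and the chained asfreq() arithmetic. The sketch below walks through both on made-up, deterministic data. It uses a single Period rather than a whole PeriodIndex, passes Timedelta objects instead of frequency strings, and writes the hourly alias as lowercase 'h' on the assumption that a recent pandas release is installed (the session above used 'H').

```python
import numpy as np
import pandas as pd

# --- resample() followed by an aggregation --------------------------------
# 30 one-second samples holding the values 0..29 (made-up data).
rng = pd.date_range("2019-01-01", periods=30, freq=pd.Timedelta(seconds=1))
ts = pd.Series(np.arange(30), index=rng)

# resample() only defines the 10-second bins; nothing is computed yet.
bins = ts.resample(pd.Timedelta(seconds=10))

# The aggregation called afterwards decides how each bin is reduced.
bin_sums = bins.sum()    # sums of 0..9, 10..19 and 20..29
bin_means = bins.mean()
print(bin_sums)
print(bin_means)

# --- period arithmetic, one step at a time --------------------------------
quarter = pd.Period("2018Q1", freq="Q-NOV")    # fiscal quarter ending in Feb 2018
last_month = quarter.asfreq("M", "end")        # last month of that quarter
next_month = last_month + 1                    # the month after the quarter
first_hour = next_month.asfreq("h", "start")   # first hour of that month
nine_am = first_hour + 9                       # nine hours later
print(quarter, last_month, next_month, first_hour, nine_am, sep=" -> ")
```

Splitting the one-line expression into named steps shows why both asfreq() calls are needed: integer addition always moves by whole units of the current frequency, so the frequency has to be months when adding 1 and hours when adding 9.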

Categoricals

For an introduction to the categorical type, see: http://pandas.pydata.org/pand...

  • Type conversion: astype('category')

    df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                       "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
    df
    Out[255]: 
       id raw_grade
    0   1         a
    1   2         b
    2   3         b
    3   4         a
    4   5         a
    5   6         e
    df['grade'] = df['raw_grade'].astype('category')
    df['grade']
    Out[257]: 
    0    a
    1    b
    2    b
    3    a
    4    a
    5    e
    Name: grade, dtype: category
    Categories (3, object): [a, b, e]
  • Renaming categories: cat.rename_categories()

    df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])
    df['grade']
    Out[269]: 
    0    very good
    1         good
    2         good
    3    very good
    4    very good
    5     very bad
    Name: grade, dtype: category
    Categories (3, object): [very good, good, very bad]
  • Setting the full list of categories:

    df['grade'] = df['grade'].cat.set_categories(["very bad", "bad", "medium","good", "very good"])
    df['grade']
    Out[271]: 
    0    very good
    1         good
    2         good
    3    very good
    4    very good
    5     very bad
    Name: grade, dtype: category
    Categories (5, object): [very bad, bad, medium, good, very good]
  • Sorting

    df.sort_values(by="grade")
    Out[272]: 
       id raw_grade      grade
    5   6         e   very bad
    1   2         b       good
    2   3         b       good
    0   1         a  very good
    3   4         a  very good
    4   5         a  very good
  • Grouping

    df.groupby("grade", observed=False).size()
    Out[273]: 
    grade
    very bad     1
    bad          0
    medium       0
    good         2
    very good    3
    dtype: int64
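
The sort and group results above follow the category order given to set_categories(), not alphabetical order, and categories with no rows still appear with a count of 0. A minimal self-contained sketch of that behavior follows; it assumes a reasonably recent pandas, and the dict form of rename_categories() is a variation that does not appear in the session above.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = df["raw_grade"].astype("category")

# rename_categories() returns a relabeled copy; a dict maps old label -> new label.
df["grade"] = df["grade"].cat.rename_categories(
    {"a": "very good", "b": "good", "e": "very bad"})

# set_categories() fixes both the allowed labels and their order.
order = ["very bad", "bad", "medium", "good", "very good"]
df["grade"] = df["grade"].cat.set_categories(order)

# Sorting follows the category order, so "very bad" comes before "good".
sorted_grades = df.sort_values(by="grade")["grade"].tolist()
print(sorted_grades)

# observed=False keeps categories that have no rows, with a count of 0.
counts = df.groupby("grade", observed=False).size()
print(counts)
```

Passing observed explicitly keeps the result independent of the default, which is an assumption worth checking against the pandas version in use.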

Plotting

  • Series

    ts = pd.Series(np.random.randn(1000),
                   index=pd.date_range('1/1/2019', periods=1000))
    ts = ts.cumsum()
    ts.plot()
    Out[277]: <matplotlib.axes._subplots.AxesSubplot at 0x1135bcc50>
    import matplotlib.pyplot as plt
    plt.show()

  • Plotting a DataFrame

    Calling plot() on a DataFrame draws every column as its own labeled line:

    df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                      columns=['A', 'B', 'C', 'D'])
    df = df.cumsum()
    plt.figure()
    Out[282]: <Figure size 640x480 with 0 Axes>
    df.plot()
    Out[283]: <matplotlib.axes._subplots.AxesSubplot at 0x11587e4e0>
    plt.legend(loc='best')

Getting Data In/Out

  • CSV

    • Writing:

      df.to_csv('foo.csv')
    • Reading:

      pd.read_csv('foo.csv')
  • HDF5

    • Writing:

      df.to_hdf('foo.h5', key='df')
    • Reading:

      pd.read_hdf('foo.h5', 'df')
  • Excel

    • Writing:

      df.to_excel('foo.xlsx', sheet_name='Sheet1')
    • Reading:

      pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])
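
One detail of the CSV round trip is easy to miss: to_csv() writes the index as the first column, so a plain read_csv() call brings it back as an ordinary column named "Unnamed: 0". The sketch below shows one way to restore it. It writes to an in-memory buffer rather than 'foo.csv' so that it has no side effects, and the small frame is made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# A small frame with a datetime index, similar in shape to the ones above.
df = pd.DataFrame(np.arange(6.0).reshape(3, 2),
                  index=pd.date_range("2019-01-01", periods=3),
                  columns=["A", "B"])

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# index_col=0 turns the first column back into the index;
# parse_dates=True asks pandas to parse that index as dates.
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
print(restored)
```

The same two arguments work when reading from a file path such as 'foo.csv'.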

Gotchas

If you run into an exception such as:

>>> if pd.Series([False, True, False]):
...     print("I was true")
Traceback
    ...
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

see the following links:

http://pandas.pydata.org/pand...

http://pandas.pydata.org/pand...
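
The error message itself names the usual ways out. As a minimal sketch, the check can be rewritten to state explicitly which question is being asked of the Series:

```python
import pandas as pd

s = pd.Series([False, True, False])

# Using the Series directly in a boolean context raises ValueError.
try:
    if s:
        print("I was true")
    raised = False
except ValueError:
    raised = True

# State the intended question explicitly instead.
any_true = s.any()   # is at least one element True?
all_true = s.all()   # are all elements True?
is_empty = s.empty   # does the Series have no elements?
print(raised, any_true, all_true, is_empty)
```

Which of the three to use depends on what the original condition was meant to test.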
