Getting Started with pandas: DataFrame

Date: 2019-11-20

Creating a DataFrame

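- A DataFrame is a tabular data structure made up of columns of data arranged in a fixed order. It was designed to extend the Series from one dimension to several. A DataFrame has both a row index and a column index.
- Ways to create a DataFrame
    - Lists
    - Dicts
    - Series
    - NumPy ndarrays
    - Another DataFrame
- DataFrame parameters
    - data   The input data, in any of several forms: ndarray, Series, map, lists, dict, constant, or another DataFrame.
    - index   Row labels for the resulting frame. Optional; defaults to np.arange(n) when no index is passed.
    - columns   Column labels for the resulting frame. Optional; defaults to np.arange(n) when no column labels are passed.
    - dtype   The data type of the columns. Only a single dtype can be passed, and it applies to the whole frame.
    - copy   Whether to copy the input data. The default described here is False; recent pandas releases default to None, which copies dict input but not DataFrame or 2-D ndarray input.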
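The effect of these parameters can be seen in a small sketch. The array contents and the labels `r1`, `r2`, `x`, `y`, `z` are made up for illustration and do not come from the original text.

```python
import numpy as np
import pandas as pd

# A 2x3 block of integers used as the `data` argument.
values = np.arange(6).reshape(2, 3)

# Without index/columns, both axes get the default labels 0..n-1.
default_df = pd.DataFrame(values)

# With explicit row labels, column labels, and a single dtype for the frame.
labeled_df = pd.DataFrame(
    values,
    index=["r1", "r2"],
    columns=["x", "y", "z"],
    dtype=float,
)

print(default_df.index.tolist(), default_df.columns.tolist())
print(labeled_df)
```

The first frame falls back to integer labels on both axes, while the second carries the labels passed in and stores every value as a float.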

Creating a DataFrame from lists

A single list

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

   0
0  1
1  2
2  3
3  4
4  5

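A list of lists

# A list of lists
data = [['Alex',10],['Bob',12],['Clarke',13]]
# Cast only the numeric column; passing dtype=float to the constructor fails on recent pandas versions because "name" holds strings
df = pd.DataFrame(data,columns=["name","age"]).astype({"age": float})
print(df)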

     name   age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0

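Creating a DataFrame from a dict of ndarrays/lists

- All the ndarrays must have the same length. If an index is passed, its length must equal the length of the arrays.
- If no index is passed, the index defaults to range(n), where n is the array length.

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)   # the row labels 0, 1, 2, 3 come from the default range(n)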


    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42

Specifying the index

import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data,index=['1','2','3','4'])   # custom row index
print(df)

    Name  Age
1    Tom   28
2   Jack   34
3  Steve   29
4  Ricky   42
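The length rules above can be checked directly. The sketch below uses a hypothetical helper, `try_build`, that is not part of pandas; it returns either the DataFrame or the `ValueError` that pandas raises when the lengths disagree.

```python
import pandas as pd


def try_build(data, index=None):
    """Return the DataFrame, or the ValueError raised while building it."""
    try:
        return pd.DataFrame(data, index=index)
    except ValueError as exc:
        return exc


data = {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]}

ok = try_build(data, index=["1", "2", "3", "4"])       # 4 values, 4 labels
short_index = try_build(data, index=["1", "2", "3"])   # 4 values, 3 labels
ragged = try_build({"Name": ["Tom", "Jack"], "Age": [28]})  # unequal columns

for result in (ok, short_index, ragged):
    print(type(result).__name__)
```

Only the first call yields a DataFrame; the other two come back as `ValueError` instances.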

Creating a DataFrame from a list of dicts

# A list of dicts can be passed as input data to create a DataFrame
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]  # dict keys become the column names; missing values are NaN
df = pd.DataFrame(data,index=["first","second"])  # custom row index
print(df)

        a   b     c
first   1   2   NaN
second  5  10  20.0

Creating a DataFrame from dicts with row index and column index lists

data = [{"name":"alex","age":87,"gender":"男"},{"name":"wuchao","age":20,"gender":"男"}]
df = pd.DataFrame(data,index=[1,2],columns=["name","age","gender"])  # custom row index and column index
print(df)

     name  age   gender
1    alex   87      男
2  wuchao   20      男

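Creating a DataFrame from a dict of Series

A dict of Series can be passed to form a DataFrame. The resulting index is the union of the indexes of all the Series passed in.

data = {
        "one":pd.Series(["1","2","3"],index=["a","b","c"],dtype=float), # dtype=float converts the strings to floats
        "two":pd.Series(["1","2","3","4"],index=["a","b","c","d"])
       }
df = pd.DataFrame(data)
print(df)

   one two
a  1.0   1
b  2.0   2
c  3.0   3
d  NaN   4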

Creating a DataFrame from NumPy

import numpy as np
pd.DataFrame(np.random.randint(60,100,size=(3,4)))  # random integers from 60 to 99 (100 is excluded), 3 rows by 4 columns


      0     1     2     3
0    95    74    71    92
1    95    91    79    98
2    94    87    62    65

Specifying the row and column index

pd.DataFrame(np.random.randint(60,100,size=(3,4)),index=["A","B","C"],columns=["a","b","c","d"])  # random integers from 60 to 99, 3 rows by 4 columns, with custom row and column labels


      a     b     c     d
A    91    70    63    98
B    98    68    88    96
C    99    77    86    66

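DataFrame attributes

values   all the values, as a NumPy array
columns   the column index
index   the row index
shape   the number of rows and columns

res = pd.DataFrame(np.random.randint(60,100,size=(3,4)),index=["A","B","C"],columns=["a","b","c","d"])
res.values   # all the data
res.index    # the row index
res.columns  # the column index
res.shape    # (number of rows, number of columns)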
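Because the frame above is random, its attribute values change on every run. A reproducible sketch, assuming a frame named `scores` filled with `np.arange(12)` (both are illustrative choices, not from the original), makes the results easier to inspect:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["A", "B", "C"],
    columns=["a", "b", "c", "d"],
)

print(scores.values)            # the underlying 2-D NumPy array
print(scores.index.tolist())    # row labels
print(scores.columns.tolist())  # column labels
print(scores.shape)             # (rows, columns)
```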

============================================

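Exercise

Based on the exam score table below, create a DataFrame named df (the columns are the students 張三 and 李四; the rows are the subjects Chinese, math, English, and comprehensive science):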
```
    張三  李四  
語文 150  0
數學 150  0
英語 150  0
理綜 300  0
```

============================================

dic = {
    "張三":[150,150,150,300],
    "李四":[0,0,0,0]
}
df = pd.DataFrame(dic,index=["語文","數學","英語","理綜"])
df



       張三   李四
語文    150    0
數學    150    0
英語    150    0
理綜    300    0

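DataFrame indexing

Column indexing

(1) Indexing a column

    - Dict-style access   df['q']
    - Attribute access    df.q

 Either form returns the column as a Series. The Series keeps the DataFrame's index, and its name attribute is set to the column label.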

res = pd.DataFrame(np.random.randint(60,100,size=(3,4)),index=["A","B","C"],columns=["a","b","c","d"])
res

      a     b    c     d
A    95    83    92    89
B    70    96    92    67
C    65    69    85    78

# Attribute access
res.a
A    95
B    70
C    65
Name: a, dtype: int32

# Dict-style access
res["a"]
A    95
B    70
C    65
Name: a, dtype: int32

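# Rename the columns (the output below is from a different random frame, so the values differ from the table above)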
res.columns=["aa","bb","cc","dd"]
res
     aa    bb    cc    dd
A    76    90    91    78
B    80    81    82    85
C    93    70    63    81

# Select the first two columns
res[["aa","bb"]]
     aa    bb
A    76    90
B    80    81
C    93    70
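The column-selection rules can be verified on a deterministic frame. In this sketch the name `frame` and its `np.arange(12)` contents are assumptions chosen so the results are reproducible.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["A", "B", "C"],
    columns=["a", "b", "c", "d"],
)

col = frame["a"]          # a single label returns a Series
sub = frame[["a", "b"]]   # a list of labels returns a DataFrame

print(type(col).__name__, col.name, col.index.tolist())
print(type(sub).__name__, sub.columns.tolist())
print(frame.a.equals(col))  # attribute access returns the same column
```

Attribute access only works when the column label is a valid Python identifier that does not collide with an existing DataFrame attribute such as `shape`, so the dict-style form is the safer default.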

Row indexing

- Use .loc[] with index labels to select rows
- Use .iloc[] with integer positions to select rows
    
This also returns a Series, whose index is the original columns.

Demo

res = pd.DataFrame(np.random.randint(60,100,size=(3,4)),index=["A","B","C"],columns=["a","b","c","d"])
res
      a    b     c     d
A    91    83    96    75
B    88    92    91    60
C    73    79    72    79

Query

# Using loc
res.loc["A"]

a    91
b    83
c    96
d    75
Name: A, dtype: int32

# Using iloc
res.iloc[0]

a    91
b    83
c    96
d    75
Name: A, dtype: int32


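res.loc[["A","B"]]  # a list of labels selects several rows and returns a DataFrame (output from a different random frame)

      a     b     c     d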
A    95    83    92    89
B    70    96    92    67
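A reproducible sketch of row selection follows; the frame named `frame` and its `np.arange(12)` contents are illustrative assumptions. It shows that a label and its matching position return the same row, and that the row's index is the column labels.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["A", "B", "C"],
    columns=["a", "b", "c", "d"],
)

row_by_label = frame.loc["B"]    # select by index label
row_by_pos = frame.iloc[1]       # select by integer position
two_rows = frame.loc[["A", "B"]] # a list of labels returns a DataFrame

print(row_by_label.equals(row_by_pos))
print(row_by_label.name, row_by_label.index.tolist())
print(two_rows.shape)
```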

Indexing individual elements

 - Index by column first
 - Index by row with iloc[3,1] or loc['C','q']: the row index comes first, then the column index

res = pd.DataFrame(np.random.randint(60,100,size=(3,4)),index=["A","B","C"],columns=["a","b","c","d"])
res


      a     b    c     d
A    95    83    92    89
B    70    96    92    67
C    65    69    85    78

res.iloc[2,3]  # row and column positions both start at 0, so this is row position 2, column position 3 (the value 78)

78

res.loc[["A","C"],"c"]  # rows A and C, column c

A    92
C    85
Name: c, dtype: int32
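The same lookups can be reproduced on fixed data. This sketch assumes a frame named `frame` filled with `np.arange(12)`, so each cell value is known in advance.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["A", "B", "C"],
    columns=["a", "b", "c", "d"],
)

by_position = frame.iloc[2, 3]        # third row, fourth column
by_label = frame.loc["C", "d"]        # the same cell, addressed by labels
column_part = frame.loc[["A", "C"], "c"]  # rows A and C of column c

print(by_position, by_label)
print(column_part.tolist())
```

Both single-cell lookups return the scalar 11, and the last expression returns a Series holding 2 and 10.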

DataFrame slicing

Note
When square brackets are used directly:
- an index (label) selects a column
- a slice selects rows

res = pd.DataFrame(np.random.randint(60,100,size=(3,4)),index=["A","B","C"],columns=["a","b","c","d"])
res


      a    b     c     d
A    64    60    82    97
B    64    74    63    90
C    88    68    60    71

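res[1:]   # a slice selects rows (output below is from a different random frame, so the values differ from the table above)

      a     b     c     d
B    99    72    91    72
C    83    61    71    98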

res["c"]  # a label selects a column

A    82
B    63
C    60
Name: c, dtype: int32

Slicing inside loc and iloc (including column slices): df.loc['B':'C','丙':'丁']

res.iloc[1,1:3]  # second row, columns b and c; positional slices exclude the end position
b    74
c    63
Name: B, dtype: int32

res.iloc[:,1:3]  # all rows, columns b and c
      b    c
A    60    82
B    74    63
C    68    60

res.loc["A":"C","b":"c"]   # rows A through C, columns b through c; label slices include both endpoints
      b    c
A    60    82
B    74    63
C    68    60
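The difference between positional and label slices is easy to get wrong, so here is a small reproducible comparison. The frame named `frame` and its `np.arange(12)` contents are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["A", "B", "C"],
    columns=["a", "b", "c", "d"],
)

# iloc slices are positional and exclude the stop position.
by_position = frame.iloc[0:2, 1:3]

# loc slices are label based and include the stop label.
by_label = frame.loc["A":"B", "b":"c"]

# Bare brackets with a slice select rows, here everything after the first row.
rows_only = frame[1:]

print(by_position.equals(by_label))
print(rows_only.index.tolist())
```

Both selections cover rows A and B and columns b and c, even though the slice bounds are written differently.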

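DataFrame arithmetic

Arithmetic between DataFrames

As with Series:

- data is aligned automatically on the row and column labels during the operation
- where labels do not match, the result is filled with NaN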

res = pd.DataFrame(np.random.randint(60,100,size=(3,4)),index=["A","B","C"],columns=["a","b","c","d"])
ret = pd.DataFrame(np.random.randint(60,100,size=(3,4)),index=["A","B","C"],columns=["a","b","c","f"])
res + ret


      a      b      c      d      f
A    138    174    173    NaN    NaN
B    142    168    180    NaN    NaN
C    160    156    187    NaN    NaN
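When NaN is not the desired result for unmatched labels, one option that goes beyond the original text is the `DataFrame.add` method with its `fill_value` parameter, which substitutes a value for the missing side before adding. The frames `left` and `right` below are made-up examples.

```python
import pandas as pd

left = pd.DataFrame([[1, 2], [3, 4]], index=["A", "B"], columns=["a", "d"])
right = pd.DataFrame([[10, 20], [30, 40]], index=["A", "B"], columns=["a", "f"])

# The + operator aligns on labels; columns present on one side only become NaN.
plain = left + right

# add() with fill_value treats the missing side as 0 instead.
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

In `plain`, only column a holds numbers, while `filled` keeps the values of d and f from whichever frame supplied them.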
