The pandas Module

Date: 2019-11-11

Tags: pandas, module

Introduction to pandas

pandas is the core Python module for data analysis. It provides five main capabilities:

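File I/O for many formats: databases (SQL), HTML, JSON, pickle, CSV (txt, Excel), SAS, Stata, HDF, and others.
Single-table operations such as insert, delete, update, query, slicing, higher-order functions, and group-by aggregation, plus conversion to and from dict and list.
Joining and merging multiple tables.
Simple plotting.
Simple statistical analysis.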

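The Series data structure

A Series is an object similar to a one-dimensional array. It consists of a sequence of values and an associated sequence of labels (the index).

A Series behaves like a cross between a list (array) and a dictionary.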

import numpy as np
import pandas as pd

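Creating a Series

First way: pass a list. Because no index is given, pandas builds an integer index from 0 to N-1 (N is the length of the data), and values can be read through that index.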

df = pd.Series([i for i in range(4, 8)])
df

0    4
1    5
2    6
3    7
dtype: int64

df[1]

df.values

array([4, 5, 6, 7], dtype=int64)

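Second way: pass a list together with a custom index list (which must have the same length as the data). Values can then be read through the custom labels. Integer positions also still work with plain [] here, although newer pandas versions deprecate that fallback, so df1.iloc[2] is the explicit form.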

df1 = pd.Series([2,3,5,7,9], index=['a','c', 'b','e','f'])
df1

a    2
c    3
b    5
e    7
f    9
dtype: int64

df1[2]

df1['b']

df1.index

Index(['a', 'c', 'b', 'e', 'f'], dtype='object')

Third way: pass a dictionary. The keys become the index, so this is equivalent to the second way.

df2 = pd.Series({'b': 2, 'f': 5})
df2

b    2
f    5
dtype: int64

df2[0]

df2['f']

Fourth way: create a Series in which every value is 0, by passing a scalar and an index.

pd.Series(0, index=['a','b'])

a    0
b    0
dtype: int64

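Handling missing data in a Series

Method	Description
dropna()	Drops the entries whose value is NaN
fillna()	Fills in missing values
isnull()	Returns a boolean array that is True where a value is missing
notnull()	Returns a boolean array that is False where a value is missing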

df = pd.Series([1, 2, 3, 4, np.nan], index=['a', 'b', 'c', 'd', 'e'])
print(df)

a    1.0
b    2.0
c    3.0
d    4.0
e    NaN
dtype: float64

print(df.dropna())  # returns a new Series; the original is unchanged

a    1.0
b    2.0
c    3.0
d    4.0
dtype: float64

df1 = df.copy()
df1.dropna(inplace=True)  # inplace defaults to False; with True the Series itself is modified
df1

a    1.0
b    2.0
c    3.0
d    4.0
dtype: float64

df.fillna(0)

a    1.0
b    2.0
c    3.0
d    4.0
e    0.0
dtype: float64

df.isna()  # isna() is an alias of isnull()

a    False
b    False
c    False
d    False
e     True
dtype: bool

df.notnull()

a     True
b     True
c     True
d     True
e    False
dtype: bool

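Series features

Create a Series from an ndarray: Series(arr)
Arithmetic with a scalar: sr * 2
Arithmetic between two Series
Universal functions: np.abs(sr)
Boolean filtering: sr[sr > 0]
Statistical functions: mean(), sum(), cumsum()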
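The array-like features listed above can be tried on a small Series. This is only a sketch; the sample values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.array([-2, -1, 0, 1, 2]))   # built from an ndarray

doubled = sr * 2          # scalar arithmetic is applied element by element
summed = sr + sr          # two Series are combined element by element
absolute = np.abs(sr)     # a NumPy universal function returns a new Series
positive = sr[sr > 0]     # a boolean mask keeps only the matching entries
running = sr.cumsum()     # cumulative sum, one of the statistical methods

print(doubled.tolist(), absolute.tolist(), positive.tolist())
print(sr.mean(), sr.sum(), running.tolist())
```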

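Dictionary-like features

Create a Series from a dictionary: Series(dic)
The in operator: 'a' in sr tests the index labels, while for x in sr iterates over the values
Key indexing: sr['a'], sr[['a', 'b', 'd']]
Key slicing: sr['a':'c']
Other functions: get('a', default=0) and so on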
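A short sketch of the dictionary-like behaviour, using an invented four-element Series. The point worth noticing is that membership checks the labels while iteration yields the values.

```python
import pandas as pd

sr = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': 4})

has_a = 'a' in sr                   # membership looks at the index labels
values = [x for x in sr]            # iteration yields the values, unlike a dict
picked = sr[['a', 'b', 'd']]        # a list of labels selects several entries
sliced = sr['a':'c']                # a label slice includes both endpoints
fallback = sr.get('z', default=0)   # get() returns the default for a missing label

print(has_a, values, picked.tolist(), sliced.tolist(), fallback)
```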

Integer indexes

Integer index labels in pandas tend to confuse newcomers.
The following code shows why:

df = pd.Series(np.arange(10))
df1 = df[3:].copy()
df1

3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32

df1[1]  # raises KeyError: an integer key on an integer index is interpreted as a label, not a position

There are two ways around this:

loc attribute: interprets the key as a label
iloc attribute: interprets the key as a position

df1.loc[3]

df1.iloc[0]
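The difference between the three lookups can be checked in one self-contained snippet. The name sr is chosen here for illustration; the data matches the sliced Series above.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(10))[3:].copy()   # the index labels are now 3..9

try:
    sr[1]                    # 1 is looked up as a label, and no such label exists
    lookup_failed = False
except KeyError:
    lookup_failed = True

by_label = sr.loc[3]         # label 3 holds the value 3
by_position = sr.iloc[0]     # position 0 is the same first element
last = sr.iloc[-1]           # negative positions count from the end

print(lookup_failed, by_label, by_position, last)
```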

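Data alignment in Series

When operating on two Series, pandas first aligns them by index and then computes. If the indexes differ, the index of the result is the union of the two operands' indexes.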

sr1 = pd.Series([12,23,34], index=['c','a','d'])
sr1

c    12
a    23
d    34
dtype: int64

sr2 = pd.Series([11,20,10], index=['d','c','a',])
sr2

d    11
c    20
a    10
dtype: int64

sr1 + sr2  # index alignment lets two Series be combined directly, whatever their order

a    33
c    32
d    45
dtype: int64

sr3 = pd.Series([11,20,10,14], index=['d','c','a','b'])
sr3

d    11
c    20
a    10
b    14
dtype: int64

sr1 + sr3  # sr1 has no label 'b', so that entry cannot be computed and comes back as NaN, a missing value

a    33.0
b     NaN
c    32.0
d    45.0
dtype: float64

To treat a label missing from one operand as 0 instead, use the add method with fill_value:

sr1.add(sr3, fill_value=0)  # the missing value is taken as 0, so 'b' comes out as 14

a    33.0
b    14.0
c    32.0
d    45.0
dtype: float64
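The same fill_value idea applies to the other arithmetic methods, such as sub and mul. The sketch below reuses the sr1 and sr3 values from above; using 1 as the fill for multiplication is an illustrative choice, not something the original prescribes.

```python
import pandas as pd

sr1 = pd.Series([12, 23, 34], index=['c', 'a', 'd'])
sr3 = pd.Series([11, 20, 10, 14], index=['d', 'c', 'a', 'b'])

# 'b' is missing from sr1, so fill_value stands in for it before computing
diff = sr1.sub(sr3, fill_value=0)      # b: 0 - 14
product = sr1.mul(sr3, fill_value=1)   # b: 1 * 14

print(diff)
print(product)
```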

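DataFrame

A DataFrame is a tabular data structure containing an ordered collection of columns.
It can be thought of as a dictionary of Series that all share one index.

Creating a DataFrame

Build a DataFrame from a dictionary of equal-length lists, NumPy arrays, or Series: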

pd.DataFrame({'a': pd.Series([1,2,3,4]),'b':pd.Series([5,6,7,8])})

	a	b
0	1	5
1	2	6
2	3	7
3	4	8
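The example above passes Series as the dictionary values. Plain lists and NumPy arrays of equal length work the same way, as this small sketch with an invented variable name frame shows.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'a': [1, 2, 3, 4],              # a plain Python list
    'b': np.array([5, 6, 7, 8]),    # a NumPy array of the same length
})

col_a = frame['a']   # each column comes back as a Series sharing frame.index

print(frame)
print(type(col_a).__name__, col_a.index.equals(frame.index))
```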

Use the columns parameter to set the column labels:

pd.DataFrame(np.random.randint(1,100,(3,4),dtype=int), columns=['c1','c2','c3','c4'])

	c1	c2	c3	c4
0	43	72	62	87
1	77	91	42	98
2	22	76	62	63

Use the index parameter to set the row labels:

df = pd.DataFrame(np.random.randint(1,100,(3,4),dtype=int),index=['one', 'tow', 'three'] ,columns=['c1','c3','c2','c4'])
df

	c1	c3	c2	c4
one	37	85	3	22
tow	45	7	36	49
three	62	15	28	79

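Common DataFrame attributes and methods

Attribute/method	Purpose
dtypes	Data type of each column
index	Row index
columns	Column index
transpose()	Transpose; the T attribute is a shorthand
values	The data as a NumPy array
describe()	Quick summary statistics
sort_index(axis)	Sort by index labels, by row (axis=0) or by column (axis=1)
sort_values(by)	Sort by data values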

df.dtypes

c1    int32
c3    int32
c2    int32
c4    int32
dtype: object

df.index

Index(['one', 'tow', 'three'], dtype='object')

df.values

array([[24, 51, 24,  6],
       [32,  9, 44, 57],
       [ 3, 27,  1, 84]])

df.columns

Index(['c1', 'c3', 'c2', 'c4'], dtype='object')

df.T

	one	tow	three
c1	24	32	3
c3	51	9	27
c2	24	44	1
c4	6	57	84

df.describe()

	c1	c3	c2	c4
count	3.000000	3.000000	3.000000	3.000000
mean	19.666667	29.000000	23.000000	49.000000
std	14.977761	21.071308	21.517435	39.610605
min	3.000000	9.000000	1.000000	6.000000
25%	13.500000	18.000000	12.500000	31.500000
50%	24.000000	27.000000	24.000000	57.000000
75%	28.000000	39.000000	34.000000	70.500000
max	32.000000	51.000000	44.000000	84.000000

df.sort_index(axis=0)  # sort by the row labels

	c1	c3	c2	c4
one	24	51	24	6
three	3	27	1	84
tow	32	9	44	57

df.sort_index(axis=1)  # sort by the column labels

	c1	c2	c3	c4
one	24	24	51	6
tow	32	44	9	57
three	3	1	27	84

df.sort_values(by='c3', ascending=False)  # sorts the rows by a column's values (the default axis), here in descending order

	c1	c3	c2	c4
one	24	51	24	6
three	3	27	1	84
tow	32	9	44	57

df.sort_values(by='three', axis=1)  # sort the columns by the values in row 'three'

	c2	c1	c3	c4
one	24	24	51	6
tow	44	32	9	57
three	1	3	27	84

Indexing and slicing a DataFrame

Selecting by column name

df

	c1	c3	c2	c4
one	24	51	24	6
tow	32	9	44	57
three	3	27	1	84

df[['c1','c3']]  # pass a list to select several columns

	c1	c3
one	24	51
tow	32	9
three	3	27

loc: selecting by row label

Select using the custom row labels (a label slice includes its end label):

df.loc[:'one']

	c1	c3	c2	c4
one	24	51	24	6
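Besides slices, loc also accepts a single label, a list of labels, and a row and column pair. The sketch below rebuilds the table shown above with fixed values so the results are reproducible; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[24, 51, 24, 6], [32, 9, 44, 57], [3, 27, 1, 84]],
    index=['one', 'tow', 'three'],
    columns=['c1', 'c3', 'c2', 'c4'],
)

row = df.loc['tow']                              # one row, returned as a Series
cell = df.loc['three', 'c4']                     # row label plus column label
block = df.loc[['one', 'three'], ['c1', 'c2']]   # lists select a sub-table
upto = df.loc[:'tow']                            # the slice includes 'tow' itself

print(row['c2'], cell)
print(block)
print(upto)
```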

Selecting rows with a positional slice

df[1:3]

	c1	c3	c2	c4
tow	32	9	44	57
three	3	27	1	84

iloc (works like NumPy array indexing)

df

	c1	c3	c2	c4
one	24	51	24	6
tow	32	9	44	57
three	3	27	1	84

df.iloc[0, 0]

df.iloc[:2, 1:4]

	c3	c2	c4
one	51	24	6
tow	9	44	57

Selecting with boolean conditions

df[df['c1'] >20]

	c1	c3	c2	c4
one	24	51	24	6
tow	32	9	44	57

df[(df['c1'] > 20) & (df['c2'] <30)]

	c1	c3	c2	c4
one	24	51	24	6

Replacing values in a DataFrame

df

	c1	c3	c2	c4
one	37	85	3	22
tow	45	7	36	49
three	62	15	28	79

df1 = df.copy()
df1[df1 < 20] = 100
df1

	c1	c3	c2	c4
one	37	85	100	22
tow	45	100	36	49
three	62	100	28	79
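As an alternative sketch, the mask and where methods give the same replacement without assigning into a copy. They return a new DataFrame and leave the original untouched. The data is the table shown above, entered by hand.

```python
import pandas as pd

df = pd.DataFrame(
    [[37, 85, 3, 22], [45, 7, 36, 49], [62, 15, 28, 79]],
    index=['one', 'tow', 'three'],
    columns=['c1', 'c3', 'c2', 'c4'],
)

replaced = df.mask(df < 20, 100)    # replace the cells where the condition holds
kept = df.where(df >= 20, 100)      # keep the cells where the condition holds

print(replaced)
print(df.loc['one', 'c2'])          # the original still holds 3
```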

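Generating an array of dates: date_range

Parameters:

Parameter	Description
start	Start date
end	End date
periods	Number of periods to generate
freq	Frequency, 'D' (daily) by default; other options include H (hour), W (week), B (business day), SM (semi-month), M (month), T or min (minute), S (second), A (year), …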

dates = pd.date_range('20190101', periods=3, freq='M')
dates

DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31'], dtype='datetime64[ns]', freq='M')

df.index = dates
df

	c1	c3	c2	c4
2019-01-31	37	85	3	22
2019-02-28	45	7	36	49
2019-03-31	62	15	28	79
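A few more combinations of start, end, periods, and freq, as a rough sketch with arbitrarily chosen dates. Any two of start, end, and periods are enough to fix the range once a frequency is known.

```python
import pandas as pd

# start and end with the default daily frequency
daily = pd.date_range(start='2019-01-01', end='2019-01-05')

# start and periods with a weekly frequency ('W' anchors on Sundays)
weekly = pd.date_range(start='2019-01-01', periods=3, freq='W')

# business days skip the weekend after Friday 2019-01-04
business = pd.date_range(start='2019-01-04', periods=3, freq='B')

print(daily)
print(weekly)
print(business)
```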

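Grouping and aggregating data

In data analysis we often need to split the data and then run a computation within each group. These operations are a major part of everyday analysis work.

Grouping (groupby)

The data in a pandas object (a Series, a DataFrame, or anything else) is split into groups according to one or more keys that you supply. The split is performed along a particular axis of the object; a DataFrame, for example, can be grouped along its rows or its columns. A function is then applied to each group to produce a new value, and finally all the results are combined into the final result object.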

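A grouping key can take any of these forms:

A list or array with the same length as the axis being grouped
A value naming a column of the DataFrame
A dictionary or Series mapping the values on the grouped axis to group names
A function applied to the axis index or to the individual labels in the index

The last three are only shortcuts: in the end they all produce an array of values used to split the object.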
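The four key forms can be compared side by side. This is a minimal sketch on invented data; the group names ('p', 'q', 'first', 'rest', 'early', 'late') are hypothetical and chosen only to make the results easy to follow.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'key1': ['x', 'x', 'y', 'y', 'x'], 'data1': [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=['a', 'b', 'c', 'd', 'e'],
)

# 1. an array as long as the grouped axis
by_array = df['data1'].groupby(np.array(['p', 'q', 'p', 'q', 'p'])).sum()

# 2. the name of a DataFrame column
by_column = df.groupby('key1')['data1'].sum()

# 3. a dictionary mapping index labels to group names
mapping = {'a': 'first', 'b': 'first', 'c': 'rest', 'd': 'rest', 'e': 'rest'}
by_dict = df['data1'].groupby(mapping).sum()

# 4. a function called once per index label
by_func = df['data1'].groupby(lambda label: 'early' if label < 'd' else 'late').sum()

print(by_array, by_column, by_dict, by_func, sep='\n')
```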
To start, try it on a very simple DataFrame:

df = pd.DataFrame({'key1': ['x', 'x', 'y', 'y', 'x'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
df

	key1	key2	data1	data2
0	x	one	0.506897	-1.189281
1	x	two	-1.448441	-1.658427
2	y	one	-0.665272	-1.708576
3	y	two	-1.466032	-1.705750
4	x	one	3.127327	-1.591700

# Select data1 and call groupby with key1 as the key:
f1 = df['data1'].groupby(df['key1'])
f1.groups

{'x': Int64Index([0, 1, 4], dtype='int64'),
 'y': Int64Index([2, 3], dtype='int64')}

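Nothing has been computed yet at this point, but we now hold the intermediate grouping data we need, and any computation can be run on this GroupBy object.

# Call mean() to get the average of each group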
f1.mean()

key1
x    0.728594
y   -1.065652
Name: data1, dtype: float64

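Here the data was aggregated by the grouping key (a Series), producing a new Series whose index consists of the unique values in the key1 column; the index is named key1 accordingly. Next, pass a list of several arrays as the key: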

f2 = df['data1'].groupby([df['key1'],df['key2']])
f2.mean()

key1  key2
x     one     1.817112
      two    -1.448441
y     one    -0.665272
      two    -1.466032
Name: data1, dtype: float64

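With several keys, the result has a hierarchical index: the outer level holds the key1 values x and y, the inner level the key2 values one and two.

# unstack() pivots the inner index level into columns, so the levels are no longer stacked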

f2.mean().unstack()

key2	one	two
key1
x	1.817112	-1.448441
y	-0.665272	-1.466032

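Additional notes:

A grouping key can be any array whose length matches the axis being grouped
When aggregating, non-numeric columns such as key1 and key2 are left out of the result (newer pandas versions expect numeric_only=True for this)
The size method of GroupBy returns a Series containing the size of each group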

f1.size()

key1
x    3
y    2
Name: data1, dtype: int64

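Aggregation

Aggregation is any data transformation that produces a scalar value from an array. As seen above, GroupBy does not directly give a visible result but an intermediate object; the result comes from running a computation such as mean, count, or min on it. Other common ones:

Function	Description
sum	Sum of the non-NA values
median	Arithmetic median of the non-NA values
std, var	Unbiased (denominator n-1) standard deviation and variance
prod	Product of the non-NA values
first, last	First and last non-NA values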
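The functions in the table can be checked on a tiny grouped column. The data below is made up, with one NaN placed first in group x to show that the NA value is skipped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': ['x', 'x', 'x', 'y', 'y'],
    'val': [np.nan, 1.0, 3.0, 2.0, 4.0],
})
g = df.groupby('key')['val']

sums = g.sum()         # x: 1 + 3, the NaN is ignored
medians = g.median()   # x: median of 1 and 3
stds = g.std()         # denominator n-1, so x gives sqrt(2)
prods = g.prod()       # y: 2 * 4
firsts = g.first()     # x: 1.0, the first non-NA value
lasts = g.last()       # x: 3.0

print(pd.DataFrame({'sum': sums, 'median': medians, 'std': stds,
                    'prod': prods, 'first': firsts, 'last': lasts}))
```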

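Custom aggregation functions

Besides these built-in aggregations you can define your own. To use a custom aggregation function, pass it to the aggregate method or its alias agg.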

def peak_to_peak(arr):
    return arr.max() - arr.min()

f1.aggregate(peak_to_peak)

key1
x    4.575767
y    0.800759
Name: data1, dtype: float64

f1.agg(['mean','std'])

	mean	std
key1
x	0.728594	2.295926
y	-1.065652	0.566222

The result is a DataFrame whose columns are named after the corresponding functions.

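apply

The most flexible GroupBy method is apply. It splits the object into pieces, calls the supplied function on each piece, and then combines the results. DataFrame has an apply method of its own, with this signature in the pandas version used here:

df.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)

func: the custom function to apply
axis: with axis=1 each row is passed to func as a Series; with the default axis=0 each column is passed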
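Both flavours of apply can be sketched on a small invented table: the GroupBy version receives one group at a time, and the DataFrame version receives one row or one column at a time depending on axis.

```python
import pandas as pd

df = pd.DataFrame({
    'key1': ['x', 'x', 'y', 'y', 'x'],
    'data1': [1.0, 5.0, 2.0, 8.0, 3.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# GroupBy.apply: the function receives each group's values as a Series
top2 = df.groupby('key1')['data1'].apply(lambda s: s.nlargest(2))

# DataFrame.apply with axis=1: the function receives one row as a Series
row_total = df[['data1', 'data2']].apply(lambda row: row['data1'] + row['data2'], axis=1)

# DataFrame.apply with the default axis=0: the function receives one column
col_range = df[['data1', 'data2']].apply(lambda col: col.max() - col.min())

print(top2, row_total, col_range, sep='\n')
```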

Example:

url="https://baike.baidu.com/item/NBA%E6%80%BB%E5%86%A0%E5%86%9B/2173192?fr=aladdin"
nba_champions=pd.read_html(url)  # fetch every table on the page
a1 = nba_champions[0]    # the first table is the list of finals

a1.columns = a1.loc[0]  # use the first row as the column labels
a1.drop(0,inplace=True)  # then drop that first row
a1.head()

	年份	比賽日期	冠軍	總比分	亞軍	FMVP
1	1947	4.16-4.22	費城勇士隊	4-1	芝加哥牡鹿隊	無
2	1948	4.10-4.21	巴爾的摩子彈隊	4-2	費城勇士隊	無
3	1949	4.4-4.13	明尼阿波利斯湖人隊	4-2	華盛頓國會隊	無
4	1950	4.8-4.23	明尼阿波利斯湖人隊	4-2	塞拉庫斯民族隊	無
5	1951	4.7-4.21	羅切斯特皇家隊	4-3	紐約尼克斯隊	無

# Top 10 teams by number of championships won

a1.groupby('冠軍').size().sort_values(ascending=False).head(10)

冠軍
波士頓凱爾特人隊     17
洛杉磯湖人隊       11
芝加哥公牛隊        6
聖安東尼奧馬刺隊      5
明尼阿波利斯湖人隊     5
金州勇士隊         4
邁阿密熱火隊        3
底特律活塞隊        3
休斯頓火箭隊        2
紐約尼克斯隊        2
dtype: int64

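Other common methods

Common pandas methods (for both Series and DataFrame unless noted)

mean(axis=0, skipna=False)
sum(axis=1)
sort_index(axis, …, ascending) # sort by row or column labels
sort_values(by, axis, ascending) # sort by values
apply(func, axis=0) # apply a custom function to each column (axis=0) or each row (axis=1); func may return a scalar or a Series
applymap(func) # apply a function to every element of a DataFrame
map(func) # apply a function to every element of a Series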
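The effect of axis and skipna is easiest to see on a frame that contains a NaN. A small sketch with invented numbers and labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, np.nan], 'b': [4.0, 5.0, 6.0]})

col_mean = df.mean(axis=0)                  # NaN is skipped by default
strict_mean = df.mean(axis=0, skipna=False) # any NaN makes the result NaN
row_sum = df.sum(axis=1)                    # one sum per row

# Series.map applies a function to every element of one column
labels = df['b'].map(lambda v: 'high' if v > 4.5 else 'low')

print(col_mean, strict_mean, row_sum, labels, sep='\n')
```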

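Combining data

pd.concat: combines objects; axis=0 stacks them vertically (appending rows), axis=1 places them side by side (appending columns)
obj.append: can only append rows (removed in pandas 2.0, where pd.concat is used instead)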

df1 = pd.DataFrame(np.zeros((3, 4)))
df1

	0	1	2	3
0	0.0	0.0	0.0	0.0
1	0.0	0.0	0.0	0.0
2	0.0	0.0	0.0	0.0

df2 = pd.DataFrame(np.ones((3, 4)))
df2

	0	1	2	3
0	1.0	1.0	1.0	1.0
1	1.0	1.0	1.0	1.0
2	1.0	1.0	1.0	1.0

pd.concat((df1, df2),axis=1)

	0	1	2	3	0	1	2	3
0	0.0	0.0	0.0	0.0	1.0	1.0	1.0	1.0
1	0.0	0.0	0.0	0.0	1.0	1.0	1.0	1.0
2	0.0	0.0	0.0	0.0	1.0	1.0	1.0	1.0

df1.append(df2)

	0	1	2	3
0	0.0	0.0	0.0	0.0
1	0.0	0.0	0.0	0.0
2	0.0	0.0	0.0	0.0
0	1.0	1.0	1.0	1.0
1	1.0	1.0	1.0	1.0
2	1.0	1.0	1.0	1.0
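The two axis values of pd.concat can be compared by looking at the resulting shapes. The sketch below also uses ignore_index, which renumbers the rows instead of repeating 0, 1, 2.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)))
df2 = pd.DataFrame(np.ones((3, 4)))

stacked = pd.concat([df1, df2], axis=0)                        # rows appended, index 0,1,2,0,1,2
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)  # rows appended, index 0..5
side_by_side = pd.concat([df1, df2], axis=1)                   # columns appended

print(stacked.shape, renumbered.shape, side_by_side.shape)
```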

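Importing and exporting data

Use df = pd.read_excel(filename) to read a file and df.to_excel(filename) to save one.

Reading files to import data

Main parameters of the reader functions:

Parameter	Description
sep	Separator; may be a regular expression such as '\s+'
header=None	The file has no header row of column names
names	Column names to use
index_col	Column to use as the index
skiprows	Rows to skip
na_values	Strings to treat as missing values
parse_dates	Whether to parse columns as dates; a boolean or a list of columns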
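Here is a sketch that exercises the reader parameters in one pd.read_csv call. The input text, its ';' separator, the 'missing' marker, and the column names are all invented for the example.

```python
from io import StringIO

import pandas as pd

raw = StringIO(
    "# exported by a hypothetical tool\n"
    "1;2019-01-01;5.1;missing\n"
    "2;2019-01-02;4.9;3.0\n"
)

df = pd.read_csv(
    raw,
    sep=';',                                  # fields are separated by semicolons
    header=None,                              # the data has no header row
    names=['id', 'day', 'length', 'width'],   # so the column names are given here
    index_col='id',                           # use the id column as the index
    skiprows=1,                               # skip the leading comment line
    na_values=['missing'],                    # treat this string as NaN
    parse_dates=['day'],                      # parse the day column as dates
)

print(df)
print(df.dtypes)
```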

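Writing files to export data

Main parameters of the writer functions:

Parameter	Description
sep	Separator
na_rep	String written for missing values; defaults to the empty string
header=False	Do not write the column names
index=False	Do not write the row index
columns	Columns to write, given as a list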
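The writer parameters can be tried without touching the disk, because to_csv returns the text when no path is given. The data and the 'NULL' marker below are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5.1, 4.9], 'B': [np.nan, 3.0], 'C': [1.4, 1.4]})

text = df.to_csv(
    sep=';',              # use a semicolon between fields
    na_rep='NULL',        # write missing values as NULL
    header=False,         # leave out the column names
    index=False,          # leave out the row index
    columns=['A', 'B'],   # write only these columns
)

print(text)
```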

Reading a CSV file and processing the data

from io import StringIO

test_data = '''
5.1,,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,,0.2
7.0,3.2,4.7,1.4
6.4,3.2,4.5,1.5
6.9,3.1,4.9,
,,,
'''

test_data = StringIO(test_data)
df = pd.read_csv(test_data, header=None)
df.columns = ['A', 'B', 'C', 'D']
df.index = [1,2,3,4,5,6,7]
df

	A	B	C	D
1	5.1	NaN	1.4	0.2
2	4.9	3.0	1.4	0.2
3	4.7	3.2	NaN	0.2
4	7.0	3.2	4.7	1.4
5	6.4	3.2	4.5	1.5
6	6.9	3.1	4.9	NaN
7	NaN	NaN	NaN	NaN

Handling missing data

# Calling sum() after isnull() gives the number of missing values in each column (feature) of the data set

df.isnull().sum()

A    1
B    2
C    2
D    2
dtype: int64

# axis=0 drops the rows that contain any NaN value

df.dropna(axis=0)

	A	B	C	D
2	4.9	3.0	1.4	0.2
4	7.0	3.2	4.7	1.4
5	6.4	3.2	4.5	1.5

# Drop the rows in which every value is NaN (pass axis=1 to do the same for columns)
df.dropna(how='all')

	A	B	C	D
1	5.1	NaN	1.4	0.2
2	4.9	3.0	1.4	0.2
3	4.7	3.2	NaN	0.2
4	7.0	3.2	4.7	1.4
5	6.4	3.2	4.5	1.5
6	6.9	3.1	4.9	NaN

# Keep only the rows with at least 4 non-NaN values
df.dropna(thresh=4)

	A	B	C	D
2	4.9	3.0	1.4	0.2
4	7.0	3.2	4.7	1.4
5	6.4	3.2	4.5	1.5

# Drop the rows with NaN in column B
df.dropna(subset=['B'])

	A	B	C	D
2	4.9	3.0	1.4	0.2
3	4.7	3.2	NaN	0.2
4	7.0	3.2	4.7	1.4
5	6.4	3.2	4.5	1.5
6	6.9	3.1	4.9	NaN

# Fill the NaN values
df.fillna(value=0)

	A	B	C	D
1	5.1	0.0	1.4	0.2
2	4.9	3.0	1.4	0.2
3	4.7	3.2	0.0	0.2
4	7.0	3.2	4.7	1.4
5	6.4	3.2	4.5	1.5
6	6.9	3.1	4.9	0.0
7	0.0	0.0	0.0	0.0
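Filling with a constant 0 is only one option. As a sketch on a smaller invented table, fillna can also take per-column values or the column means, and ffill carries the previous value forward.

```python
from io import StringIO

import pandas as pd

df = pd.read_csv(
    StringIO("5.1,,1.4\n4.9,3.0,\n4.7,3.2,1.6\n"),
    header=None,
    names=['A', 'B', 'C'],
)

by_mean = df.fillna(df.mean())                 # each NaN gets its column's mean
by_column = df.fillna({'B': 0.0, 'C': -1.0})   # a different constant per column
forward = df.ffill()                           # copy the previous row's value down

print(by_mean, by_column, forward, sep='\n')
```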

Reading JSON

strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000},\
{"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000},\
{"ttery":"min","issue":"20130801-3389","code":"5,9,1,2,9","code1":"298329129","code2":null,"time":1013395346000},\
{"ttery":"min","issue":"20130801-3388","code":"3,8,7,3,3","code1":"298588733","code2":null,"time":1013395286000},\
{"ttery":"min","issue":"20130801-3387","code":"0,8,5,2,7","code1":"298818527","code2":null,"time":1013395226000}]'

df = pd.read_json(strtext, orient='records')
df

	code	code1	code2	issue	time	ttery
0	8,4,5,2,9	297734529	NaN	20130801-3391	1013395466000	min
1	7,8,2,1,2	298058212	NaN	20130801-3390	1013395406000	min
2	5,9,1,2,9	298329129	NaN	20130801-3389	1013395346000	min
3	3,8,7,3,3	298588733	NaN	20130801-3388	1013395286000	min
4	0,8,5,2,7	298818527	NaN	20130801-3387	1013395226000	min

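The five forms of the orient parameter

orient states the expected format of the JSON string. It can take the following five values:

'split' : dict like {index -> [index], columns -> [columns], data -> [values]}

This JSON format consists of an index, the column names, and the data matrix. The key names must be exactly index, columns, and data.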

s = '{"index":[1,2,3],"columns":["a","b"],"data":[[1,3],[2,8],[3,9]]}'
df = pd.read_json(s, orient='split')
df

	a	b
1	1	3
2	2	8
3	3	9

'records' : list like [{column -> value}, ... , {column -> value}]

This is a list whose members are dictionaries, as in the JSON sample above. Each dictionary maps column names to values and forms one row of the DataFrame.

strtext = '[{"ttery":"min","issue":"20130801-3391","code":"8,4,5,2,9","code1":"297734529","code2":null,"time":1013395466000},\
{"ttery":"min","issue":"20130801-3390","code":"7,8,2,1,2","code1":"298058212","code2":null,"time":1013395406000}]'

df = pd.read_json(strtext, orient='records')
df

	code	code1	code2	issue	time	ttery
0	8,4,5,2,9	297734529	NaN	20130801-3391	1013395466000	min
1	7,8,2,1,2	298058212	NaN	20130801-3390	1013395406000	min

'index' : dict like {index -> {column -> value}}

The index labels are the keys, and each value is a dictionary mapping column names to values. For example:

s = '{"0":{"a":1,"b":2},"1":{"a":9,"b":11}}'
df = pd.read_json(s, orient='index')
df

	a	b
0	1	2
1	9	11

'columns' : dict like {column -> {index -> value}}

Here each column name is a key whose value is a dictionary, and that dictionary maps index labels to values. For example:

s = '{"a":{"0":1,"1":9},"b":{"0":2,"1":11}}'
df = pd.read_json(s, orient='columns')
df

	a	b
0	1	2
1	9	11

'values' : just the values array

This form is the most familiar: a nested list whose members are themselves lists, two levels deep.

s = '[["a",1],["b",2]]'
df = pd.read_json(s, orient='values')
df

	0	1
0	a	1
1	b	2
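One way to see all five layouts next to each other is to write the same small DataFrame with to_json under each orient and inspect the parsed result. This is a sketch going in the opposite direction from the read_json calls above; wrapping the string in StringIO for reading back is a choice made here because newer pandas versions discourage passing literal JSON strings.

```python
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame({'a': [1, 9], 'b': [2, 11]})

# Serialize once per orient and parse the text back into plain Python objects
layouts = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ['split', 'records', 'index', 'columns', 'values']
}

for orient, layout in layouts.items():
    print(orient, layout)

# Reading one of them back restores the original table
restored = pd.read_json(StringIO(df.to_json(orient='split')), orient='split')
print(restored)
```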
