Lesson 11: Combining Data from Various Sources

Lesson 11


Grab data from multiple Excel files and merge them into a single DataFrame.

In [1]:
import pandas as pd
import matplotlib
import os
import sys
%matplotlib inline
In [2]:
print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
print('Matplotlib version ' + matplotlib.__version__)
Python version 3.5.1 |Anaconda custom (64-bit)| (default, Feb 16 2016, 09:49:46) [MSC v.1900 64 bit (AMD64)]
Pandas version 0.20.1
Matplotlib version 1.5.1
 

Create 3 Excel files

In [3]:
# Create DataFrame
d = {'Channel':[1], 'Number':[255]}
df = pd.DataFrame(d)
df
Out[3]:
   Channel  Number
0        1     255
In [4]:
# Export to Excel
df.to_excel('test1.xlsx', sheet_name = 'test1', index = False)
df.to_excel('test2.xlsx', sheet_name = 'test2', index = False)
df.to_excel('test3.xlsx', sheet_name = 'test3', index = False)
print('Done')
Done 

Place all three Excel files into one DataFrame

Get a list of the file names, and make sure there are no other Excel files in the folder.

In [5]:
# List to hold file names
FileNames = []

# Your path will be different, please modify the path below.
os.chdir(r"C:\Users\david\notebooks\update")

# Find any file that ends with ".xlsx"
for files in os.listdir("."):
    if files.endswith(".xlsx"):
        FileNames.append(files)

FileNames
Out[5]:
['test1.xlsx', 'test2.xlsx', 'test3.xlsx'] 
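
The listing step can also be written with `pathlib`, which filters by pattern and makes it easy to sort the names so the order is stable. The following is a minimal sketch and not part of the original lesson: the helper name `list_excel_files` is hypothetical, and a temporary folder with empty placeholder files stands in for the real notebook directory.

```python
import tempfile
from pathlib import Path


def list_excel_files(folder):
    """Return the sorted names of all .xlsx files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.xlsx") if p.is_file())


# Demo in a throwaway folder; the placeholder files are empty, not real workbooks.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["test2.xlsx", "test1.xlsx", "test3.xlsx", "notes.txt"]:
        (Path(tmp) / name).touch()
    found = list_excel_files(tmp)

print(found)
```

Passing the folder as an argument avoids changing the working directory with `os.chdir`, which can matter when the rest of the notebook relies on relative paths.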

Create a function to process all of the Excel files.

In [6]:
def GetFile(fnombre):

    # Path to excel file
    # Your path will be different, please modify the path below.
    location = os.path.join(r'C:\Users\david\notebooks\update', fnombre)

    # Parse the excel file
    # 0 = first sheet
    df = pd.read_excel(location, 0)

    # Tag record to file name
    df['File'] = fnombre

    # Make the "File" column the index of the df
    return df.set_index(['File'])
Go through each file name, create a DataFrame, and add it to a list.

i.e.
df_list = [df, df, df]

In [7]:
# Create a list of dataframes
df_list = [GetFile(fname) for fname in FileNames]
df_list
Out[7]:
[            Channel  Number
 File                       
 test1.xlsx        1     255,             Channel  Number
 File                       
 test2.xlsx        1     255,             Channel  Number
 File                       
 test3.xlsx        1     255]
In [8]:
# Combine all of the dataframes into one
big_df = pd.concat(df_list)
big_df
Out[8]:
            Channel  Number
File                       
test1.xlsx        1     255
test2.xlsx        1     255
test3.xlsx        1     255
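
As an alternative to tagging each row inside the function, `pd.concat` can label the pieces itself through its `keys` and `names` parameters. The sketch below is an illustration under stated assumptions, not the lesson's method: small in-memory frames stand in for the Excel files, and the names `file_names`, `frames`, and `combined` are made up for the example.

```python
import pandas as pd

# Stand-ins for the three Excel files; each has the same single row.
file_names = ['test1.xlsx', 'test2.xlsx', 'test3.xlsx']
frames = [pd.DataFrame({'Channel': [1], 'Number': [255]}) for _ in file_names]

# keys= adds an outer index level holding the source name for each piece;
# names= labels the outer level 'File' and the original row index 'Row'.
combined = pd.concat(frames, keys=file_names, names=['File', 'Row'])

# Drop the inner level so the result is indexed by file name only.
combined = combined.reset_index(level='Row', drop=True)

print(combined)
```

With this approach the reading function only has to return the parsed sheet, and the file-name bookkeeping happens in one place at concatenation time.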
In [9]:
big_df.dtypes 
Out[9]:
Channel    int64
Number     int64
dtype: object
In [10]:
# Plot it!
big_df['Channel'].plot.bar(); 
 
 

This tutorial was rewritten by CDSspa
