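Background: a dataset of 200,000 rows needs to be split into chunks of 15,000 rows each.
Core idea: assign each row a group number by integer-dividing its row index by the chunk size, then extract each group as a sub-dataset.
Example:
Step 1: compute the group number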
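A minimal sketch of this step, assuming a small hypothetical DataFrame named `demo` with the default integer index. The group number comes from floor division (`//`) of the row index by the chunk size; using the remainder (`%`) instead would scatter consecutive rows across different groups.

```python
import pandas as pd

# Hypothetical demo data: 7 rows with the default RangeIndex 0..6
demo = pd.DataFrame({"value": list("abcdefg")})
chunk_size = 3  # rows per chunk (15000 in the scenario described above)

# Floor division maps rows 0-2 to group 0, rows 3-5 to group 1, row 6 to group 2
demo["chunk"] = demo.index // chunk_size
print(demo["chunk"].tolist())  # [0, 0, 0, 1, 1, 1, 2]
```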

Step 2: extract the sub-datasets
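One way this step might look, again on a hypothetical `demo` DataFrame. A boolean mask selects a single group; `groupby` is shown as an alternative that walks through every group in one pass.

```python
import pandas as pd

# Hypothetical demo data with a group number per row
demo = pd.DataFrame({"value": list("abcdefg")})
chunk_size = 3
demo["chunk"] = demo.index // chunk_size

# Boolean mask: keep only the rows whose group number is 1
subset = demo[demo["chunk"] == 1]
print(subset["value"].tolist())  # ['d', 'e', 'f']

# groupby yields each (group number, sub-DataFrame) pair in turn
sizes = {int(name): len(part) for name, part in demo.groupby("chunk")}
print(sizes)  # {0: 3, 1: 3, 2: 1}
```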

Step 3: save the sub-datasets
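A sketch of the saving step under a few assumptions: it writes CSV files rather than Excel files so that no Excel engine is required, and it writes into a temporary directory (`out_dir` is a hypothetical location, not a path from the article). Swapping `to_csv` for `to_excel` should give Excel output, provided an engine such as xlsxwriter or openpyxl is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical demo data with a group number per row
demo = pd.DataFrame({"value": list("abcdefg")})
chunk_size = 3
demo["chunk"] = demo.index // chunk_size

# Hypothetical output location: a temporary directory instead of a real folder
out_dir = Path(tempfile.mkdtemp())

written = []
for name, part in demo.groupby("chunk"):
    file_path = out_dir / f"{int(name)}file.csv"
    # Drop the helper column so it does not end up in the saved file
    part.drop(columns="chunk").to_csv(file_path, index=False)
    written.append(file_path.name)

print(written)  # ['0file.csv', '1file.csv', '2file.csv']
```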

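Code for splitting df into smaller files:
import pandas as pd

# df is assumed to be an existing DataFrame with the default index 0, 1, 2, ...
chunk_size = 2  # rows per chunk (2 for demonstration; 15000 for the case above)
df['chunk'] = df.index // chunk_size  # group number for each row
list_chunk = df['chunk'].unique().tolist()  # distinct group numbers
table = df
# Save each chunk to its own Excel file
path = '/Users/zhoujunqing/Downloads/EXCEL'
for name in list_chunk:
    file_name = str(name) + 'file.xlsx'
    file_path = path + '/' + file_name
    print(file_path)
    df_chunk = table[table['chunk'] == name]
    # The with-block saves and closes the file, replacing the deprecated writer.save()
    with pd.ExcelWriter(file_path, engine='xlsxwriter') as writer:
        df_chunk.to_excel(writer, sheet_name=str(name), index=False)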
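The `df.index // chunk_size` approach depends on the index running 0, 1, 2, ... without gaps. If the DataFrame has been filtered or carries a custom index, positional slicing may be more robust. The following is a sketch of that alternative, not the article's method; `split_frame` and `big` are hypothetical names introduced for illustration.

```python
import pandas as pd


def split_frame(frame, chunk_size):
    """Return consecutive row slices of at most chunk_size rows each."""
    return [frame.iloc[start:start + chunk_size]
            for start in range(0, len(frame), chunk_size)]


# Hypothetical data sized like the scenario: 200,000 rows, chunks of 15,000
big = pd.DataFrame({"id": range(200_000)})
parts = split_frame(big, 15_000)

print(len(parts))      # 14
print(len(parts[-1]))  # 5000
```

Thirteen full chunks cover 195,000 rows, so the fourteenth holds the remaining 5,000. Because `iloc` slices by position, no helper column is added to the data.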
Supplement: code for reading an Excel file:
import time

import pandas as pd


def read_xlsx(path, sheet_name):
    """Read one sheet of an Excel workbook into a DataFrame."""
    xlsx_file = pd.ExcelFile(path)  # open the workbook at this path
    table = xlsx_file.parse(sheet_name)  # parse the selected sheet
    return table


if __name__ == "__main__":
    start_time = time.time()  # start time
    # path = '/Users/xxx/Public'
    path = '/Users/xxx/Downloads'
    file_name = 'test.xlsx'
    # file_name = '報名記錄彙總v1.xlsx'
    # Map a short label to the name of the sheet to read
    sheet_name_list = {
        'hive': '動態id',
        'mysql': 'Sheet4',
        'excel': '工做表4',
        'xlsx': 'Sheet1'
    }
    path = path + "/" + file_name
    sheet_name = sheet_name_list['excel']
    df = read_xlsx(path, sheet_name)
    print(df.head())
    end_time = time.time()  # end time
    print("Elapsed time: %f seconds." % (end_time - start_time))
This article was shared from the WeChat official account SQL數據分析 (dianwu_dw).