Working with CSV files in Python

Date: 2019-12-14

Tags: python, csv, files. Category: Python


1. What is a CSV file

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in RFC 4180.

2. Drawbacks of CSV files

The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.
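One way to cope with files whose delimiter is not known in advance is the csv module's Sniffer class, which guesses a dialect from a sample of the text. The sketch below is only an illustration: the semicolon-separated sample and the candidate delimiter list are assumptions, not data from this article, and sniffing can fail on ambiguous input.

```python
import csv
import io

# Sample text invented for illustration; a real program would read it from a file.
sample = 'a;b;c\n1;2;3\n4;5;6\n'

# Restrict the guess to a few plausible delimiters (an assumption about the input).
dialect = csv.Sniffer().sniff(sample, delimiters=';,\t')

# Parse the same text with the detected dialect.
rows = list(csv.reader(io.StringIO(sample, newline=''), dialect))
print(dialect.delimiter, rows)
```

Passing an explicit delimiters string keeps the guess from landing on a character that merely happens to repeat in the data.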

3. The Python csv module (csv.py)

The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, "write this data in the format preferred by Excel," or "read data from this file which was generated by Excel," without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.

The csv module's reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.

reader(csvfile[, dialect='excel'][, fmtparam])

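csvfile
        Any object that supports the iterator protocol and returns a string each time its __next__() method is called. Ordinary file objects and lists both qualify. If it is a file object, open it in text mode with newline='' on Python 3 (the "b" flag was only needed on Python 2).
dialect
        The formatting style. The default is excel, that is, comma-separated. The csv module also supports the excel-tab style, that is, tab-separated. Any other style has to be defined by you and registered with register_dialect; list_dialects returns the names of all registered styles.
fmtparam
        Formatting parameters that override individual settings of the dialect given before.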
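The three arguments can be sketched together as follows. The dialect name 'pipes', the helper read_rows and the sample strings are hypothetical and exist only for this illustration; register_dialect, list_dialects and the built-in 'excel' and 'excel-tab' dialects are part of the csv module.

```python
import csv
import io

# Hypothetical dialect name, chosen for illustration only.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_MINIMAL)

def read_rows(text, dialect='excel', **fmtparams):
    """Parse CSV text with the given dialect; fmtparams override dialect settings."""
    return list(csv.reader(io.StringIO(text, newline=''), dialect=dialect, **fmtparams))

pipe_rows = read_rows('a|b|c\n1|2|3\n', dialect='pipes')
tab_rows = read_rows('a\tb\n1\t2\n', dialect='excel-tab')
# A format parameter overrides the delimiter of the chosen dialect.
semi_rows = read_rows('a;b\n1;2\n', dialect='excel', delimiter=';')
registered = csv.list_dialects()
print(pipe_rows, tab_rows, semi_rows, registered)
```

Registering a dialect is mostly useful when the same settings are needed in several places; for a one-off read, passing format parameters directly is usually enough.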

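Format parameters:

delimiter: sets the field separator

quotechar: sets the quote character

quoting: the quoting behaviour; there are four options

They are defined in the csv module as four constants: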

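QUOTE_ALL: quote every field, whatever its type.

QUOTE_MINIMAL: quote only fields that contain special characters (a special character is anything that could confuse a parser configured with the same dialect and options). This is the default.

QUOTE_NONNUMERIC: quote every field that is not an integer or a float. When used with a reader, unquoted input fields are converted to float.

QUOTE_NONE: never quote anything on output. When used with a reader, quote characters are kept as part of the field value (normally they are treated as delimiters and stripped).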
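The effect of the four constants is easiest to see by writing the same row under each of them. This is a minimal sketch: the helper write_row and the sample row are made up for illustration, and the output goes to an in-memory buffer rather than a file. Note that QUOTE_NONE needs an escapechar when a field contains the delimiter; otherwise the writer raises csv.Error.

```python
import csv
import io

def write_row(row, quoting, **fmtparams):
    """Return the single CSV line produced for row under the given quoting mode."""
    buf = io.StringIO(newline='')
    csv.writer(buf, quoting=quoting, **fmtparams).writerow(row)
    return buf.getvalue()

row = ['spam', 42, 'a,b']  # sample values, not from the article

all_line = write_row(row, csv.QUOTE_ALL)
minimal_line = write_row(row, csv.QUOTE_MINIMAL)
nonnumeric_line = write_row(row, csv.QUOTE_NONNUMERIC)
# Without quotes, the embedded comma has to be escaped instead.
none_line = write_row(row, csv.QUOTE_NONE, escapechar='\\')

# With QUOTE_NONNUMERIC the reader converts unquoted fields to float.
parsed = next(csv.reader(io.StringIO(nonnumeric_line, newline=''),
                         quoting=csv.QUOTE_NONNUMERIC))

for line in (all_line, minimal_line, nonnumeric_line, none_line):
    print(repr(line))
print(parsed)
```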

import csv

def testReader(file):
	with open(file, 'r', newline='') as csvfile:
		spamreader = csv.reader(csvfile, delimiter=',')
		for row in spamreader:
			print(', '.join(row))

if __name__ == '__main__':
	csvFile = 'test.csv'
	testReader(csvFile)

writer(csvfile[, dialect='excel'][, fmtparam])

Parameters: omitted here; they are the same as for reader, see above.

def testWriter(file):
	with open(file, 'w', newline='') as csvfile:
		spamwriter = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
		spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
		spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

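DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

Creates an object that operates like a regular reader but maps the information in each row to an OrderedDict whose keys are given by the optional fieldnames parameter (since Python 3.8 the rows are plain dict objects).

The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f are used as the field names. Regardless of how the field names are determined, the dictionary preserves their original ordering.

If a row has more fields than fieldnames, the remaining data is put in a list and stored under the key given by restkey (default None). If a non-blank row has fewer fields than fieldnames, the missing values are filled in with restval, which defaults to None.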
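A short sketch of how restkey and restval behave on rows that are too long or too short. The sample text and the key name 'overflow' are invented for illustration.

```python
import csv
import io

# Sample data invented for illustration: the second line has two extra
# fields and the third line is one field short.
text = 'name,score\nAnn,90,extra1,extra2\nBob\n'

reader = csv.DictReader(io.StringIO(text, newline=''),
                        restkey='overflow', restval='n/a')
rows = [dict(row) for row in reader]
fieldnames = reader.fieldnames  # taken from the first line of the input

for row in rows:
    print(row)
```

Choosing explicit restkey and restval values makes malformed rows easy to spot, because the defaults of None can be mistaken for ordinary missing data.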

def testDictReader(file):
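	# Header row of test.csv: 院系,專業,年級,學生類別,班級,學號,姓名,學分紅績,更新時間,班級排名,參與班級排名總人數
	# The csv module needs a text-mode file on Python 3, so 'rb' would fail here.
	with open(file, 'r', newline='') as csvfile: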
		dictreader = csv.DictReader(csvfile)
		for row in dictreader:
			print(' '.join([row['院系'], row['專業'], row['學號'], row['姓名']]))

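DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)

Creates an object that operates like a regular writer but maps dictionaries onto output rows. The fieldnames parameter is a sequence of keys that identifies the order in which the values of the dictionary passed to the writerow() method are written to file f. The optional restval parameter specifies the value to write when the dictionary is missing a key listed in fieldnames. If the dictionary passed to writerow() contains a key that is not found in fieldnames, the optional extrasaction parameter indicates what to do: with 'raise', the default, a ValueError is raised; with 'ignore', the extra values in the dictionary are ignored. Any other optional or keyword arguments are passed to the underlying writer instance.

Note that, unlike the DictReader class, the fieldnames parameter of DictWriter is not optional. The original rationale is that Python dict objects were not ordered, so there was not enough information to deduce the order in which a row should be written to file f.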
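The interplay of restval and extrasaction can be sketched like this. The field names and the record are invented for illustration, and the output is written to an in-memory buffer instead of a file.

```python
import csv
import io

fieldnames = ['name', 'score']
# Sample record: 'score' is missing and 'note' is not a known field.
record = {'name': 'Ann', 'note': 'unexpected key'}

# extrasaction='ignore' drops 'note'; restval fills in the missing 'score'.
buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval='0', extrasaction='ignore')
writer.writeheader()
writer.writerow(record)
output = buf.getvalue()

# The default extrasaction='raise' rejects the same record with ValueError.
strict = csv.DictWriter(io.StringIO(newline=''), fieldnames=fieldnames)
try:
    strict.writerow(record)
    raised = False
except ValueError:
    raised = True

print(repr(output), raised)
```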

def testDictWriter(file):
	with open(file, 'w', newline='') as csvfile:
		fieldnames = ['院系', '專業', '年級', '學生類別', '班級', '學號']
		writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
		writer.writeheader()
		writer.writerow(
			{'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101245'})
		writer.writerow(
			{'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101275'})

4. Example code

Copying a CSV file

def copycsv(source, target):
	# Open both files in one with statement so each is closed even on error.
	with open(source, 'r', newline='') as csvsource, open(target, 'w', newline='') as csvtarget:
		reader = csv.reader(csvsource, delimiter=',')
		# Create the writer once, outside the loop.
		writer = csv.writer(csvtarget, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
		for line in reader:
			writer.writerow(line)

5. Other approaches (numpy, pandas)

import numpy

# loadtxt accepts a file name directly, so no file handle is left open.
my_matrix = numpy.loadtxt("num.csv", delimiter=",", skiprows=0)
print(my_matrix)

import pandas as pd

obj = pd.read_csv('test.csv')
print(obj)
print(type(obj))
print(obj.dtypes)

test.csv

院系,專業,年級,學生類別,班級,學號,姓名,學分紅績,更新時間,班級排名,參與班級排名總人數
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101244,欒,86.72,2017/9/5 9:59,1,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101237,劉,86.05,2017/9/5 9:59,2,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101233,劉,86.03,2017/9/5 9:59,3,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101250,李,85.43,2017/9/5 9:59,4,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101229,張,82.35,2017/9/5 9:59,5,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101241,韓,80.92,2017/9/5 9:59,6,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101232,丁,80.66,2017/9/5 9:59,7,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101228,張,79.61,2017/9/5 9:59,8,27
信息學院,計算機科學與技術,2011級,本科(本科)4年,計算機11,201101255,孟,79.55,2017/9/5 9:59,9,27

num.csv

1,2,3
4,5,6
7,8,9

6. Complete code

# coding:utf-8

import csv


def testReader(file):
	with open(file, 'r', newline='') as csvfile:
		spamreader = csv.reader(csvfile, delimiter=',')
		for row in spamreader:
			print(', '.join(row))


def testWriter(file):
	with open(file, 'w', newline='') as csvfile:
		spamwriter = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
		spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
		spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])


def copycsv(source, target):
	# Open both files in one with statement so each is closed even on error.
	with open(source, 'r', newline='') as csvsource, open(target, 'w', newline='') as csvtarget:
		reader = csv.reader(csvsource, delimiter=',')
		# Create the writer once, outside the loop.
		writer = csv.writer(csvtarget, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
		for line in reader:
			writer.writerow(line)


def testDictReader(file):
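	# Header row of test.csv: 院系,專業,年級,學生類別,班級,學號,姓名,學分紅績,更新時間,班級排名,參與班級排名總人數
	# The csv module needs a text-mode file on Python 3, so 'rb' would fail here.
	with open(file, 'r', newline='') as csvfile: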
		dictreader = csv.DictReader(csvfile)
		for row in dictreader:
			print(' '.join([row['院系'], row['專業'], row['學號'], row['姓名']]))


def testDictWriter(file):
	with open(file, 'w', newline='') as csvfile:
		fieldnames = ['院系', '專業', '年級', '學生類別', '班級', '學號']
		writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
		writer.writeheader()
		writer.writerow(
			{'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101245'})
		writer.writerow(
			{'院系': '信息學院', '專業': '計算機科學與技術', '年級': '2011級', '學生類別': '本科(本科)4年', '班級': '計算機11', '學號': '201101275'})


def testpandas_csv():
	import pandas as pd

	obj = pd.read_csv('test.csv')
	print(obj)
	print(type(obj))
	print(obj.dtypes)


def testnumpy_csv():
	import numpy

	my_matrix = numpy.loadtxt("num.csv", delimiter=",", skiprows=0)
	print(my_matrix)


if __name__ == '__main__':
	# csvFile = 'test.csv'
	# testReader(csvFile)

	# csvFile = 'test2.csv'
	# testWriter(csvFile)

	# copycsv('test.csv', 'testcopy.csv')

	# testDictReader('test.csv')

	# testDictWriter('test2.csv')
	testnumpy_csv()

# testpandas_csv()