Processing Files with Python

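I wanted to do some simple file operations. Java is too heavyweight for that, and Python is a good alternative.

Because local resources are limited, deploying to an Alibaba Cloud server is the better choice.

One requirement is to extract the contents of every file in a folder and put each file's contents into its own Excel cell.

The os module is enough to walk through the files and read them: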

import os
# Get the all files & directories in the specified directory (path).  
def get_recursive_file_list(path):  
    current_files = os.listdir(path)  
    all_files = []  
    for file_name in current_files:  
        full_file_name = os.path.join(path, file_name)  
        all_files.append(full_file_name)  
  
        if os.path.isdir(full_file_name):  
            next_level_files = get_recursive_file_list(full_file_name)  
            all_files.extend(next_level_files)  
  
    return all_files  

all_files=get_recursive_file_list('C:\Users\green_pasture\Desktop\korea\key_words')
for filename in all_files:
	print filename
	f1=open(filename,'r+')
	for line1 in f1:
	  print "\n"
    	  print line1,
    f1.close

But I ran into a problem: IndentationError: unindent does not match any outer indentation level. So I looked up how indentation works in Python:

http://www.secnetix.de/olli/Python/block_indentation.hawk

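"Python: Myths about Indentation" explains that only the whitespace at the indentation level (the leftmost part of a statement) is significant. The exact amount of indentation does not matter; only the relative indentation of nested blocks does.

In addition, indentation is ignored on continuation lines, whether explicit or implicit.

You can write the body of a block on the same line as its header, with the statements separated by semicolons. If you put them on separate lines, Python forces you to follow its indentation rules. In Python, the indentation level always matches the logical structure of the code.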
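As a small illustration of these rules, here is a sketch written for Python 3 (the listings in this post use Python 2 print statements). The function names are made up for this example. The three functions are indented differently but behave the same, and the list literal shows that indentation inside brackets is ignored:

```python
# Two spaces or eight: only the relative indentation inside each block matters.
def double_narrow(values):
  result = []
  for v in values:
    result.append(v * 2)
  return result

def double_wide(values):
        result = []
        for v in values:
                result.append(v * 2)
        return result

# The loop body can share the header's line, with statements separated by semicolons.
def double_inline(values):
    result = []
    for v in values: w = v * 2; result.append(w)
    return result

# Inside brackets (an implicit continuation line) indentation is ignored.
numbers = [
1,
        2,
    3,
]

print(double_narrow(numbers))   # [2, 4, 6]
print(double_wide(numbers))     # [2, 4, 6]
print(double_inline(numbers))   # [2, 4, 6]
```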

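Do not mix tabs and spaces. Typically a tab is treated as advancing to the next multiple of 8 columns, regardless of how your editor displays it.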
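The error message quoted earlier can be reproduced in miniature. The sketch below is for Python 3, and both the snippet and the helper `indentation_error` are invented for illustration. It indents a loop body with tabs and a trailing line with four spaces, expands the tabs to 8-column stops as described above, and asks `compile()` whether the result is valid:

```python
# Hypothetical miniature of a loop whose body uses tabs while the
# last line is indented with four spaces.
snippet = (
    "for name in ['a']:\n"
    "\tfor ch in name:\n"
    "\t  pass\n"
    "    name.upper()\n"
)

# Expand each tab to the next multiple of 8 columns.
expanded = snippet.expandtabs(8)
indents = [len(line) - len(line.lstrip(" ")) for line in expanded.splitlines()]
print(indents)  # [0, 8, 10, 4]

def indentation_error(source):
    """Return the IndentationError message for source, or None if it compiles."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None

# The 4-column line matches none of the open levels (0, 8, 10).
print(indentation_error(expanded))

# Indenting the last line like the rest of the loop body fixes it.
fixed = snippet.replace("    name.upper()", "\tname.upper()").expandtabs(8)
print(indentation_error(fixed))  # None
```

The point of the sketch is that a dedent must land exactly on an indentation level that is already open. Landing between two levels is an error even though the code looks aligned in an editor with 4-column tabs.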

import os
# Get the all files & directories in the specified directory (path).  
def get_recursive_file_list(path):  
    current_files = os.listdir(path)  
    all_files = []  
    for file_name in current_files:  
        full_file_name = os.path.join(path, file_name)  
        all_files.append(full_file_name)  
  
        if os.path.isdir(full_file_name):  
            next_level_files = get_recursive_file_list(full_file_name)  
            all_files.extend(next_level_files)  
  
    return all_files  

all_files=get_recursive_file_list('C:\Users\green_pasture\Desktop\korea\key_words')
for filename in all_files:
	print filename
	f1=open(filename,'r+')
	for line1 in f1:    	print line1,

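I put the statement inside the for loop on the same line as the for, and the program ran without errors. (The line that actually triggered the error in the first version was the final f1.close, which was indented with four spaces while the lines above it used tabs. That line is gone in this version.)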
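The same traversal can also be sketched with `os.walk`, which yields only file names in its third element, so directories never reach `open()`. This version targets Python 3. The names `iter_files` and `read_text` are hypothetical, and UTF-8 is an assumption about the input files:

```python
import os

def iter_files(root):
    """Yield the path of every regular file under root, recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)

def read_text(path, encoding="utf-8"):
    """Return the whole file as one string; the with block closes the file."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()
```

A caller would loop over `iter_files(some_folder)` and pass each path to `read_text`. Because the file is opened inside a `with` block, no explicit `close()` call is needed.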

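Next, I wanted to write the printed file contents into Excel cells.

Several libraries can do this:

XlsxWriter https://xlsxwriter.readthedocs.org/

xlwt http://www.python-excel.org/

openpyxl http://pythonhosted.org/openpyxl/

XlsxWriter's documentation is quite complete, so I decided to use it.

The sample code is: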

##############################################################################
#
# A simple example of some of the features of the XlsxWriter Python module.
#
# Copyright 2013-2014, John McNamara, jmcnamara@cpan.org
#
import xlsxwriter


# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()

# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)

# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})

# Write some simple text.
worksheet.write('A1', 'Hello')

# Text with formatting.
worksheet.write('A2', 'World', bold)

# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)

# Insert an image.
worksheet.insert_image('B5', 'logo.png')

workbook.close()
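The sample addresses cells in two ways: 'A1' notation and zero-based (row, column) pairs, so `write(2, 0, 123)` lands in cell A3. To make the mapping concrete, here is a minimal pure-Python sketch. The helper `rowcol_to_a1` is made up for this post. XlsxWriter ships its own conversion helpers in `xlsxwriter.utility`, which would be the thing to use in real code:

```python
def rowcol_to_a1(row, col):
    """Convert zero-based (row, col) to an 'A1'-style cell reference."""
    letters = ""
    n = col + 1  # switch to 1-based for the bijective base-26 conversion
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters + str(row + 1)

print(rowcol_to_a1(2, 0))   # A3
print(rowcol_to_a1(3, 0))   # A4
print(rowcol_to_a1(4, 1))   # B5
print(rowcol_to_a1(0, 26))  # AA1
```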

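How do I install it? pip can do it. My Python directory, C:\Python33\Scripts, already contains pip.exe, so I added that directory to the PATH environment variable and ran pip install XlsxWriter from the command line.

An error occurred:

Fatal error in launcher: Unable to create process using C:\Python33\Scripts\pip.exe install XlsxWriter

So I downloaded it from GitHub instead: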

$ git clone https://github.com/jmcnamara/XlsxWriter.git

$ cd XlsxWriter
$ sudo python setup.py install
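I cannot tell from the message alone what caused the launcher error here. A commonly suggested alternative is to skip the pip.exe wrapper and run pip as a module of the interpreter you want to install into. A minimal sketch for Python 3, assuming pip is importable by that interpreter (the helper names are hypothetical):

```python
import subprocess
import sys

def pip_install_command(package):
    """Build the command that runs pip through the current interpreter."""
    return [sys.executable, "-m", "pip", "install", package]

def pip_install(package):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(package))

if __name__ == "__main__":
    # Only show the command here; call pip_install("XlsxWriter") to run it.
    print(" ".join(pip_install_command("XlsxWriter")))
```

The same thing typed by hand is `python -m pip install XlsxWriter`.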

I created a test program:

import xlsxwriter

workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()

worksheet.write('A1', 'Hello world')

workbook.close()

The test succeeded. Here is the complete program:

from __future__ import print_function  # lets print() work on Python 2 as well

import io
import os
import xlsxwriter

# Get all files and directories under the specified directory (path), recursively.
def get_recursive_file_list(path):
    current_files = os.listdir(path)
    all_files = []
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        all_files.append(full_file_name)

        if os.path.isdir(full_file_name):
            next_level_files = get_recursive_file_list(full_file_name)
            all_files.extend(next_level_files)

    return all_files

workbook = xlsxwriter.Workbook('keywords.xlsx')
worksheet = workbook.add_worksheet()

# Raw string, so the backslashes in the Windows path are not read as escapes.
all_files = get_recursive_file_list(r'C:\Users\green_pasture\Desktop\korea\key_words')
row = 0
for filename in all_files:
    # The list also contains directories, which cannot be opened as files.
    if os.path.isdir(filename):
        continue
    print(filename)
    # io.open decodes the file as UTF-8; the with block closes it afterwards.
    with io.open(filename, 'r', encoding='utf-8') as f1:
        keywords = f1.read()
    print(keywords)
    # Column A holds the file path, column B the whole file content.
    worksheet.write(row, 0, filename)
    worksheet.write(row, 1, keywords)
    row = row + 1
workbook.close()
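One detail to watch when packing a file into a single cell: lines read from a file keep their trailing newline, so joining them with "\n" puts a blank line between every pair of lines. A small sketch for Python 3 shows the difference, with `io.StringIO` standing in for a real file (the helper names are made up):

```python
import io

def join_lines(handle):
    """Collect the lines into a list and join them with a newline separator."""
    return "\n".join(list(handle))

def read_all(handle):
    """Return the content unchanged."""
    return handle.read()

sample = "apple\nbanana\ncherry\n"
print(repr(join_lines(io.StringIO(sample))))  # 'apple\n\nbanana\n\ncherry\n'
print(repr(read_all(io.StringIO(sample))))    # 'apple\nbanana\ncherry\n'
```

If the lines do need to be processed one by one, stripping each with `line.rstrip("\n")` before joining gives the same text as `read()`, minus the final newline.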