Reading and writing Excel with Python

Recently my boss asked me to pull some data from a website. Doing it by hand was too slow, so I found some Python snippets online and put together a script.

import re

import requests
import xlrd
from bs4 import BeautifulSoup
from xlutils.copy import copy


def read_excel(path):
    # Open the workbook
    workbook = xlrd.open_workbook(path)
    # Get a sheet by index or by name (sheet indices start at 0)
    sheet1 = workbook.sheet_by_index(0)

    # Walk the rows; the first column of each row holds a URL
    for i in range(sheet1.nrows):
        # str.replace() returns a new string, so keep the result
        url = sheet1.cell_value(i, 0).replace('\'', '')
        print(url, i)
        response = get_response_html(url)
        soup = get_beautiful_soup(response)
        pattern1 = '^https://ews-aln-core.cisco.com/applmgmt/view-appl/+[0-9]*$'
        pattern2 = '^https://ews-aln-core.cisco.com/applmgmt/view-endpoint/+[0-9]*$'
        pattern3 = '^https://ews-aln-core.cisco.com/applmgmt/view-appl/by-name/'
        if pattern_match(url, pattern1) or pattern_match(url, pattern3):
            priority = (soup.find("table", class_="main_table_layout")
                            .find("tr", class_="centered sub_section_header")
                            .find_next("tr", align="center")
                            .find_all("td"))
        elif pattern_match(url, pattern2):
            priority = (soup.find("table", class_="main_table_layout")
                            .find("tr", class_="centered")
                            .find_next("tr", align="center")
                            .find_all("td"))
        else:
            print("no pattern")
            continue  # nothing to write for this row
        try:
            priority_number = 'P' + get_last_td(priority)
        except Exception:
            print("not found: " + url)
            continue  # skip rows whose priority cell is missing
        write_excel(path, i, 1, priority_number)


def write_excel(path, row, col, value):
    # xlrd workbooks are read-only; xlutils.copy turns one into a writable
    # xlwt workbook. Note: copy() takes the workbook object, not the file name.
    oldwb = xlrd.open_workbook(path)
    wb = copy(oldwb)
    ws = wb.get_sheet(0)
    ws.write(row, col, value)
    wb.save(path)  # save back into the same .xls file


def get_last_td(cells):
    # Text of the last <td> in the row
    return cells[-1].contents[0]


def get_beautiful_soup(markup):
    return BeautifulSoup(markup, 'html.parser', from_encoding='utf-8')


def get_response_html(url):
    # Browser-like User-Agent so the request is not treated as a crawler
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/56.0.2924.87 Safari/537.36'}
    return requests.get(url, auth=(userName, passWord), headers=headers).content


def pattern_match(s, pattern, flags=0):
    return re.match(pattern, s, flags)


if __name__ == '__main__':
    userName = '*'
    passWord = '*'
    path = r'*'
    read_excel(path)
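The HTML-parsing step can be tried on its own, without the network or the Excel file. The snippet below is a sketch using a made-up table fragment in the layout the script assumes for the target page; it pulls the last <td> of the priority row the same way the script does.

```python
from bs4 import BeautifulSoup

# Made-up fragment mimicking the assumed layout of the target page
html = """
<table class="main_table_layout">
  <tr class="centered sub_section_header"><th>Name</th><th>Priority</th></tr>
  <tr align="center"><td>my-app</td><td>2</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
# Passing the exact class string (spaces included) matches the multi-class row
cells = (soup.find("table", class_="main_table_layout")
             .find("tr", class_="centered sub_section_header")
             .find_next("tr", align="center")
             .find_all("td"))
priority = 'P' + cells[-1].get_text()
print(priority)  # P2
```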

There were quite a few pitfalls along the way:

  1. At first the file was in .xlsx format and the saved file could not be opened; changing the format to .xls fixed it (xlwt only writes the old .xls format).

  2. The request header came from the web; with a browser-like User-Agent the request is not flagged as a crawler, which otherwise fails with: http.client.RemoteDisconnected: Remote end closed connection without response.
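This can be checked without touching the network: requests lets you prepare a request and inspect the headers that would be sent (the URL below is just a placeholder).

```python
import requests

# Browser-like User-Agent; without one some servers simply close the
# connection, surfacing as http.client.RemoteDisconnected
headers = {
    'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/56.0.2924.87 Safari/537.36')
}

# Build the request without sending it, to see what would go on the wire
req = requests.Request('GET', 'https://example.com/', headers=headers).prepare()
print(req.headers['User-Agent'])
```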

  3. The argument to copy() must be the workbook object, not the .xls file name, otherwise it fails with: AttributeError: 'str' object has no attribute 'datemode'.

  4. Found a good blog post: appending data to an existing .xls Excel file in Python, i.e. opening the Excel file and writing new data into it.

  5. At first I tried saving into a new file with a new path, but that did not work: each pass through the loop copies from the source Excel again, so in the end only one row was actually inserted.

  6. Regular-expression syntax: "Regular Expressions - Syntax" and "Python Regular Expressions".
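A quick check of the patterns used above: re.match only anchors at the start of the string, so it is the trailing $ that pins the end. Note that in the original patterns /+ means "one or more slashes" and the dots are unescaped wildcards, which happens to be harmless here.

```python
import re

# Pattern 1 from the script above: view-appl URLs ending in digits
pattern1 = r'^https://ews-aln-core.cisco.com/applmgmt/view-appl/+[0-9]*$'

def pattern_match(s, pattern, flags=0):
    # re.match anchors the start; the '$' in the pattern anchors the end
    return re.match(pattern, s, flags)

ok = pattern_match('https://ews-aln-core.cisco.com/applmgmt/view-appl/12345', pattern1)
bad = pattern_match('https://ews-aln-core.cisco.com/applmgmt/view-appl/12345/extra', pattern1)
print(bool(ok), bool(bad))  # True False
```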

  7. Usage of Beautiful Soup in Python, very complete docs: Beautiful Soup 4.2.0 documentation.

  8. A demo that scrapes a novel: Python3 Web Crawler (7): Scraping a Novel with Beautiful Soup.

  9. I had never written Python before; this first attempt took half a day, and there is still plenty of room for improvement.
