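Viewing the page source in the browser showed no submission information. Inspecting the page with the developer tools, however, turned up the required data among the XHR requests.
The URL that returns the data is https://edu.cnblogs.com/Homework/GetAnswers?homeworkId=2420&_=1543629375998, so the rest is just a matter of code.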
Original page: https://edu.cnblogs.com/campus/hbu/Python2018Fall/homework/2420
Data URL (XHR): https://edu.cnblogs.com/Homework/GetAnswers?homeworkId=2420&_=1543629375998
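Comparing the original page URL with the data URL shows that both contain "2420", so I guessed that the number at the end of an assignment URL is enough to fetch the submissions for any assignment on cnblogs (博客園). Trying it in code confirmed that "https://edu.cnblogs.com/Homework/GetAnswers?homeworkId=2420", without the trailing "_" parameter, already returns the information.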
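The URL comparison above can be sketched as a small helper that pulls the homework ID out of either URL and rebuilds the shorter data URL. This is a minimal sketch, not the author's code. The names `homework_id` and `answers_url` are hypothetical, and it assumes the ID is either the `homeworkId` query parameter or the last path segment of the assignment page URL.

```python
from urllib.parse import urlsplit, parse_qs

# Endpoint taken from the post; only the homework ID varies.
API_TEMPLATE = "https://edu.cnblogs.com/Homework/GetAnswers?homeworkId={}"


def homework_id(url: str) -> str:
    """Return the homework ID from an assignment page URL or a GetAnswers URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "homeworkId" in query:
        # Data URL: the ID is an explicit query parameter.
        return query["homeworkId"][0]
    # Assignment page: assume the ID is the last path segment.
    return parts.path.rstrip("/").split("/")[-1]


def answers_url(url: str) -> str:
    """Build the data URL, dropping extra parameters such as `_`."""
    return API_TEMPLATE.format(homework_id(url))


page = "https://edu.cnblogs.com/campus/hbu/Python2018Fall/homework/2420"
print(answers_url(page))
```

Parsing the URL rather than splitting the raw string also tolerates a trailing slash on the pasted link.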
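That completes the scraping part of the job. Since I have recently been learning GUI programming in Python, though, I also built a simple interface for the scraper.
Enter a cnblogs assignment link, click "Start scraping", and the scraped information appears in the output window below.
More interesting still, changing only the last four digits lets you scrape other assignments; the screenshot above is a small experiment with that.
Finally, I scraped the submissions for the web scraper assignment itself.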
# login.py: UI layout for the scraper window
from PyQt5 import QtCore, QtGui, QtWidgets


class Ui_Form(object):
    def setupUi(self, Form):
        Form.setObjectName("Form")
        Form.resize(1083, 667)

        # Label in front of the URL input
        self.label = QtWidgets.QLabel(Form)
        self.label.setGeometry(QtCore.QRect(110, 50, 91, 41))
        font = QtGui.QFont()
        font.setPointSize(12)
        self.label.setFont(font)
        self.label.setObjectName("label")

        # Input box for the assignment link
        self.lineEdit = QtWidgets.QLineEdit(Form)
        self.lineEdit.setGeometry(QtCore.QRect(210, 60, 441, 31))
        self.lineEdit.setObjectName("lineEdit")

        # Button that starts the scrape
        self.pushButton = QtWidgets.QPushButton(Form)
        self.pushButton.setGeometry(QtCore.QRect(650, 60, 91, 31))
        font = QtGui.QFont()
        font.setPointSize(12)
        self.pushButton.setFont(font)
        self.pushButton.setObjectName("pushButton")

        # Output window for the scraped rows
        self.textBrowser = QtWidgets.QTextBrowser(Form)
        self.textBrowser.setGeometry(QtCore.QRect(70, 110, 891, 501))
        self.textBrowser.setObjectName("textBrowser")

        self.retranslateUi(Form)
        QtCore.QMetaObject.connectSlotsByName(Form)

    def retranslateUi(self, Form):
        _translate = QtCore.QCoreApplication.translate
        Form.setWindowTitle(_translate("Form", "Form"))
        self.label.setText(_translate("Form", "cnblogs link:"))
        self.pushButton.setText(_translate("Form", "Start scraping"))


# Main script (imports Ui_Form from login.py)
import json
import sys

import requests
from PyQt5 import QtWidgets

from login import Ui_Form


class mywindow(QtWidgets.QWidget, Ui_Form):
    def __init__(self):
        super(mywindow, self).__init__()
        self.setupUi(self)
        # Run the scraper when the button is clicked
        self.pushButton.clicked.connect(self.fun)

    def fun(self):
        # The homework ID is the last path segment of the pasted link
        u = self.lineEdit.text()
        u = u.split('/')[-1]
        url = "https://edu.cnblogs.com/Homework/GetAnswers?homeworkId={}".format(u)
        r = requests.get(url)
        r.encoding = r.apparent_encoding
        jd = json.loads(r.text)['data']

        # One comma-separated line per submission
        p = ""
        for i in jd:
            p += (str(i['StudentNo']) + ','
                  + str(i['RealName']) + ','
                  + str(i['DateAdded']).replace('T', ' ').split('.')[0] + ','
                  + str(i['Title']) + ','
                  + str(i['Url']) + '\n')
        self.textBrowser.setText(p)

        # Raw string so the backslash in the Windows path is taken literally
        with open(r'D:\hwlist.csv', 'w') as f:
            f.write(p)


if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    ui = mywindow()
    ui.show()
    sys.exit(app.exec_())
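The code above builds each CSV line by joining fields with commas, which would break if a title happened to contain a comma. As a hedged alternative, the sketch below formats the same five fields (`StudentNo`, `RealName`, `DateAdded`, `Title`, `Url`) with the standard `csv` module, which quotes such fields automatically. The helper names `to_rows` and `to_csv_text` and the sample record are hypothetical and made up for illustration. They are not data returned by the site.

```python
import csv
import io


def to_rows(records):
    """Turn the decoded JSON records into lists of the five fields used above."""
    rows = []
    for rec in records:
        # Same timestamp cleanup as the original: drop the 'T' and fractional seconds.
        added = str(rec["DateAdded"]).replace("T", " ").split(".")[0]
        rows.append([rec["StudentNo"], rec["RealName"], added, rec["Title"], rec["Url"]])
    return rows


def to_csv_text(records):
    """Format the records as CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(to_rows(records))
    return buf.getvalue()


# Hypothetical sample record, for illustration only.
sample = [{
    "StudentNo": "20180001",
    "RealName": "Zhang San",
    "DateAdded": "2018-12-01T10:00:00.123",
    "Title": "Crawler, part 1",
    "Url": "https://www.cnblogs.com/example/p/1.html",
}]
print(to_csv_text(sample), end="")
```

The returned text can be passed both to `textBrowser.setText` and to the file write, so the window and the CSV file stay in step.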