Topic: scrape job postings from a website and store them in a SQLite database.
Environment:
Python 3.5
SQLite database
Navicat for SQLite (for convenient browsing of the data)
Steps:
1. Install SQLite
Download: http://www.sqlite.org/download.html
This walkthrough uses Windows 10, so find the sqlite tools package under "Precompiled Binaries for Windows", download it, unzip it, and place **sqlite3.exe** in the Python installation directory.
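As a quick sanity check, Python's standard library already includes an `sqlite3` module, so the version of the SQLite library that Python itself uses can be printed independently of sqlite3.exe. A minimal sketch (the function name `python_sqlite_version` is only for illustration):

```python
import sqlite3


def python_sqlite_version():
    """Return the version of the SQLite library that Python's sqlite3 module is linked against."""
    return sqlite3.sqlite_version


if __name__ == "__main__":
    print("SQLite library version:", python_sqlite_version())
```

If this prints a version number, Python can already read and write SQLite files; sqlite3.exe is then mainly useful as a command-line shell.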
2. Initialize the SQLite database
(1) Create a file named job_model.py in the d:\study\spyder directory
```
from peewee import *

db = SqliteDatabase('job.db')

class Job(Model):
    job_id = IntegerField(unique=True)  # key
    salary_min = IntegerField()
    salary_max = IntegerField()
    job_exp = CharField(max_length=100)
    company = CharField(max_length=100)
    company_id = IntegerField()
    company_info = CharField(max_length=100)
    url = CharField(max_length=100)
    attract = CharField(max_length=100)
    detail = TextField()
    address = CharField(max_length=100)
    publish_time = DateField()
    keyword = CharField(max_length=100)
    city = CharField(max_length=100)
    position = CharField(max_length=100)
    create_time = DateTimeField()

    class Meta:
        database = db
```
(2) Open a command prompt in d:\study\spyder (the directory containing job_model.py) and enter the following in order:
```
python -i job_model.py
db.connect()
db.create_tables([Job])
```
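If the directory has no job.db file, one is created, along with a table named job whose structure matches the model defined in job_model.py.
If job.db already exists in the directory, the job table is added to the existing database.
At this point you can connect to job.db with Navicat and check whether the job table was added.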
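Navicat is not the only way to check. As a rough alternative, assuming only Python's built-in `sqlite3` module, the table list and column names can be read straight from the database file. The helper names `list_tables` and `table_columns` are made up for this sketch:

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def table_columns(db_path, table):
    """Return the column names of one table, using PRAGMA table_info."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA arguments cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = conn.execute("PRAGMA table_info(%s)" % quoted).fetchall()
    finally:
        conn.close()
    return [row[1] for row in rows]  # row[1] holds the column name


if __name__ == "__main__":
    # Only look at job.db if it exists; sqlite3.connect would otherwise create an empty file.
    if os.path.exists("job.db"):
        print(list_tables("job.db"))
        print(table_columns("job.db", "job"))
```

Run from the directory containing job.db, the table list should include job once the table has been created.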
3. How do you write data to SQLite with Python?
This uses peewee, a third-party Python library; install it from the command line with pip3 install peewee
```
from spyder.job_model import Job
import peewee

class spyder:
    ......

    # The argument passed in is a dictionary
    def storeDataToSqlite(self, dic):
        try:
            Job.create(job_id = dic['job_id'],
                       salary_min = dic['salary_min'],
                       salary_max = dic['salary_max'],
                       company = dic['company'],
                       company_id = 0,  # set to 0 for now
                       company_info = dic['company_info'],
                       url = dic['url'],
                       attract = dic['attract'],
                       detail = dic['detail'],
                       address = dic['address'],
                       publish_time = dic['publish_time'],
                       keyword = dic['keyword'],
                       city = dic['city'],
                       job_exp = dic['exp'],
                       position = dic['position'],
                       create_time = self.today_date)
        except peewee.IntegrityError:
            # job_id is declared unique, so inserting a duplicate raises IntegrityError
            print("Insert failed: ID %s, company %s already exists" % (dic['job_id'], dic['company']))
```
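4. Create a new file, spyder_job.py, and start writing the scraper
Option 1: the requests library + BeautifulSoup
Option 2: selenium 3 + Chrome
Approach: search by city and job keyword --> page number --> collect the URL of each job posting --> loop over each posting page --> extract the information from each page --> write each record to the database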
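The flow above can be sketched as a pair of nested loops. Everything site-specific here is an assumption: the example.com URL, the query parameter names, the '/job/' link marker, and all function names are hypothetical. To keep the sketch runnable without extra installs, link extraction uses the standard library's html.parser rather than BeautifulSoup, and the page fetcher, detail parser, and storage function are passed in as arguments.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags whose href contains a marker substring."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href and self.marker in href and href not in self.links:
                self.links.append(href)


def job_urls(list_html, base_url, marker='/job/'):
    """Return absolute URLs of the job postings linked from one result page."""
    parser = LinkCollector(marker)
    parser.feed(list_html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(city, keyword, max_pages, fetch, parse_detail, store,
          list_url='https://example.com/jobs'):
    """city/keyword -> result pages -> job URLs -> detail pages -> store."""
    stored = 0
    for page in range(1, max_pages + 1):
        query = urlencode([('city', city), ('kw', keyword), ('page', page)])
        url = list_url + '?' + query
        links = job_urls(fetch(url), url)
        if not links:  # an empty result page means there is nothing left to page through
            break
        for link in links:
            dic = parse_detail(fetch(link))  # parse_detail returns a dict of fields
            dic.update(url=link, city=city, keyword=keyword)
            store(dic)
            stored += 1
    return stored
```

With option 1, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, `parse_detail` would hold the BeautifulSoup selectors for the real site, and `store` would be the storeDataToSqlite method from step 3.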
5. Connect to job.db with Navicat and check whether new rows were added to the job table