Scraping All Python Job Listings from Lagou

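I have been job hunting lately, so I scraped every Python position listed on Lagou to give myself a sense of direction. Lagou's data is fairly easy to scrape: the request returns JSON, which can be parsed directly. Without further ado, here is the code: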

 

import json
import urllib.parse
import urllib.request

from openpyxl import load_workbook

# Raw string so the backslashes in the Windows path are kept literally.
filename = r'E:\excel\position_number_11_2.xlsx'
# The workbook must already exist; load_workbook does not create it.
wb = load_workbook(filename=filename)
sheet = wb.create_sheet(title='position', index=0)
count = 1  # next spreadsheet row to write

# Fields written to columns 1-10, in this order.
fields = [
    'companySize', 'workYear', 'education', 'financeStage', 'city',
    'industryField', 'formatCreateTime', 'positionName',
    'companyFullName', 'salary',
]

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0',
    'Referer': 'https://www.lagou.com/jobs/list_Python?px=default&city=%E5%85%A8%E5%9B%BD',
}
request_url = 'https://www.lagou.com/jobs/positionAjax.json?px=default&needAddtionalResult=false'

for page in range(100):
    form_data = {
        'first': 'false',
        'pn': page,
        'kd': 'Python',
    }
    # The POST body must be bytes in Python 3.
    data = urllib.parse.urlencode(form_data).encode('utf-8')
    request = urllib.request.Request(request_url, headers=header, data=data)
    try:
        html = urllib.request.urlopen(request).read().decode('utf-8')
    except Exception:
        print('No job listings')
        break
    jsonobj = json.loads(html)
    positions = jsonobj['content']['positionResult']['result']
    for item in positions:
        if item:
            for column, field in enumerate(fields, start=1):
                sheet.cell(row=count, column=column).value = item[field]
            count += 1

# Save once after all pages are processed instead of after every row.
wb.save(filename)
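
The parsing step can also be tried offline, without hitting the site. Below is a rough Python 3 sketch, not part of the script above: the function name `extract_rows`, the `FIELDS` constant, and the sample payload are all made up for illustration. Only the `content` -> `positionResult` -> `result` nesting and the field names come from the code above. It uses `.get()` so that a response lacking those keys yields no rows instead of raising a `KeyError`.

```python
import json

# Field names taken from the script above; their order defines the columns.
FIELDS = [
    'companySize', 'workYear', 'education', 'financeStage', 'city',
    'industryField', 'formatCreateTime', 'positionName',
    'companyFullName', 'salary',
]


def extract_rows(response_text, fields=FIELDS):
    """Return one list of cell values per position found in the JSON text."""
    payload = json.loads(response_text)
    # Walk content -> positionResult -> result, falling back to empty
    # containers so an unexpected response produces an empty list.
    position_result = (payload.get('content') or {}).get('positionResult') or {}
    positions = position_result.get('result') or []
    # Missing fields become None rather than raising KeyError.
    return [[item.get(field) for field in fields] for item in positions if item]


# Hypothetical sample payload; it only mimics the nesting used above.
sample = json.dumps({
    'content': {'positionResult': {'result': [
        {'positionName': 'Python Developer', 'city': 'Beijing', 'salary': '15k-25k'},
    ]}}
})

rows = extract_rows(sample)
print(rows)
```

Each inner list lines up with one spreadsheet row, so it could be written with `sheet.append(row)` from openpyxl in place of setting cells one by one.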

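I wrote this code in a hurry, so it is not very tidy. I will post the code for Weibo and Douban in a couple of days, and I hope the experts in the community here will offer some pointers ^_^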
