Analysing the API endpoints of Zhaopin (智聯招聘) to scrape data

Date: 2019-11-19


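1. Introduction

Most websites today separate the front end from the back end. The data you see on a page is usually not written into the HTML itself; a back-end language reads it from the database server and it is then loaded into the page dynamically.

With the arrival of ajax, front-end code could also request data from the back end directly, and this is where the idea of an API endpoint comes from. Put simply, you fetch data from a particular endpoint of a website by sending specific parameters to a specific address.

The data returned by an API is usually JSON. Even when it is not JSON, it is still regular and well-ordered, which makes it very easy to parse.

This is only my own modest view and simplified understanding.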
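
To illustrate how little work such a response takes to parse, here is a minimal sketch using Python's standard json module. The payload is invented for illustration and is not a real response from any site.

```python
import json

# Hypothetical API response body (invented for illustration)
raw = '{"code": 200, "data": {"city": "Hefei", "jobs": [{"name": "Python developer"}]}}'

# json.loads turns the JSON text into nested dicts and lists
payload = json.loads(raw)

# Values are then reached with ordinary key and index lookups
city = payload["data"]["city"]
first_job = payload["data"]["jobs"][0]["name"]
print(city, first_job)
```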

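2. Analysis

Open the official Zhaopin website and press F12 to open the developer tools. Looking at the requests made while the data loads, three API endpoints are easy to find.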

The first API endpoint

https://fe-api.zhaopin.com/c/i/city-page/user-city?ipCity=合肥

Parameter	Purpose
ipCity	The name of the city to query; the response then includes that city's code (code)
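
Because the city name is usually written in Chinese, it has to be percent-encoded before it goes into the query string. The sketch below shows one way to build the URL with the standard library; the helper name build_city_url is hypothetical and not part of the original article.

```python
from urllib.parse import urlencode

CITY_API = "https://fe-api.zhaopin.com/c/i/city-page/user-city"

def build_city_url(city_name):
    """Return the city lookup URL with the city name percent-encoded as UTF-8."""
    return CITY_API + "?" + urlencode({"ipCity": city_name})

url = build_city_url("合肥")
print(url)
```

Libraries such as requests perform the same encoding automatically when the values are passed through the params argument.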

The second API endpoint

https://dict.zhaopin.cn/dict/dictOpenService/getDict?dictNames=region_relation,education,recruitment,education_specialty,industry_relation,careet_status,job_type_parent,job_type_relation

Parameter value	Returned data (codes)
region_relation	Region information
education	Education level
recruitment	Recruitment information (whether through unified admission)
education_specialty	Occupation category
industry_relation	Industry
careet_status	Availability to start work
job_type_parent	Job category
job_type_relation	Job
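
The dictNames parameter is simply a comma-separated list of the dictionary names in the table, so only the dictionaries that are needed have to be requested. A minimal sketch, assuming the endpoint accepts any subset of these names (the article only shows the full list); build_dict_url is a hypothetical helper.

```python
from urllib.parse import urlencode

DICT_API = "https://dict.zhaopin.cn/dict/dictOpenService/getDict"

def build_dict_url(dict_names):
    """Join the dictionary names with commas into the dictNames parameter."""
    # safe="," keeps the commas literal, matching the URL shown in the article
    query = urlencode({"dictNames": ",".join(dict_names)}, safe=",")
    return DICT_API + "?" + query

url = build_dict_url(["region_relation", "education"])
print(url)
```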

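The third API endpoint

https://fe-api.zhaopin.com/c/i/sou?pageSize=200&cityId=664&workExperience=-1&education=5&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3

The parameter values for this endpoint are all codes obtained from the two endpoints above.

Parameter	Purpose
pageSize	Number of records to fetch
cityId	City
workExperience	Work experience
education	Education level
companyType	Company type
employmentType	Employment type
jobWelfareTag	Job benefits
kw	Keyword
kt	The value can vary and its purpose is unclear for now, but the parameter cannot be omitted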
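
Assembling this query string from a dictionary is less error-prone than editing the long URL by hand. The sketch below reproduces the example URL above; build_search_url is a hypothetical helper, and the default values are copied from that example rather than taken from any official documentation.

```python
from urllib.parse import urlencode

SEARCH_API = "https://fe-api.zhaopin.com/c/i/sou"

def build_search_url(city_id, keyword, page_size=200, education=5):
    """Build the job search URL; values not passed in are copied from the article's example."""
    params = {
        "pageSize": page_size,
        "cityId": city_id,
        "workExperience": -1,
        "education": education,
        "companyType": -1,
        "employmentType": -1,
        "jobWelfareTag": -1,
        "kw": keyword,
        "kt": 3,  # purpose unknown according to the article, but required
    }
    # dicts keep insertion order, so the parameters appear in the order listed
    return SEARCH_API + "?" + urlencode(params)

url = build_search_url(664, "python")
print(url)
```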

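3. Data scraping

Now that the API endpoints have been found, what remains is fetching the data and storing it locally.

Scraping goal

Query and store data for the city entered by the user. This run only searches for python jobs.

Each job record contains many fields. For convenience I extract only a few of them; the others can be handled the same way.

Full code: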
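
The field extraction can be sketched in isolation as follows. The field names (jobName, salary, timeState, workingExp, eduLevel) are the ones read by the full code in this article; the sample record and its values are invented, and the .get lookups are an added precaution in case a record lacks a field.

```python
def extract_fields(record):
    """Pick a few fields out of one job record, tolerating missing keys."""
    return {
        "jobName": record.get("jobName"),
        "salary": record.get("salary"),
        "timeState": record.get("timeState"),
        # workingExp and eduLevel are nested objects carrying a "name" entry
        "workingExp": (record.get("workingExp") or {}).get("name"),
        "eduLevel": (record.get("eduLevel") or {}).get("name"),
    }

# Hypothetical record with invented values; timeState is left out on purpose
sample = {
    "jobName": "Python developer",
    "salary": "10K-15K",
    "workingExp": {"name": "1-3 years"},
    "eduLevel": {"name": "Bachelor"},
}
fields = extract_fields(sample)
print(fields)
```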

"""
This scraper takes only a simple precaution against anti-scraping measures (browser-like request headers)
"""
import requests
import os
import json

class siper(object):
    def __init__(self):
        self.header={
            "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
            "Origin":"https://sou.zhaopin.com",
            "Host":"fe-api.zhaopin.com",
            "Accept-Encoding":"gzip, deflate, br"
        }
        print("Job search program starting······")
        # Open the output file
        self.file = "result.json"
        path = os.getcwd()
        pathfile = os.path.join(path,self.file)
        self.fp = open(pathfile,"w",encoding="utf-8")
        self.fp.write("[\n")

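    def get_response(self,url,params=None):
        # requests percent-encodes params; the timeout keeps a stalled connection from hanging the program
        return requests.get(url=url,params=params,headers = self.header,timeout=10)

    def get_citycode(self,city):
        # Look up the code that the site uses for the given city name
        url = "https://fe-api.zhaopin.com/c/i/city-page/user-city"
        response = self.get_response(url,params={"ipCity":city})
        result = response.json()
        return result['data']['code']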

    def parse_data(self,url):
        response = self.get_response(url)
        result = json.loads(response.text)['data']['results']
        items = []
        for i in result:
            item = {}
            item['Position'] = i['jobName']
            item['Salary'] = i['salary']
            item['Hiring status'] = i['timeState']
            item['Experience required'] = i['workingExp']['name']
            item['Education required'] = i['eduLevel']['name']
            items.append(item)
        return items

    def save_data(self,items):
        # Write one JSON object per line, comma-separated, so the file forms a valid JSON array
        for num, i in enumerate(items, start=1):
            self.fp.write(json.dumps(i,ensure_ascii=False))
            # No comma after the last item
            self.fp.write("\n" if num == len(items) else ",\n")
            print("%s--%s"%(num,i))

    def end(self):
        self.fp.write("]")
        self.fp.close()
        print("Job search program finished······")
        print("Data has been written to the file {}······".format(self.file))

    def main(self):
        try:
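            cityname = input("Enter the name of the city to search (prefecture-level city): ")
            city = self.get_citycode(cityname)
            url = "https://fe-api.zhaopin.com/c/i/sou?pageSize=200&cityId={}&workExperience=-1&education=5&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3".format(
                city)
            items = self.parse_data(url)
            self.save_data(items)
            self.end()
        except Exception as e:
            # Any failure ends up here: unknown city name, network error, or a changed response format
            print("Query failed, possibly because the city name was not recognised (exiting)")
            print(e)
            self.fp.close()
            raise SystemExit(1)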


if __name__ == '__main__':
    # Use a separate name for the instance so that it does not shadow the class
    spider = siper()
    spider.main()
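
As an alternative to writing the brackets and commas of the JSON array by hand, the whole list can be serialised in a single call. This is a sketch of a different approach rather than the author's method; save_items, the sample item, and the demo file name are hypothetical.

```python
import json
import os
import tempfile

def save_items(items, path):
    """Write the whole list as one JSON array; the with block closes the file even on error."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(items, fp, ensure_ascii=False, indent=2)

# Invented sample data, written to a temporary location for the demo
items = [{"jobName": "Python developer", "salary": "10K-15K"}]
path = os.path.join(tempfile.gettempdir(), "result_demo.json")
save_items(items, path)

# Read the file back to confirm that it is valid JSON
with open(path, encoding="utf-8") as fp:
    loaded = json.load(fp)
print(loaded == items)
```

The trade-off is that all items must be held in memory before writing, which is not a concern for a single page of results.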

Execution result:

Result file: