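Use Chrome as the browser. Press Ctrl+Shift+Delete to clear the browser's cached cookies, open the Network panel to capture traffic, and tick Preserve log so that the full log is kept.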


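Lagou login verification flow:

1. Request the login page
   Request URL: https://passport.lagou.com/login/login.html
   Request headers: nothing special is needed; a plain Host and User-Agent are enough to pass as a browser.
   The response headers carry the cookies we need:
      Set-Cookie: JSESSIONID=ABAAABAAADGAACFC0077EDC55EEC248392A667B221CE7AB; Path=/; HttpOnly
      Set-Cookie: user_trace_token=20171104165207-d69fee97-d5d1-4a06-a406-e41989257b25;
   The page content contains two useful values:
      X-Anit-Forge-Code
      X-Anit-Forge-Token
   PS: in the head tag of login.html you can find a comment left by a Lagou programmer: "to prevent duplicate submission of requests and forms". That comment is exactly what gave me the idea for taking the flow apart, so a fondness for comments is not always a good thing.

2. Submit the username and password
   Request URL: https://passport.lagou.com/login/login.json
   The request headers must carry:
      JSESSIONID
      'X-Anit-Forge-Code': X_Anti_Forge_Code,    # found in the content of login.html
      'X-Anit-Forge-Token': X_Anti_Forge_Token,  # found in the content of login.html
      'X-Requested-With': 'XMLHttpRequest',
   Request body (data): the username and password
      PS: the username is sent in plaintext and the password as ciphertext. To obtain the correct ciphertext, log in through the browser with a wrong username and the right password, then read the password from the Form Data of the captured request.
   Cookies:
      JSESSIONID
      user_trace_token

3. Request authorization (after the login in the previous step succeeds, the session is still not authorized) and obtain the redirect URL
   Request URL: https://passport.lagou.com/grantServiceTicket/grant.html
   Request headers:
      Host
      User-Agent
   Note: a successful authorization answers with a redirect; once that redirect has been followed successfully, the login is complete.

4. Request the redirect URL to obtain the final logged-in session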
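Step 1 of the flow comes down to pulling two values out of the login page. The following is a minimal sketch of that extraction, assuming the page assigns them as `window.X_Anti_Forge_Token = '...'` and `window.X_Anti_Forge_Code = '...'` (the pattern the scripts in this article match). The helper name `extract_anti_forge` and the sample HTML are made up for illustration.

```python
import re

# Hypothetical helper, not part of the original scripts.
_PATTERN = r"window\.X_Anti_Forge_{name} = '(.*?)';"


def extract_anti_forge(html):
    """Return (token, code); raise ValueError if either assignment is missing."""
    values = []
    for name in ('Token', 'Code'):
        match = re.search(_PATTERN.format(name=name), html)
        if match is None:
            raise ValueError('X_Anti_Forge_%s not found in page' % name)
        values.append(match.group(1))
    return tuple(values)


# Invented sample standing in for the real login.html
sample_html = """
<script>
    window.X_Anti_Forge_Token = 'token-123';
    window.X_Anti_Forge_Code = '456789';
</script>
"""
token, code = extract_anti_forge(sample_html)
print(token, code)
```

Raising a named error instead of indexing `[0]` into an empty `re.findall` result makes it easier to tell that the page layout changed, rather than seeing a bare IndexError.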


import re

import requests

# Placeholders: use your own account. PASSWORD is the ciphertext captured from
# the browser's Form Data, as described in step 2 of the flow.
USERNAME = 'your_username'
PASSWORD = 'your_password_ciphertext'
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/61.0.3163.100 Safari/537.36')

session = requests.Session()

# Step 1: request login.html to get the cookies and the anti-forgery values
r1 = session.get(
    'https://passport.lagou.com/login/login.html',
    headers={'Host': 'passport.lagou.com', 'User-Agent': USER_AGENT},
)
X_Anti_Forge_Token = re.findall(r"window.X_Anti_Forge_Token = '(.*)';", r1.text)[0]
X_Anti_Forge_Code = re.findall(r"window.X_Anti_Forge_Code = '(.*)';", r1.text)[0]

# Step 2: log in, carrying the cookies from step 1; the backend authorizes
# the JSESSIONID found in those cookies
r3 = session.post(
    url='https://passport.lagou.com/login/login.json',
    data={
        'isValidate': True,
        'username': USERNAME,
        'password': PASSWORD,
        'request_form_verifyCode': '',
        'submit': '',
    },
    headers={
        # The header names are spelled "Anit", unlike the "Anti" page variables
        'X-Anit-Forge-Code': X_Anti_Forge_Code,
        'X-Anit-Forge-Token': X_Anti_Forge_Token,
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'https://passport.lagou.com/login/login.html',
        'Host': 'passport.lagou.com',
        'User-Agent': USER_AGENT,
    },
)
print(r3.text)
# print(r3.headers)

# Step 3: request authorization without following the redirect
r4 = session.get(
    'https://passport.lagou.com/grantServiceTicket/grant.html',
    allow_redirects=False,
    headers={'Host': 'passport.lagou.com', 'User-Agent': USER_AGENT},
)
# print(r4.headers)
location = r4.headers['Location']
# print(location)

# Step 4: request the redirect URL to obtain the final logged-in session
r5 = session.get(
    location,
    allow_redirects=True,
    headers={
        'Host': 'www.lagou.com',
        'Referer': 'https://passport.lagou.com/login/login.html?',
        'User-Agent': USER_AGENT,
    },
)
# print(r5.headers)

# =============== end of the login steps ===============

# Verify the login by fetching a page that requires it
r6 = session.get('https://www.lagou.com/resume/myresume.html')
print('xxx' in r6.text)  # replace 'xxx' with text you expect only on your own resume page
print(r6.text)
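Step 3 reads `r4.headers['Location']` directly, which fails with a bare KeyError when the grant request does not redirect. A defensive sketch follows, assuming only that a successful grant answers with a 3xx status and a Location header. The function name `redirect_target` is hypothetical and the sample URL is invented.

```python
def redirect_target(status_code, headers):
    """Return the redirect URL of a response, or raise if it is not a redirect."""
    location = headers.get('Location')
    if not (300 <= status_code < 400) or not location:
        raise RuntimeError(
            'expected a redirect, got status %s; the login may have failed'
            % status_code
        )
    return location


# With requests this would be called as: redirect_target(r4.status_code, r4.headers)
print(redirect_target(302, {'Location': 'https://example.com/after-login'}))
```

Checking both the status code and the header keeps the failure message close to the likely cause, instead of surfacing later as a confusing error in step 4.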


import re
from urllib.parse import urlencode

import requests

# Placeholders: use your own account. PASSWORD is the ciphertext captured from
# the browser's Form Data, as described in step 2 of the flow.
USERNAME = 'your_username'
PASSWORD = 'your_password_ciphertext'
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/61.0.3163.100 Safari/537.36')

session = requests.Session()

# Step 1: request login.html to get the cookies and the anti-forgery values
r1 = session.get(
    'https://passport.lagou.com/login/login.html',
    headers={'Host': 'passport.lagou.com', 'User-Agent': USER_AGENT},
)
X_Anti_Forge_Token = re.findall(r"window.X_Anti_Forge_Token = '(.*)';", r1.text)[0]
X_Anti_Forge_Code = re.findall(r"window.X_Anti_Forge_Code = '(.*)';", r1.text)[0]

# Step 2: log in, carrying the cookies from step 1; the backend authorizes
# the JSESSIONID found in those cookies
r3 = session.post(
    url='https://passport.lagou.com/login/login.json',
    data={
        'isValidate': True,
        'username': USERNAME,
        'password': PASSWORD,
        'request_form_verifyCode': '',
        'submit': '',
    },
    headers={
        # The header names are spelled "Anit", unlike the "Anti" page variables
        'X-Anit-Forge-Code': X_Anti_Forge_Code,
        'X-Anit-Forge-Token': X_Anti_Forge_Token,
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'https://passport.lagou.com/login/login.html',
        'Host': 'passport.lagou.com',
        'User-Agent': USER_AGENT,
    },
)
print(r3.text)
# print(r3.headers)

# Step 3: request authorization without following the redirect
r4 = session.get(
    'https://passport.lagou.com/grantServiceTicket/grant.html',
    allow_redirects=False,
    headers={'Host': 'passport.lagou.com', 'User-Agent': USER_AGENT},
)
# print(r4.headers)
location = r4.headers['Location']
# print(location)

# Step 4: request the redirect URL to obtain the final logged-in session
r5 = session.get(
    location,
    allow_redirects=True,
    headers={
        'Host': 'www.lagou.com',
        'Referer': 'https://passport.lagou.com/login/login.html?',
        'User-Agent': USER_AGENT,
    },
)
# print(r5.headers)

# =============== end of the login steps ===============

# Scrape job listings

# Step 1: analysis
# Sample search URL:
# https://www.lagou.com/jobs/list_python%E5%BC%80%E5%8F%91?labelWords=&fromSearch=true&suginput=
keyword = 'python开发'  # Simplified Chinese, matching the sample URL above
url_encode = urlencode({'k': keyword}, encoding='utf-8')  # k=python%E5%BC%80%E5%8F%91
# Build the search URL from the user's keyword
url = ('https://www.lagou.com/jobs/list_%s?labelWords=&fromSearch=true&suginput='
       % url_encode.split('=')[1])
print(url)

# Fetch the main search page
r7 = session.get(
    url,
    headers={
        'Host': 'www.lagou.com',
        'Referer': 'https://passport.lagou.com/login/login.html?',
        'User-Agent': USER_AGENT,
    },
)
# The main page turns out not to contain the job listings we searched for, so
# they must be rendered afterwards by JavaScript; a quick check confirms it.
r7.text
# Requesting the search URL only returns static content. The job data itself
# comes back as JSON from a request to
# https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false&isSchoolJob=0

# Step 2: verify the analysis
# Send a POST request to the URL above and read the JSON it returns
r8 = session.post(
    'https://www.lagou.com/jobs/positionAjax.json',
    params={
        'needAddtionalResult': False,
        'isSchoolJob': '0',
    },
    headers={
        'Host': 'www.lagou.com',
        'Origin': 'https://www.lagou.com',
        'Referer': url,
        'User-Agent': USER_AGENT,
        'X-Anit-Forge-Code': '0',
        'X-Anit-Forge-Token': '',
        'X-Requested-With': 'XMLHttpRequest',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
    },
    data={
        'first': True,
        'pn': '1',
        'kd': keyword,
    },
)
print(r8.json())
# pageNo: 1 means the first page and pageSize: 15 means this page holds 15 job
# records; all that is left is to find out how many pages there are in total.


# Step 3 (final implementation): filter job listings by the arguments passed in.
# The Referer reuses the search `url` built in step 1.
def search_position(keyword,
                    pn=1,
                    city='北京',
                    district=None,
                    bizArea=None,
                    isSchoolJob=None,
                    xl=None,
                    jd=None,
                    hy=None,
                    yx=None,
                    needAddtionalResult=False,
                    px='default'):
    # requests leaves out query parameters whose value is None
    params = {
        'city': city,                # work location, e.g. 北京
        'district': district,        # administrative district, e.g. 朝阳区
        'bizArea': bizArea,          # business area, e.g. 望京
        'isSchoolJob': isSchoolJob,  # job nature, e.g. fresh graduate
        'xl': xl,                    # education requirement, e.g. 大专
        'jd': jd,                    # funding stage, e.g. 天使轮, A轮
        'hy': hy,                    # industry, e.g. 移动互联网
        'yx': yx,                    # salary range, e.g. 10k-15k
        'needAddtionalResult': needAddtionalResult,
        'px': px,                    # sort order
    }
    r8 = session.post(
        'https://www.lagou.com/jobs/positionAjax.json',
        params=params,
        headers={
            'Host': 'www.lagou.com',
            'Origin': 'https://www.lagou.com',
            'Referer': url,
            'User-Agent': USER_AGENT,
            'X-Anit-Forge-Code': '0',
            'X-Anit-Forge-Token': '',
            'X-Requested-With': 'XMLHttpRequest',
            'Accept': 'application/json, text/javascript, */*; q=0.01',
        },
        data={
            'first': True,
            'pn': pn,
            'kd': keyword,
        },
    )
    print(r8.status_code)
    print(r8.json())
    return r8.json()


# Look for a 10k-15k Python development job in Chaoyang District, Beijing
keyword = 'python开发'
yx = '10k-15k'
city = '北京'
district = '朝阳区'
isSchoolJob = '0'  # fresh graduate or intern

response = search_position(keyword=keyword, yx=yx, city=city,
                           district=district, isSchoolJob=isSchoolJob)
results = response['content']['positionResult']['result']


# Print the details of each listing
def get_company_info(results):
    for res in results:
        info = '''
        Company full name   : %s
        Address             : %s, %s
        Posted at           : %s
        Position            : %s
        Position type       : %s, %s
        Job nature          : %s
        Salary              : %s
        Benefits            : %s
        Required experience : %s
        Company size        : %s
        Detail link         : https://www.lagou.com/jobs/%s.html
        ''' % (
            res['companyFullName'],
            res['city'],
            res['district'],
            res['createTime'],
            res['positionName'],
            res['firstType'],
            res['secondType'],
            res['jobNature'],
            res['salary'],
            res['positionAdvantage'],
            res['workYear'],
            res['companySize'],
            res['positionId'],
        )
        print(info)
        # Analysis shows that every detail link has the form
        # https://www.lagou.com/jobs/2653020.html, where the number is the position id
        # print('Full name [%s], short name [%s]' % (res['companyFullName'], res['companyShortName']))


get_company_info(results)
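The comments in the search script note that the remaining task is to find out how many pages there are. Below is a minimal sketch of that calculation. It assumes the JSON carries `pageSize` next to `pageNo` under `content`, and a `totalCount` under `content.positionResult`; those field locations are assumptions that the article does not confirm, so check them against a real response. `page_count` is a hypothetical helper and the sample payload is invented.

```python
import math


def page_count(payload):
    """Number of result pages, given one positionAjax.json response as a dict."""
    content = payload['content']
    total = content['positionResult']['totalCount']  # assumed field name
    page_size = content['pageSize']                  # assumed location
    return math.ceil(total / page_size) if page_size else 0


# Invented sample shaped like the response described in the article
sample = {
    'content': {
        'pageNo': 1,
        'pageSize': 15,
        'positionResult': {'totalCount': 47, 'result': []},
    }
}
print(page_count(sample))  # 47 records at 15 per page
```

With the first response in hand, the remaining pages could then be fetched by calling `search_position(keyword, pn=page)` for each `page` from 2 up to `page_count(first_response)`.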


import re

import requests

# Placeholders: use your own account. PASSWORD is the ciphertext captured from
# the browser's Form Data, as described in step 2 of the flow.
USERNAME = 'your_username'
PASSWORD = 'your_password_ciphertext'
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/61.0.3163.100 Safari/537.36')

session = requests.Session()

# Step 1: request login.html to get the cookies and the anti-forgery values
r1 = session.get(
    'https://passport.lagou.com/login/login.html',
    headers={'Host': 'passport.lagou.com', 'User-Agent': USER_AGENT},
)
X_Anti_Forge_Token = re.findall(r"window.X_Anti_Forge_Token = '(.*)';", r1.text)[0]
X_Anti_Forge_Code = re.findall(r"window.X_Anti_Forge_Code = '(.*)';", r1.text)[0]

# Step 2: log in, carrying the cookies from step 1; the backend authorizes
# the JSESSIONID found in those cookies
r3 = session.post(
    url='https://passport.lagou.com/login/login.json',
    data={
        'isValidate': True,
        'username': USERNAME,
        'password': PASSWORD,
        'request_form_verifyCode': '',
        'submit': '',
    },
    headers={
        # The header names are spelled "Anit", unlike the "Anti" page variables
        'X-Anit-Forge-Code': X_Anti_Forge_Code,
        'X-Anit-Forge-Token': X_Anti_Forge_Token,
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'https://passport.lagou.com/login/login.html',
        'Host': 'passport.lagou.com',
        'User-Agent': USER_AGENT,
    },
)
print(r3.text)
# print(r3.headers)

# Step 3: request authorization without following the redirect
r4 = session.get(
    'https://passport.lagou.com/grantServiceTicket/grant.html',
    allow_redirects=False,
    headers={'Host': 'passport.lagou.com', 'User-Agent': USER_AGENT},
)
# print(r4.headers)
location = r4.headers['Location']
# print(location)

# Step 4: request the redirect URL to obtain the final logged-in session
r5 = session.get(
    location,
    allow_redirects=True,
    headers={
        'Host': 'www.lagou.com',
        'Referer': 'https://passport.lagou.com/login/login.html?',
        'User-Agent': USER_AGENT,
    },
)
# print(r5.headers)

# =============== end of the login steps ===============

# Submit a resume automatically
position_id = '3476321'  # the number in https://www.lagou.com/jobs/3476321.html

# First visit the job page to get X_Anti_Forge_Token, X_Anti_Forge_Code and userid
r9 = session.get(
    'https://www.lagou.com/jobs/%s.html' % position_id,
    headers={'Host': 'www.lagou.com', 'User-Agent': USER_AGENT},
)
X_Anti_Forge_Token = re.findall(r"window.X_Anti_Forge_Token = '(.*)';", r9.text)[0]
X_Anti_Forge_Code = re.findall(r"window.X_Anti_Forge_Code = '(.*)';", r9.text)[0]
userid = re.findall(r'value="(\d+)" name="userid"', r9.text)[0]
print(userid, type(userid))

# Debugging aid: dump the extracted user id to a file
with open('a.html', 'w', encoding='utf-8') as f:
    f.write(userid)

# Then POST the user id together with the position id
r10 = session.post(
    'https://www.lagou.com/mycenterDelay/deliverResumeBeforce.json',
    headers={
        'Host': 'www.lagou.com',
        'Origin': 'https://www.lagou.com',
        'Referer': 'https://www.lagou.com/jobs/%s.html' % position_id,
        'User-Agent': USER_AGENT,
        'X-Anit-Forge-Code': X_Anti_Forge_Code,
        'X-Anit-Forge-Token': X_Anti_Forge_Token,
        'X-Requested-With': 'XMLHttpRequest',
    },
    data={
        'userId': userid,
        'positionId': position_id,
        'force': False,
        'type': '',
        'resubmitToken': '',
    },
)
print(r10.status_code)
print(r10.text)
# The result can be checked in the delivery box: https://www.lagou.com/mycenter/delivery.html
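In the delivery request, the `positionId` in the form data and the job page named in the `Referer` header both describe one position, so it seems safer to derive them from a single id. The sketch below does that, using the field and header names from the script above. `build_delivery_request` is a hypothetical helper, the argument values are placeholders, and whether the server actually checks the Referer is not established here.

```python
def build_delivery_request(position_id, user_id, forge_code, forge_token):
    """Return (headers, data) for the resume delivery POST, built from one position id."""
    headers = {
        'Host': 'www.lagou.com',
        'Origin': 'https://www.lagou.com',
        'Referer': 'https://www.lagou.com/jobs/%s.html' % position_id,
        'X-Anit-Forge-Code': forge_code,    # header spelling as used in the article
        'X-Anit-Forge-Token': forge_token,
        'X-Requested-With': 'XMLHttpRequest',
    }
    data = {
        'userId': user_id,
        'positionId': str(position_id),
        'force': False,
        'type': '',
        'resubmitToken': '',
    }
    return headers, data


# Placeholder values for illustration only
headers, data = build_delivery_request(3476321, '1234567', 'code', 'token')
print(headers['Referer'], data['positionId'])
```

The result would be passed to `session.post(...)` as `headers=headers, data=data`, with the User-Agent header added as in the script.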