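Simulating a login generally takes one of three approaches:
1. Send the username and password directly in the first request.
2. Visit the target URL once first. The server returns some parameters, such as a captcha or specific encrypted strings, which you analyze and extract yourself and then include in the second request. See http://www.javashuo.com/article/p-vgcntrpr-mw.html for reference.
3. Skip the fancy tricks: log in manually, extract the cookie, and attach it to the request headers.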
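Methods 1 and 2 both end in a POST of form fields; method 2 adds the step of harvesting server-issued values from the first response. Below is a minimal standard-library sketch of that harvesting step, assuming the login page embeds its one-time tokens as hidden `<input>` fields. The field names, the sample HTML, and the helper names are hypothetical and do not describe Taobao's actual login form. Inside a Scrapy spider, `scrapy.FormRequest.from_response(response, formdata={...})` pre-fills the hidden fields for you.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            # A hidden input without a value attribute submits an empty string
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_body(login_page_html, username, password):
    """Merge the server-issued hidden fields with the credentials (hypothetical field names)."""
    collector = HiddenInputCollector()
    collector.feed(login_page_html)
    form = dict(collector.fields)
    form["username"] = username  # assumed field name
    form["password"] = password  # assumed field name
    return urlencode(form)  # body for the second (POST) request


# Hypothetical first response: the server embeds a one-time token in the form
sample_page = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

print(build_login_body(sample_page, "alice", "s3cret"))
```

Only hidden inputs are harvested; visible fields such as the username box are filled from the credentials instead.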
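The walkthrough below uses method 3, with the Taobao shopping cart as the target.
1. Log in to your own Taobao account manually and copy the cookie from the browser, as shown in the figure below.
2. In cmd, activate your virtual environment with workon and create the project (scrapy startproject taobao).
3. Open the project directory in PyCharm and run scrapy genspider itaobao taobao.com in the terminal, which produces the directory structure shown below.
4. Configure the relevant options in settings.py.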
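The article does not say which settings it changed, so the fragment below is an assumption, not the author's actual file. ROBOTSTXT_OBEY is turned off because the template generated by scrapy startproject enables it, and the robots.txt check can drop requests the site disallows. COOKIES_ENABLED must stay at its default of True: the cookies argument of scrapy.Request is applied by Scrapy's cookies middleware, so disabling it means the login cookie from step 6 is never sent. The User-Agent value is a hypothetical browser string.

```python
# settings.py (fragment): an assumed configuration, not the author's original file

# The startproject template sets this to True; with it on, Scrapy fetches
# robots.txt first and drops requests that the site disallows.
ROBOTSTXT_OBEY = False

# Default is True. The cookies middleware is what turns Request(cookies=...)
# into a Cookie header, so it must stay enabled for the cookie login to work.
COOKIES_ENABLED = True

# Hypothetical desktop-browser User-Agent; copy your own browser's string if needed.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
)

# Optional: log the Cookie header actually sent with each request,
# to confirm that the login cookie is attached.
COOKIES_DEBUG = True
```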
5. Write the spider logic in itaobao.py. First, request the shopping cart directly without the cookie:
```python
import scrapy


class ItaobaoSpider(scrapy.Spider):
    name = 'itaobao'
    allowed_domains = ['taobao.com']
    # Go straight to the shopping cart on the very first request
    start_urls = [
        'https://cart.taobao.com/cart.htm?spm=a1z02.1.a2109.d1000367.OOeipq&nekot=1470211439694']

    def parse(self, response):
        print(response.text)
```
The response comes back as follows:
This clearly means the request was redirected to the login page.
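Rather than reading the printed HTML by eye, the callback can check where it landed. A minimal sketch, assuming the login page is served from login.taobao.com (an assumption about the site) and that the redirect is an HTTP 3xx which Scrapy's redirect middleware follows, so response.url is the final login URL. If the site redirects with JavaScript instead, response.url stays the cart URL and you would have to inspect the body. The helper name landed_on_login is hypothetical.

```python
from urllib.parse import urlsplit

# Assumed login host for Taobao; adjust if the site sends you somewhere else.
LOGIN_HOSTS = {"login.taobao.com"}


def landed_on_login(response_url):
    """Return True if the final response URL points at a known login host."""
    # urlsplit(...).hostname is lowercased, so the comparison is case-insensitive
    return urlsplit(response_url).hostname in LOGIN_HOSTS


# How it might be used in the spider's callback (requires Scrapy, shown as comments):
# def parse(self, response):
#     if landed_on_login(response.url):
#         # The redirect middleware records the URLs it was redirected from here
#         self.logger.warning("Not logged in; redirected from %s",
#                             response.meta.get("redirect_urls"))
#         return
#     print(response.text)

print(landed_on_login("https://login.taobao.com/member/login.jhtml"))
print(landed_on_login("https://cart.taobao.com/cart.htm"))
```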
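6. Back to the main topic. The working version below overrides start_requests so that the very first request already carries the cookie: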
```python
import scrapy


class ItaobaoSpider(scrapy.Spider):
    name = 'itaobao'
    allowed_domains = ['taobao.com']

    # start_urls = ['https://cart.taobao.com/cart.htm?spm=a1z02.1.a2109.d1000367.OOeipq&nekot=1470211439694']
    # Override start_requests so that the very first request already carries the cookie
    def start_requests(self):
        url = "https://cart.taobao.com/cart.htm?spm=a1z02.1.a2109.d1000367.OOeipq&nekot=1470211439694"
        # This cookie string was copied from the browser after logging in manually
        cookie = (
            "thw=cn; cookie2=16b0fe13709f2a71dc06ab1f15dcc97b; _tb_token_=fe3431e5fe755;"
            " _samesite_flag_=true; ubn=p; ucn=center; t=538b39347231f03177d588275aba0e2f;"
            " tk_trace=oTRxOWSBNwn9dPyorMJE%2FoPdY8zfvmw%2Fq5hoqmmiKd74AJ%2Bt%2FNCZ%"
            "2FSIX9GYWSRq4bvicaWHhDMtcR6rWsf0P6XW5ZT%2FgUec9VF0Ei7JzUpsghuwA4cBMNO9EHkGK53r%"
            "2Bb%2BiCEx98Frg5tzE52811c%2BnDmTNlzc2ZBkbOpdYbzZUDLaBYyN9rEdp9BVnFGP1qVAAtbsnj35zfBVfe09E%"
            "2BvRfUU823q7j4IVyan1lagxILINo%2F%2FZK6omHvvHqA4cu2IaVAhy5MzzodyJhmXmOpBiz9Pg%3D%3D; "
            "cna=5c3zFvLEEkkCAW8SYSQ2GkGo; sgcookie=E3EkJ6LRpL%2FFRZIBoXfnf; unb=578051633; "
            "uc3=id2=Vvl%2F7ZJ%2BJYNu&nk2=r7kpR6Vbl9KdZe14&lg2=URm48syIIVrSKA%3D%3D&vt3=F8dBxGJsy36E3EwQ%2BuQ%3D;"
            " csg=c99a3c3d; lgc=%5Cu5929%5Cu4ED9%5Cu8349%5Cu5929%5Cu4ED9%5Cu8349; cookie17=Vvl%2F7ZJ%2BJYNu;"
            " dnk=%5Cu5929%5Cu4ED9%5Cu8349%5Cu5929%5Cu4ED9%5Cu8349; skt=4257a8fa00b349a7; existShop=MTU5MzQ0MDI0MQ%3D%3D;"
            " uc4=nk4=0%40rVtT67i5o9%2Bt%2BQFc65xFQrUP0rGVA%2Fs%3D&id4=0%40VH93OXG6vzHVZgTpjCrALOFhU4I%3D;"
            " tracknick=%5Cu5929%5Cu4ED9%5Cu8349%5Cu5929%5Cu4ED9%5Cu8349; _cc_=W5iHLLyFfA%3D%3D; "
            "_l_g_=Ug%3D%3D; sg=%E8%8D%893d; _nk_=%5Cu5929%5Cu4ED9%5Cu8349%5Cu5929%5Cu4ED9%5Cu8349;"
            " cookie1=VAmiexC8JqC30wy9Q29G2%2FMPHkz4fpVNRQwNz77cpe8%3D; tfstk=cddPBI0-Kbhyfq5IB_1FRmwX4zaRClfA"
            "_qSREdGTI7eLP5PGXU5c-kQm2zd2HGhcE; mt=ci=8_1; v=0; uc1=cookie21=VFC%2FuZ9ainBZ&cookie15=VFC%2FuZ9ayeYq2g%3D%3D&cookie"
            "16=WqG3DMC9UpAPBHGz5QBErFxlCA%3D%3D&existShop=false&pas=0&cookie14=UoTV75eLMpKbpQ%3D%3D&cart_m=0;"
            " _m_h5_tk=cbe3780ec220a82fe10e066b8184d23f_1593451560729; _m_h5_tk_enc=c332ce89f09d49c68e13db9d906c8fa3; "
            "l=eBxAcQbPQHureJEzBO5aourza7796IRb8sPzaNbMiInca6MC1hQ0PNQD5j-MRdtjgtChRe-PWBuvjdeBWN4dbNRMPhXJ_n0xnxvO.; "
            "isg=BJ2drKVLn8Ww-Ht9N195VKUWrHmXutEMHpgqKF9iKfRAFrxIJAhD3DbMRAoQ1unE"
        )
        # The cookies argument of scrapy.Request takes a dict, so convert the raw
        # "k1=v1; k2=v2" string into {"k1": "v1", "k2": "v2"}
        cookies = {}
        for pair in cookie.split(';'):
            pair = pair.strip()  # drop the space after each ';' so keys are clean
            if not pair:
                continue
            key, value = pair.split('=', 1)  # split once: values may themselves contain '='
            cookies[key] = value
        yield scrapy.Request(url=url, cookies=cookies, callback=self.parse)

    def parse(self, response):
        print(response.text)
```
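The response (partial excerpt) is shown below:
This is clearly the real source of your own shopping cart page.
That's it. From here you can extract whatever information your use case needs with XPath (my preferred approach).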
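The XPath step itself is not shown in the article, so here is a hedged sketch of its shape. Taobao's real cart markup is not known here: the sample markup and the class names item, title, and price are hypothetical. To stay runnable without Scrapy, the sketch uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and requires well-formed XML. Real pages usually are not well-formed, so inside the spider you would write the full-XPath equivalents against the response, for example response.xpath("//div[@class='item']") and then item.xpath("./a[@class='title']/text()").get().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed cart markup; the real page structure will differ.
SAMPLE = """
<div id="cart">
  <div class="item"><a class="title">Mechanical keyboard</a><em class="price">299.00</em></div>
  <div class="item"><a class="title">USB-C cable</a><em class="price">19.90</em></div>
</div>
"""


def extract_items(markup):
    """Return a list of {title, price} dicts using ElementTree's XPath subset."""
    root = ET.fromstring(markup.strip())
    items = []
    # [@class='item'] matches the class attribute exactly, as in XPath 1.0
    for item in root.findall(".//div[@class='item']"):
        items.append({
            "title": item.findtext("a[@class='title']"),
            "price": float(item.findtext("em[@class='price']")),
        })
    return items


print(extract_items(SAMPLE))
```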