Tools >> Options >> Connections >> check Allow remote computers to connect
http://10.209.143:1234
IP: the address you looked up in step 2; replace it with your own machine's IP
Port: 8888 is the port you configured in Fiddler
Note: some browsers cannot open this page; switching to another browser fixes it
Once the page opens, click the last link (at the cursor) to install the certificate.
Some phones let you tap Install directly.
Other phones require: Settings >> Wi-Fi (or WLAN) >> Advanced settings >> Install certificates >>
select the certificate file you just downloaded, FiddlerRoot.cer >> OK
Or: Settings >> More settings >> System security >> Install from storage
Name the certificate: enter any name you like, e.g. fiddler, tap OK, and the phone reports that the certificate was installed.
Once installed, go to Settings >> More settings >> System security >> Trusted credentials >>
two tabs, System and User >> under User you can see DO_NOT_TRUST_FiddlerRoot
PS: Without the certificate installed you can still capture HTTP data, but you cannot capture HTTPS data.
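Before configuring the phone, it can help to confirm the proxy is reachable from another machine. Below is a minimal sketch, not from the original post, assuming a hypothetical LAN IP of 192.168.1.100 and the port 8888 configured above; substitute your own values.

import requests

# Hypothetical values: replace 192.168.1.100 with the IP you looked up earlier.
proxies = {
    'http': 'http://192.168.1.100:8888',
    'https': 'http://192.168.1.100:8888',
}

# verify=False because Fiddler re-signs HTTPS traffic with its own root
# certificate; once FiddlerRoot.cer is trusted you could pass its path instead.
resp = requests.get('https://www.example.com', proxies=proxies, timeout=5, verify=False)
print(resp.status_code)  # 200 means the request flowed through Fiddler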
Note:
1. Most apps can be captured directly.
2. A few apps cannot be captured this way; you need Wireshark, decompiling, unpacking, and similar techniques to find the encryption algorithm.
3. What you capture from an app is usually the JSON packets the server returns.
Open the Douguo (豆果美食) app on the phone with Fiddler running, browse the pages whose data you want to scrape, then analyze the captured network requests in Fiddler.
Since mobile responses are usually JSON, just keep an eye on the format of each request.
The request we need turns up quickly; next we use Scrapy to simulate it and parse the data.
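Before wiring this into Scrapy, a quick sanity check is to replay the captured request outside the framework. This sketch is my addition, not from the post; it sends the same form fields the spider below uses and prints one recipe name to confirm the JSON structure.

import requests

# Form fields copied from the Fiddler capture (same as in the spider below).
data = {
    'client': '4',
    '_session': '1542354711458863254010224946',
    'keyword': '家常菜',
    'order': '0',
    '_vs': '400',
}

resp = requests.post('http://api.douguo.net/recipe/v2/search/0/20', data=data)
result = resp.json()
# Drill into result -> list -> r -> n, the same path the spider's parse() uses.
print(result.get('result', {}).get('list', [{}])[0].get('r', {}).get('n'))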
import scrapy


class DouguoItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    auth = scrapy.Field()
    cook_name = scrapy.Field()
    cook_time = scrapy.Field()
    cook_difficulty = scrapy.Field()
    cook_story = scrapy.Field()
    image_url = scrapy.Field()  # renamed from img: the spider assigns item['image_url']
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# -*- coding: utf-8 -*-
import scrapy
import json

from ..items import DouguoItem


class DouguoJiachangSpider(scrapy.Spider):
    name = 'douguo_jiachang'
    # allowed_domains = ['baidu.com']
    # start_urls = ['http://api.douguo.net/recipe/v2/search/0/20']
    page = 0

    def start_requests(self):
        base_url = 'http://api.douguo.net/recipe/v2/search/{}/20'
        url = base_url.format(self.page)
        data = {
            'client': '4',
            '_session': '1542354711458863254010224946',
            'keyword': '家常菜',
            'order': '0',
            '_vs': '400'
        }
        self.page += 20
        yield scrapy.FormRequest(url=url, formdata=data, callback=self.parse)

    def parse(self, response):
        date = json.loads(response.body.decode())  # parse the JSON response into a dict
        t = date.get('result').get('list')
        for i in t:
            douguo_item = DouguoItem()
            douguo_item['auth'] = i.get('r').get('an')
            douguo_item['cook_name'] = i.get('r').get('n')
            douguo_item['cook_time'] = i.get('r').get('cook_time')
            douguo_item['cook_difficulty'] = i.get('r').get('cook_difficulty')
            douguo_item['cook_story'] = i.get('r').get('cookstory')
            douguo_item['image_url'] = i.get('r').get('p')
            yield douguo_item
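The spider is run from the project root with Scrapy's CLI; the -o flag is optional and just dumps the yielded items to a file:

scrapy crawl douguo_jiachang -o douguo.json

Note that start_requests() only issues the first page here; self.page += 20 has an effect only if you re-issue the request, e.g. by yielding another FormRequest from parse().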
Result:
Building on the previous code, we now add more functionality.
# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'douguo.pipelines.DouguoPipeline': 229,
    'douguo.pipelines.ImagePipline': 300,
}

.............
.............
.............

IMAGES_STORE = './images/'
DOWNLOAD_DELAY = 1
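The integers in ITEM_PIPELINES are priorities: items flow from lower to higher values, so DouguoPipeline (229) sees each item before ImagePipline (300) has filled in image_path.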
import scrapy


class DouguoItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    auth = scrapy.Field()
    cook_name = scrapy.Field()
    cook_time = scrapy.Field()
    cook_difficulty = scrapy.Field()
    cook_story = scrapy.Field()
    image_url = scrapy.Field()
    image_path = scrapy.Field()
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
import os

import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem

from .settings import IMAGES_STORE


class DouguoPipeline(object):
    def process_item(self, item, spider):
        print(item)
        return item


class ImagePipline(ImagesPipeline):
    def get_media_requests(self, item, info):
        '''Generate a download Request for the image URL'''
        yield scrapy.Request(url=item['image_url'])

    def item_completed(self, results, item, info):
        '''Called once the image download has finished'''
        format = '.' + item['image_url'].split('.')[-1]  # image file extension
        image_path = [x['path'] for ok, x in results if ok]  # relative path of the downloaded file
        if not image_path:  # download failed, so there is nothing to rename
            raise DropItem('Image Download Failed')
        old_path = IMAGES_STORE + image_path[0]  # path chosen by Scrapy
        new_path = IMAGES_STORE + item['cook_name'] + format  # new path: directory + dish name + extension
        item['image_path'] = new_path  # hand the new path to the item
        try:
            os.rename(old_path, new_path)  # move the downloaded file
        except OSError:
            raise DropItem('Image Download Failed')
        return item
Result:
ImagePipeline:
Scrapy's ImagesPipeline class provides a convenient way to download and store images; it requires the PIL (Pillow) library.
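For reference, the built-in pipeline can also be used without subclassing. Below is a minimal sketch under the assumption that items use the default field names the stock ImagesPipeline expects (the custom ImagePipline above exists only to rename files after the dish); RecipeImageItem is a hypothetical name.

# settings.py (sketch): enable the built-in pipeline directly.
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 300}
IMAGES_STORE = './images/'

# items.py (sketch): default field names the built-in pipeline looks for.
import scrapy

class RecipeImageItem(scrapy.Item):
    image_urls = scrapy.Field()  # list of image URLs to download
    images = scrapy.Field()      # filled with download results (path, checksum, url)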