It has been almost half a year since my last blog post, which I'm rather embarrassed about... Without further ado, today's task is to scrape food-delivery data, and the platform of choice is ele.me.
Step 1: open the ele.me site and set the delivery location to Zhongnanhai.
Step 2: click into any shop at random.
What we want to scrape is, for each food item: its name, monthly sales, rating, and number of reviews.
Step 3: looking at the page source, the elements we need are nowhere to be found. Clearly this is a dynamically rendered page, so we can inspect the request flow by capturing the network traffic: press F12 to open the developer tools, then F5 to reload.
Sure enough, the data we need shows up there. With the entry point found, straight to the code:
```python
# -*- coding: utf-8 -*-
# @Time    : 2017/12/10 13:43
# @Author  : Ricky
# @FileName: elm.py
# @Software: New_start
# @Blog    : http://www.cnblogs.com/Beyond-Ricky/

import requests
import json

# The menu endpoint found via packet capture; the response body is pure JSON.
restaurant_url = 'https://www.ele.me/restapi/shopping/v2/menu?restaurant_id=147207648'
web_data = requests.get(restaurant_url)
json_obj = json.loads(web_data.text)

# The payload is a list of menu categories, each holding a 'foods' list.
for item in json_obj:
    for food in item.get('foods'):
        print(food.get('name'))    # food name
        print(food.get('tips'))    # monthly sales / review summary text
        print(food.get('rating'))  # rating
```
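The nested loop relies on the menu payload being a list of category objects, each carrying a `foods` list. A minimal offline sketch of the same traversal against a made-up sample payload (the dish names and field values below are invented for illustration, not real ele.me data):

```python
import json

# Hypothetical sample mimicking the payload shape the scraper expects:
# a list of categories, each with a 'foods' list.
sample = json.loads('''
[
  {"name": "热销", "foods": [
      {"name": "宫保鸡丁", "tips": "月售100份 好评率98%", "rating": 4.8},
      {"name": "鱼香肉丝", "tips": "月售80份 好评率96%", "rating": 4.6}
  ]}
]
''')

rows = []
for category in sample:
    for food in category.get('foods', []):
        rows.append((food.get('name'), food.get('tips'), food.get('rating')))

print(len(rows))  # → 2
```

Using `.get('foods', [])` keeps the traversal safe when a category object has no `foods` key, which is the kind of malformed entry the try/except in the full script below guards against.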
Step 4: our goal is to scrape the delivery data for every shop near Zhongnanhai, and crawling them one by one would clearly waste time. Going back to the listing page and opening a few more shops, we see that the URLs differ only in the trailing number, which on inspection turns out to be the shop id. So as long as we collect every shop id, we can fetch every shop's menu. Grabbing the ids works much like the previous page, again via packet capture, so I won't go into detail. Here is the complete code:
```python
# -*- coding: utf-8 -*-
# @Time    : 2017/12/10 15:35
# @Author  : Ricky
# @FileName: final_version.py
# @Software: New_start
# @Blog    : http://www.cnblogs.com/Beyond-Ricky/

import requests
import json

id_list = []       # shop ids
name_list = []     # shop names
address_list = []  # shop addresses

def get_all_id():
    # The listing endpoint pages through shops 24 at a time via the offset parameter.
    for offset in range(0, 985, 24):
        url = ('https://www.ele.me/restapi/shopping/restaurants?extras%5B%5D=activities'
               '&geohash=wx4g06hu38n&latitude=39.91406&limit=24&longitude=116.38477'
               '&offset={}&terminal=web'.format(offset))
        web_data = requests.get(url)
        json_obj = json.loads(web_data.text)  # the response is already JSON
        for item in json_obj:
            address_list.append(item.get('address'))
            name_list.append(item.get('name'))
            id_list.append(item.get('id'))
    return name_list, address_list, id_list

get_all_id()

headers = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
m = 0  # shop counter
n = 0  # record counter
for shop_id in id_list:
    m = m + 1
    restaurant_url = ('https://mainsite-restapi.ele.me/shopping/v2/menu?restaurant_id='
                      + str(shop_id))
    print('************************* shop divider ****** shop no. {} *************************'.format(m))
    print(name_list[m - 1])     # lists are 0-indexed, so index with m - 1
    print(address_list[m - 1])
    web_data = requests.get(restaurant_url, headers=headers)
    # time.sleep(3)  # uncomment (and `import time`) to throttle requests
    json_obj = json.loads(web_data.text)
    try:
        for item in json_obj:
            for food in item.get('foods'):
                n += 1
                print('record no. %d:' % n)
                print(food.get('name'), food.get('tips'), 'rating', food.get('rating'))
    except AttributeError:
        pass  # some shops return an error object instead of a menu list
    except IndexError:
        pass
```
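Printing everything to the console makes the results hard to reuse afterwards; one option is to dump the rows to a CSV file with the standard library instead. A minimal sketch, assuming rows shaped like what the loop above prints (the file name, shop name, and dishes here are my own invented examples, not from the crawl):

```python
import csv

# Hypothetical rows in the shape (shop name, food name, tips, rating).
rows = [
    ('张记小厨', '宫保鸡丁', '月售100份', 4.8),
    ('张记小厨', '鱼香肉丝', '月售80份', 4.6),
]

# newline='' is required for the csv module; utf-8-sig keeps Excel happy
# with Chinese text.
with open('elm_foods.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['shop', 'food', 'tips', 'rating'])  # header row
    writer.writerows(rows)

with open('elm_foods.csv', encoding='utf-8-sig') as f:
    print(sum(1 for _ in f))  # → 3 (header + 2 data rows)
```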
And with that, our task is complete! Corrections are welcome wherever I've written poorly. More articles in this crawler series are on the way; thanks, everyone!