It has been almost half a year since my last blog post, which I'm rather embarrassed about... Without further ado, today's task is to scrape food-delivery data, and the platform of choice is ele.me.
Step 1: open the ele.me site and set the delivery location to Zhongnanhai.
Step 2: click into any shop at random.
What we want to scrape is, for each food item: its name, monthly sales, rating, and number of reviews.
Step 3: looking at the page source, the elements we need are nowhere to be found. Clearly this is a dynamically rendered page, so we can inspect the request flow by capturing the network traffic: press F12 to open the developer tools, then F5 to reload.
Sure enough, the data we need shows up there. With the entry point found, straight to the code:
```python
# -*- coding: utf-8 -*-
# @Time    : 2017/12/10 13:43
# @Author  : Ricky
# @FileName: elm.py
# @Software: New_start
# @Blog    : http://www.cnblogs.com/Beyond-Ricky/

import requests
import json

# The menu endpoint found via packet capture; the response body is pure JSON.
restaurant_url = 'https://www.ele.me/restapi/shopping/v2/menu?restaurant_id=147207648'
web_data = requests.get(restaurant_url)
json_obj = json.loads(web_data.text)

# The payload is a list of menu categories, each holding a 'foods' list.
for item in json_obj:
    for food in item.get('foods'):
        print(food.get('name'))    # food name
        print(food.get('tips'))    # monthly sales / review summary text
        print(food.get('rating'))  # rating
```
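The nested loop relies on the menu payload being a list of category objects, each carrying a `foods` list. A minimal offline sketch of the same traversal against a made-up sample payload (the dish names and field values below are invented for illustration, not real ele.me data):

```python
import json

# Hypothetical sample mimicking the payload shape the scraper expects:
# a list of categories, each with a 'foods' list.
sample = json.loads('''
[
  {"name": "热销", "foods": [
      {"name": "宫保鸡丁", "tips": "月售100份 好评率98%", "rating": 4.8},
      {"name": "鱼香肉丝", "tips": "月售80份 好评率96%", "rating": 4.6}
  ]}
]
''')

rows = []
for category in sample:
    for food in category.get('foods', []):
        rows.append((food.get('name'), food.get('tips'), food.get('rating')))

print(len(rows))  # → 2
```

Using `.get('foods', [])` keeps the traversal safe when a category object has no `foods` key, which is the kind of malformed entry the try/except in the full script below guards against.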
Step 4: our goal is to scrape the delivery data for every shop near Zhongnanhai, and crawling them one by one would clearly waste time. Going back to the listing page and opening a few more shops, we see that the URLs differ only in the trailing number, which on inspection turns out to be the shop id. So as long as we collect every shop id, we can fetch every shop's menu. Grabbing the ids works much like the previous page, again via packet capture, so I won't go into detail. Here is the complete code:
```python
# -*- coding: utf-8 -*-
# @Time    : 2017/12/10 15:35
# @Author  : Ricky
# @FileName: final_version.py
# @Software: New_start
# @Blog    : http://www.cnblogs.com/Beyond-Ricky/

import requests
import json

id_list = []       # shop ids
name_list = []     # shop names
address_list = []  # shop addresses

def get_all_id():
    # The listing endpoint pages through shops 24 at a time via the offset parameter.
    for offset in range(0, 985, 24):
        url = ('https://www.ele.me/restapi/shopping/restaurants?extras%5B%5D=activities'
               '&geohash=wx4g06hu38n&latitude=39.91406&limit=24&longitude=116.38477'
               '&offset={}&terminal=web'.format(offset))
        web_data = requests.get(url)
        json_obj = json.loads(web_data.text)  # the response is already JSON
        for item in json_obj:
            address_list.append(item.get('address'))
            name_list.append(item.get('name'))
            id_list.append(item.get('id'))
    return name_list, address_list, id_list

get_all_id()

headers = {'User-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
m = 0  # shop counter
n = 0  # record counter
for shop_id in id_list:
    m = m + 1
    restaurant_url = ('https://mainsite-restapi.ele.me/shopping/v2/menu?restaurant_id='
                      + str(shop_id))
    print('************************* shop divider ****** shop no. {} *************************'.format(m))
    print(name_list[m - 1])     # lists are 0-indexed, so index with m - 1
    print(address_list[m - 1])
    web_data = requests.get(restaurant_url, headers=headers)
    # time.sleep(3)  # uncomment (and `import time`) to throttle requests
    json_obj = json.loads(web_data.text)
    try:
        for item in json_obj:
            for food in item.get('foods'):
                n += 1
                print('record no. %d:' % n)
                print(food.get('name'), food.get('tips'), 'rating', food.get('rating'))
    except AttributeError:
        pass  # some shops return an error object instead of a menu list
    except IndexError:
        pass
```
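Printing everything to the console makes the results hard to reuse afterwards; one option is to dump the rows to a CSV file with the standard library instead. A minimal sketch, assuming rows shaped like what the loop above prints (the file name, shop name, and dishes here are my own invented examples, not from the crawl):

```python
import csv

# Hypothetical rows in the shape (shop name, food name, tips, rating).
rows = [
    ('张记小厨', '宫保鸡丁', '月售100份', 4.8),
    ('张记小厨', '鱼香肉丝', '月售80份', 4.6),
]

# newline='' is required for the csv module; utf-8-sig keeps Excel happy
# with Chinese text.
with open('elm_foods.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['shop', 'food', 'tips', 'rating'])  # header row
    writer.writerows(rows)

with open('elm_foods.csv', encoding='utf-8-sig') as f:
    print(sum(1 for _ in f))  # → 3 (header + 2 data rows)
```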
And with that, our task is complete! Corrections are welcome wherever I've written poorly. More articles in this crawler series are on the way; thanks, everyone!