As the core part of one feature of the subproject https://github.com/fanqingsong/web_full_stack_application, Scrapy is used to crawl the data; the parsed data is then pushed to a webservice API with the Python requests library, and the webservice API is responsible for saving the data into a MongoDB database.
Implementation steps:
1. Use the requests library to talk to the webservice API.
2. Use Scrapy to crawl the data.
3. Combine steps 1 and 2 into the complete feature.
For installation of the requests library and a quick start, see:
http://docs.python-requests.org/en/master/user/quickstart/#response-content
Here is sample code that has been tested:
insert_to_db.py

import requests


class ApiError(Exception):
    # Raised when the webservice returns an unexpected status code.
    pass


# ------------- GET --------------
resp = requests.get('http://localhost:3000/api/v1/summary')
if resp.status_code != 200:
    # This means something went wrong.
    raise ApiError('GET /tasks/ {}'.format(resp.status_code))

for todo_item in resp.json():
    print('{} {}'.format(todo_item['Technology'], todo_item['Count']))

# ------------- POST --------------
Technology = {"Technology": "Django", "Count": "50"}
resp = requests.post('http://localhost:3000/api/v1/summary', json=Technology)
if resp.status_code != 201:
    raise ApiError('POST /Technologys/ {}'.format(resp.status_code))

print("-------------------")
print(resp.text)
print('Created Technology. ID: {}'.format(resp.json()["_id"]))
https://realpython.com/python-virtual-environments-a-primer/
Create a new virtual environment inside the directory:
# Python 2
$ virtualenv env

# Python 3
$ python3 -m venv env

Note: By default, this will not include any of your existing site packages.
Activate it on Windows:
env\Scripts\activate
Scrapy is an open source and collaborative framework for extracting the data you need from websites, in a fast, simple, yet extensible way.
Architecture overview: https://scrapy-chs.readthedocs.io/zh_CN/0.24/topics/architecture.html
For installation and usage, refer to:
http://www.javashuo.com/article/p-ckbqmwtf-ep.html
Fixes for errors encountered while installing and running Scrapy:
1. Scrapy fails to run with "ImportError: No module named win32api"
https://blog.csdn.net/u013687632/article/details/57075514
pip install pypiwin32
2. Error installing Twisted while installing Scrapy; see:
http://www.javashuo.com/article/p-dkckhjpv-es.html
1. Download the .whl file for the matching Twisted version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted (mine was Twisted-17.5.0-cp36-cp36m-win_amd64.whl); "cp" is followed by the Python version, and "amd64" means 64-bit.
2. Run the command:
pip install C:\Users\CR\Downloads\Twisted-17.5.0-cp36-cp36m-win_amd64.whl
Here is sample code:
quotes_spider.py
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.xpath('span/small/text()').extract_first(),
            }

        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
In this directory, run:
scrapy runspider quotes_spider.py -o quotes.json
The output is:
[
{"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen"},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor"},
{"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson"},
{"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz"},
{"text": "\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d", "author": "Suzanne Collins"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"},
{"text": "\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d", "author": "Terry Pratchett"},
{"text": "\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d", "author": "Dr. Seuss"},
{"text": "\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d", "author": "George Carlin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"},
{"text": "\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d", "author": "Jane Austen"}
]
https://github.com/fanqingsong/web_data_visualization
The flow is:
the Scrapy item pipeline pushes the scraped data to the webservice API.
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

import requests


class ApiError(Exception):
    # Raised when the webservice returns an unexpected status code.
    pass


class ScratchZhipinPipeline(object):
    def process_item(self, item, spider):
        print("--------------------")
        print(item['text'])
        print(item['author'])
        print("--------------------")

        # save to db through web service
        resp = requests.post('http://localhost:3001/api/v1/quote', json=item)
        if resp.status_code != 201:
            raise ApiError('POST /item/ {}'.format(resp.status_code))

        print(resp.text)
        print('Created Technology. ID: {}'.format(resp.json()["_id"]))

        return item
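As the comment in the pipeline says, the pipeline also has to be enabled in the project's settings.py. A minimal sketch, assuming the Scrapy project module is named scratch_zhipin (a hypothetical name; substitute your actual project name):

settings.py

# Register the pipeline; the number (0-1000) sets the order in which
# enabled pipelines process each item.
ITEM_PIPELINES = {
    'scratch_zhipin.pipelines.ScratchZhipinPipeline': 300,
}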
Run the spider: scrapy crawl quotes
Run the webservice: npm run webservice_quotes
Run the websocket: npm run websocket_quotes
Run the Vue development environment: npm run dev
Result in Chrome and in the database: (screenshots)
http://www.cnblogs.com/zhaoyingjie/p/6645811.html
Quickly generate a requirements.txt file:
(CenterDesigner) xinghe@xinghe:~/PycharmProjects/CenterDesigner$ pip freeze > requirements.txt
Install the required packages:
pip install -r requirements.txt
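As a rough illustration only (the real pip freeze output pins exact versions and also lists transitive dependencies), a hand-trimmed requirements.txt for this project might look like:

requests
scrapy
pypiwin32    # Windows only, for the win32api error mentioned above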