While learning web scraping I searched through plenty of tutorials online and initially decided to develop on Linux, which meant running a virtual machine. But the VM was laggy and ate system resources, so I decided to try installing Scrapy directly on Windows 7. The install process turned out to be one pitfall after another, exhausting. After a lot of digging I finally found a genuine lifesaver for installing Scrapy: Anaconda. Download it here: https://www.continuum.io/downloads
System: Windows 7, 64-bit
I chose the 2.7 build, since 2.7 is the more mature branch. Click download, then install straight through. One screen asks whether to overwrite the locally installed Python; choose yes. It is best to use the Python bundled with the installer, otherwise unpredictable errors can appear. Alternatively, uninstall any locally installed Python first and delete its directory by hand. That is what I did: uninstall the local version, delete its directory, then click next all the way through, which is the least hassle. By default the installer sets up the latest bundled Python.
After the install finishes, check the Python version. Open cmd as administrator:
Run: python
The output shows the latest version, so everything is in order.
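The same check can be done from inside the interpreter. A quick sketch that prints the version and the install path, so you can confirm the shell is picking up the Anaconda Python rather than a leftover system install:

```python
import sys

# Print the interpreter version and its location on disk. After the
# Anaconda install above, the path should point inside the Anaconda2
# directory rather than any previously installed Python.
print(sys.version.split()[0])
print(sys.executable)
```

If `sys.executable` still points at an old install directory, the PATH entry for the previous Python is taking precedence.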
Run: conda install scrapy
C:\Windows\System32>conda install scrapy
Fetching package metadata .........
Solving package specifications: ..........

Package plan for installation in environment C:\Program Files\Anaconda2:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    twisted-16.6.0             |           py27_0         4.4 MB
    service_identity-16.0.0    |           py27_0          13 KB
    scrapy-1.1.1               |           py27_0         378 KB
    ------------------------------------------------------------
                                           Total:         4.8 MB

The following NEW packages will be INSTALLED:

    attrs:            15.2.0-py27_0
    conda-env:        2.6.0-0
    constantly:       15.1.0-py27_0
    cssselect:        1.0.0-py27_0
    incremental:      16.10.1-py27_0
    parsel:           1.0.3-py27_0
    pyasn1-modules:   0.0.8-py27_0
    pydispatcher:     2.0.5-py27_0
    queuelib:         1.4.2-py27_0
    scrapy:           1.1.1-py27_0
    service_identity: 16.0.0-py27_0
    twisted:          16.6.0-py27_0
    w3lib:            1.16.0-py27_0
    zope:             1.0-py27_0
    zope.interface:   4.3.2-py27_0

The following packages will be UPDATED:

    conda:            4.2.9-py27_0 --> 4.2.13-py27_0

Proceed ([y]/n)? y

Fetching packages ...
An unexpected error has occurred.
| ETA:  0:11:48   4.17 kB/s

Please consider posting the following information to the
conda GitHub issue tracker at:

    https://github.com/conda/conda/issues

Current conda install:

               platform : win-64
          conda version : 4.2.9
       conda is private : False
      conda-env version : 4.2.9
    conda-build version : 2.0.2
         python version : 2.7.12.final.0
       requests version : 2.11.1
       root environment : C:\Program Files\Anaconda2  (writable)
    default environment : C:\Program Files\Anaconda2
       envs directories : C:\Program Files\Anaconda2\envs
          package cache : C:\Program Files\Anaconda2\pkgs
           channel URLs : https://repo.continuum.io/pkgs/free/win-64/
                          https://repo.continuum.io/pkgs/free/noarch/
                          https://repo.continuum.io/pkgs/pro/win-64/
                          https://repo.continuum.io/pkgs/pro/noarch/
                          https://repo.continuum.io/pkgs/msys2/win-64/
                          https://repo.continuum.io/pkgs/msys2/noarch/
            config file : None
           offline mode : False

`$ C:\Program Files\Anaconda2\Scripts\conda-script.py install scrapy`

Traceback (most recent call last):
  File "C:\Program Files\Anaconda2\lib\site-packages\conda\exceptions.py", line 473, in conda_exception_handler
    return_value = func(*args, **kwargs)
  File "C:\Program Files\Anaconda2\lib\site-packages\conda\cli\main.py", line 144, in _main
    exit_code = args.func(args, p)
  File "C:\Program Files\Anaconda2\lib\site-packages\conda\cli\main_install.py", line 80, in execute
    install(args, parser, 'install')
  File "C:\Program Files\Anaconda2\lib\site-packages\conda\cli\install.py", line 420, in install
    raise CondaRuntimeError('RuntimeError: %s' % e)
CondaRuntimeError: Runtime error: RuntimeError: Runtime error: Could not open
u'C:\\Program Files\\Anaconda2\\pkgs\\twisted-16.6.0-py27_0.tar.bz2.part' for writing
(HTTPSConnectionPool(host='repo.continuum.io', port=443): Read timed out.)
The install timed out while downloading the Twisted package, so install that library on its own first.
Run: conda install twisted
C:\Windows\System32>conda install twisted
Fetching package metadata .........
Solving package specifications: ..........

Package plan for installation in environment C:\Program Files\Anaconda2:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    twisted-16.6.0             |           py27_0         4.4 MB

The following NEW packages will be INSTALLED:

    conda-env:      2.6.0-0
    constantly:     15.1.0-py27_0
    incremental:    16.10.1-py27_0
    twisted:        16.6.0-py27_0
    zope:           1.0-py27_0
    zope.interface: 4.3.2-py27_0

The following packages will be UPDATED:

    conda:          4.2.9-py27_0 --> 4.2.13-py27_0

Proceed ([y]/n)? y

Fetching packages ...
twisted-16.6.0 100% |###############################| Time: 0:01:09  66.89 kB/s
Extracting packages ...
[      COMPLETE      ]|##################################################| 100%
Unlinking packages ...
[      COMPLETE      ]|##################################################| 100%
Linking packages ...
[      COMPLETE      ]|##################################################| 100%
It installs cleanly with no errors; now install Scrapy itself.
Run: conda install scrapy
C:\Windows\System32>conda install scrapy
Fetching package metadata .........
Solving package specifications: ..........

Package plan for installation in environment C:\Program Files\Anaconda2:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    service_identity-16.0.0    |           py27_0          13 KB
    scrapy-1.1.1               |           py27_0         378 KB
    ------------------------------------------------------------
                                           Total:         391 KB

The following NEW packages will be INSTALLED:

    attrs:            15.2.0-py27_0
    cssselect:        1.0.0-py27_0
    parsel:           1.0.3-py27_0
    pyasn1-modules:   0.0.8-py27_0
    pydispatcher:     2.0.5-py27_0
    queuelib:         1.4.2-py27_0
    scrapy:           1.1.1-py27_0
    service_identity: 16.0.0-py27_0
    w3lib:            1.16.0-py27_0

Proceed ([y]/n)? y

Fetching packages ...
service_identi 100% |###############################| Time: 0:00:00  68.39 kB/s
scrapy-1.1.1-p 100% |###############################| Time: 0:00:05  65.50 kB/s
Extracting packages ...
[      COMPLETE      ]|##################################################| 100%
Linking packages ...
[      COMPLETE      ]|##################################################| 100%
With Twisted already installed, this run no longer times out and finishes without any errors.
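If you would rather avoid the two-step workaround, later conda releases also let you raise the download timeout in the user's .condarc file. A sketch, assuming a conda version that supports these keys (the conda 4.2 shown above may not; check your version's `conda config` documentation):

```
# %USERPROFILE%\.condarc on Windows (~/.condarc elsewhere)
remote_read_timeout_secs: 180.0
remote_max_retries: 3
```

With a longer read timeout, slow downloads of large packages such as Twisted are less likely to abort mid-transfer.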
Now test whether the install succeeded with the following commands:
scrapy
scrapy startproject hello
C:\Windows\System32>scrapy
Scrapy 1.1.1 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  commands
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

C:\Windows\System32>d:

D:\>dir
 Volume in drive D has no label.
 Volume Serial Number is 0002-9E3C

 Directory of D:\

2016/12/03  12:20       399,546,128 Anaconda2-4.2.0-Windows-x86_64.exe
2016/12/03  09:43    <DIR>          Program Files (x86)
2016/12/03  16:57    <DIR>          python-project
2016/12/03  09:43    <DIR>          新建文件夾
2016/12/03  12:19    <DIR>          迅雷下載
               1 File(s)    399,546,128 bytes
               4 Dir(s)  38,932,201,472 bytes free

D:\>cd python-project

D:\python-project>scrapy startproject hello
New Scrapy project 'hello', using template directory 'C:\\Program Files\\Anaconda2\\lib\\site-packages\\scrapy\\templates\\project', created in:
    D:\python-project\hello

You can start your first spider with:
    cd hello
    scrapy genspider example example.com

D:\python-project>tree /f
Folder PATH listing
Volume serial number is 0002-9E3C
D:.
└─hello
    │  scrapy.cfg
    │
    └─hello
        │  items.py
        │  pipelines.py
        │  settings.py
        │  __init__.py
        │
        └─spiders
                __init__.py

D:\python-project>
As you can see, the scrapy command can now create a crawler project. The rest is happy coding.
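As one last sanity check, you can confirm from Python itself that the freshly installed package is importable. A minimal sketch (the helper name scrapy_status is just for illustration):

```python
def scrapy_status():
    """Return a one-line status string for the Scrapy install."""
    try:
        import scrapy  # succeeds only if the conda install above worked
        return "Scrapy %s is importable" % scrapy.__version__
    except ImportError:
        return "Scrapy is not importable; try: conda install scrapy"

print(scrapy_status())
```

Running this in the Anaconda Python should report the installed version (1.1.1 in the session above); any other interpreter on the machine will report that Scrapy is not importable, which is a quick way to spot a PATH mix-up.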