Prerequisite: https://www.cnblogs.com/luocodes/p/11827850.html
This post tackles the last remaining problem: how to truly package a Scrapy project into a single executable file.
It cost me a whole night, but inspiration finally struck today.
Error analysis
If the scrapy.cfg file is not placed alongside the executable, the run fails with an error --- the spider cannot be found.
Why the error happens:
1. The scrapy.cfg file cannot be packed inside the executable itself (see the build sketch after this list)
2. Scrapy cannot locate the scrapy.cfg file from the directory it runs in
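For context, while a data file cannot live inside the exe, PyInstaller can carry it alongside the bundled code so it gets unpacked into the temp directory at run time. A minimal build sketch, assuming the entry script is the start.py shown below (equivalent to the CLI call in the comment; the ';' separator in --add-data is for Windows, use ':' on Linux/macOS):

import PyInstaller.__main__

# Build a one-file exe and ship scrapy.cfg as an external data file;
# at run time it is unpacked into the temp extraction directory.
# Equivalent CLI: pyinstaller -F start.py --add-data "scrapy.cfg;."
PyInstaller.__main__.run([
    'start.py',
    '--onefile',
    '--add-data', 'scrapy.cfg;.',  # 'src;dest' on Windows, 'src:dest' elsewhere
])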
Problem 1
PyInstaller unpacks the executable into a temp directory on the system and runs it from there.
So we only need to find that directory from inside the executable to see exactly what our package actually contains.
There is one more catch: the temp files are deleted as soon as the executable finishes, so we have to add a delay to start.py.
With the program kept from exiting, the temp files survive long enough for us to inspect them.
# -*- coding: utf-8 -*-
from scrapy.cmdline import execute
from scrapy.utils.python import garbage_collect
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
# import robotparser
import os
import sys
import time

# Explicit imports so PyInstaller bundles scrapy's dynamically loaded
# components (extensions, middlewares, pipelines, download handlers).
import scrapy.spiderloader
import scrapy.statscollectors
import scrapy.logformatter
import scrapy.dupefilters
import scrapy.squeues
import scrapy.extensions.spiderstate
import scrapy.extensions.corestats
import scrapy.extensions.telnet
import scrapy.extensions.logstats
import scrapy.extensions.memusage
import scrapy.extensions.memdebug
import scrapy.extensions.feedexport
import scrapy.extensions.closespider
import scrapy.extensions.debug
import scrapy.extensions.httpcache
import scrapy.extensions.statsmailer
import scrapy.extensions.throttle
import scrapy.core.scheduler
import scrapy.core.engine
import scrapy.core.scraper
import scrapy.core.spidermw
import scrapy.core.downloader
import scrapy.downloadermiddlewares.stats
import scrapy.downloadermiddlewares.httpcache
import scrapy.downloadermiddlewares.cookies
import scrapy.downloadermiddlewares.useragent
import scrapy.downloadermiddlewares.httpproxy
import scrapy.downloadermiddlewares.ajaxcrawl
import scrapy.downloadermiddlewares.chunked
import scrapy.downloadermiddlewares.decompression
import scrapy.downloadermiddlewares.defaultheaders
import scrapy.downloadermiddlewares.downloadtimeout
import scrapy.downloadermiddlewares.httpauth
import scrapy.downloadermiddlewares.httpcompression
import scrapy.downloadermiddlewares.redirect
import scrapy.downloadermiddlewares.retry
import scrapy.downloadermiddlewares.robotstxt
import scrapy.spidermiddlewares.depth
import scrapy.spidermiddlewares.httperror
import scrapy.spidermiddlewares.offsite
import scrapy.spidermiddlewares.referer
import scrapy.spidermiddlewares.urllength
import scrapy.pipelines
import scrapy.core.downloader.handlers.http
import scrapy.core.downloader.contextfactory

# Print the candidate locations of the extraction directory
print(sys.path[0])
print(sys.argv[0])
print(os.path.dirname(os.path.realpath(sys.executable)))
print(os.path.dirname(os.path.realpath(sys.argv[0])))

cfg = os.path.join(os.path.split(sys.path[0])[0], "scrapy.cfg")
print(cfg)

# Delay so the temp directory is not cleaned up before we can inspect it
time.sleep(10)

process = CrawlerProcess(get_project_settings())
# 'biqubao_spider' is the name of one of the spiders of the project.
process.crawl('biqubao_spider', domain='biqubao.com')
process.start()  # the script will block here until the crawling is finished
After experimenting, only sys.path[0] among the paths printed above leads to the temp directory.
The cfg variable is the location of scrapy.cfg inside the temp directory, which I worked out from that.
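As an aside, PyInstaller's one-file bootloader also exposes the extraction directory directly as sys._MEIPASS, which avoids guessing from sys.path; a minimal sketch:

import os
import sys

def bundle_dir():
    # sys._MEIPASS is set by the PyInstaller one-file bootloader and points
    # at the temp extraction directory; fall back to the script's own
    # directory when running unpackaged.
    return getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))

print(os.path.join(bundle_dir(), 'scrapy.cfg'))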
The temp directory that gets generated looks like this:

The temp directory does contain the cfg. Testing further, I ran start.py from inside the temp directory, and there it runs normally.
So the problem comes down to scrapy failing to read the cfg file.
Problem 2
How do we get scrapy to read the cfg file?
By debugging, I found the function where scrapy builds the list of cfg file paths:
# scrapy\utils\conf.py
def get_sources(use_closest=True):
    xdg_config_home = os.environ.get('XDG_CONFIG_HOME') or \
        os.path.expanduser('~/.config')
    sources = ['/etc/scrapy.cfg', r'c:\scrapy\scrapy.cfg',
               xdg_config_home + '/scrapy.cfg',
               os.path.expanduser('~/.scrapy.cfg')]
    if use_closest:
        sources.append(closest_scrapy_cfg())
    return sources
The function's sources is a list of candidate cfg file paths.
Combined with the test from Problem 1, we can see what happened: running the single-file executable made scrapy look for the cfg around the executable's own path, not the cfg in the temp directory.
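For reference, the closest_scrapy_cfg() appended at the end lives in the same scrapy\utils\conf.py and walks upward from the current working directory, which is why none of the candidates in sources ever reaches the cfg unpacked into the temp directory (quoted from scrapy 1.x source):

def closest_scrapy_cfg(path='.', prevpath=None):
    """Return the path to the closest scrapy.cfg file by traversing the current
    directory and its parents
    """
    if path == prevpath:
        return ''
    path = os.path.abspath(path)
    cfgfile = os.path.join(path, 'scrapy.cfg')
    if os.path.exists(cfgfile):
        return cfgfile
    return closest_scrapy_cfg(os.path.dirname(path), path)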
So all we need to do is add the temp location produced by the single-file run to sources, and the cfg will be read correctly:
def get_sources(use_closest=True):
    xdg_config_home = os.environ.get('XDG_CONFIG_HOME') or \
        os.path.expanduser('~/.config')
    sources = ['/etc/scrapy.cfg', r'c:\scrapy\scrapy.cfg',
               xdg_config_home + '/scrapy.cfg',
               os.path.expanduser('~/.scrapy.cfg'),
               # added: the cfg unpacked into the PyInstaller temp directory
               os.path.join(os.path.split(sys.path[0])[0], "scrapy.cfg")]
    if use_closest:
        sources.append(closest_scrapy_cfg())
    return sources
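Patching an installed copy of scrapy works, but it is fragile. As an alternative sketch that skips the cfg lookup entirely: get_project_settings() honors the SCRAPY_SETTINGS_MODULE environment variable when it is already set. Here 'biqubao.settings' is an assumed project package name; substitute your own:

import os
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# When SCRAPY_SETTINGS_MODULE is already set, get_project_settings() does
# not need to locate scrapy.cfg at all.
# 'biqubao.settings' is an assumed module path -- adjust to your project.
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'biqubao.settings')

process = CrawlerProcess(get_project_settings())
process.crawl('biqubao_spider', domain='biqubao.com')
process.start()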
Repackage, and the single executable now runs without errors even when moved somewhere else!