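Requirement: an existing crawler (named CNSubAllInd) must be kept running in the background (whenever it finishes, restart it immediately and carry on), and its run log must be recorded.
Python's logging module records the log, and the subprocess module interacts with the system to run the command; once the child process is detected to have exited, it is started again.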
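The restart idea can be sketched in a few lines before reading the full script. This is only a minimal illustration, not the script used in this post: the function name `supervise`, the logger name and the `max_restarts` parameter are assumptions added so the sketch can terminate, and a short-lived Python child stands in for the scrapy command.

```python
import logging
import subprocess
import sys

log = logging.getLogger("keepalive_sketch")  # hypothetical logger name


def supervise(command, max_restarts=None):
    """Run command and start it again every time it exits.

    max_restarts is a hypothetical knob so the sketch can stop;
    None means "restart forever", which is what the requirement asks for.
    Returns the list of exit codes observed.
    """
    codes = []
    while True:
        child = subprocess.Popen(command)
        code = child.wait()  # block until the child exits
        codes.append(code)
        log.warning("child exited with return code %d", code)
        if max_restarts is not None and len(codes) > max_restarts:
            return codes


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in child that exits at once with code 3.
    print(supervise([sys.executable, "-c", "raise SystemExit(3)"], max_restarts=2))
```

With `max_restarts=2` the child runs three times in total: the first start plus two restarts.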
The code, keeprunning.py, is as follows (CNSubAllInd is the program that has to keep running in the background):
    # -*- coding: UTF-8 -*-
    #!DATE: 2018/10/9
    #!@Author: yingying
    # keeprunning.py
    from __future__ import print_function  # print() below works on Python 2 and 3

    import os
    import subprocess

    # logging
    # require python2.6.6 and later
    import logging
    from logging.handlers import RotatingFileHandler

    ## log settings: SHOULD BE CONFIGURED BY config
    # raw string, so the backslashes in the Windows path are kept literally
    LOG_PATH_FILE = r"D:\workspace\PyCharmProject\CompanyInfoSpider\my_service_mgr.log"
    LOG_MODE = 'a'
    LOG_MAX_SIZE = 10 * 1024 * 1024  # 10M per file
    LOG_MAX_FILES = 10  # 10 Files: my_service_mgr.log.1, my_service_mgr.log.2, ...
    LOG_LEVEL = logging.DEBUG

    LOG_FORMAT = "%(asctime)s %(levelname)-10s[%(filename)s:%(lineno)d(%(funcName)s)] %(message)s"

    handler = RotatingFileHandler(LOG_PATH_FILE, LOG_MODE, LOG_MAX_SIZE, LOG_MAX_FILES)
    formatter = logging.Formatter(LOG_FORMAT)
    handler.setFormatter(formatter)

    Logger = logging.getLogger()
    Logger.setLevel(LOG_LEVEL)
    Logger.addHandler(handler)

    # color output
    pid = os.getpid()


    def print_error(s):
        print('\033[31m[%d: ERROR] %s\033[0m' % (pid, s))


    def print_info(s):
        print('\033[32m[%d: INFO] %s\033[0m' % (pid, s))


    def print_warning(s):
        print('\033[33m[%d: WARNING] %s\033[0m' % (pid, s))


    def start_child_proc(command, merged):
        try:
            if command is None:
                raise OSError("Invalid command")

            child = None
            if merged is True:
                # merge stdout and stderr
                child = subprocess.Popen(command)
                # child = subprocess.Popen(command,
                #     stderr=subprocess.STDOUT,  # send the child's stderr to its stdout
                #     stdout=subprocess.PIPE     # create a new pipe
                # )
            else:
                # DO NOT merge stdout and stderr
                child = subprocess.Popen(command)
                # child = subprocess.Popen(command,
                #     stderr=subprocess.PIPE,
                #     stdout=subprocess.PIPE)
            return child
        except subprocess.CalledProcessError:
            pass  # handle errors in the called executable
        except OSError:
            raise OSError("Failed to run command!")


    def run_forever(command):
        print_info("start child process with command: " + ' '.join(command))
        Logger.info("start child process with command: " + ' '.join(command))

        merged = False
        child = start_child_proc(command, merged)

        failover = 0

        while True:
            # poll() returns None while the child is still running
            while child.poll() is not None:
                failover = failover + 1
                print_warning("child process shutdown with return code: " + str(child.returncode))
                Logger.critical("child process shutdown with return code: " + str(child.returncode))

                print_warning("restart child process again, times=%d" % failover)
                Logger.info("restart child process again, times=%d" % failover)
                child = start_child_proc(command, merged)

            # wait for the child to exit; out and err are None unless
            # stdout/stderr=subprocess.PIPE was passed to Popen
            out, err = child.communicate()
            returncode = child.returncode
            if returncode != 0:
                Logger.info("execute child process failed")
                for errorline in (err or "").splitlines():
                    Logger.info(errorline)

        Logger.error("!!!should never run to this!!!")


    if __name__ == "__main__":
        run_forever(['scrapy', 'crawl', 'CNSubAllInd'])
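Thanks to cheungmine for the article "Writing a Python service monitor with a subprocess script", which this code is based on.
Running it on Windows: enter start pythonw keeprunning.py on the command line, after which a pythonw window opens.
Note: this window cannot be closed for good, because keeprunning is running in the background; as soon as it detects that the crawler has ended, it opens a new window (that is, it starts the program again). To shut it down, end the pythonw.exe process in Task Manager. That stops the monitoring, and the crawler ends once its current run finishes.
However, the original author's way of fetching the output through read (shown below) deadlocked when I used it: every time, execution got stuck at read and went no further.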
    while True:
        while child.poll() != None:
            failover = failover + 1
            print_warning("child process shutdown with return code: " + str(child.returncode))
            Logger.critical("child process shutdown with return code: " + str(child.returncode))

            print_warning("restart child process again, times=%d" % failover)
            Logger.info("restart child process again, times=%d" % failover)
            child = start_child_proc(command, merged)

        # deadlock!!!
        ch = child.stdout.read(1)
        if ch != '' and ch != '\n':
            line += ch
        if ch == '\n':
            print_info(line)
            line = ''
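After consulting related material (the question "Python Popen().stdout.read() hang") and the official documentation, I found that this is where the problem lies. As explained there, the call to .stdout.read() hangs because the pipe is empty once the last line has been read: a blocking read on an empty pipe waits until more data arrives or the child closes its end of the pipe.
To avoid this, use communicate() instead of .stdout.read(); see the official documentation for how to use communicate.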
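As a rough illustration of that advice, the sketch below runs a command once with a pipe attached and collects its output through communicate(). The helper name `run_and_capture` is hypothetical, and the short Python child is only a stand-in for the scrapy command; keeprunning.py above starts the child without pipes, so treat this as one possible variant rather than the script's actual behaviour.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run command once; return (returncode, text) with stderr merged into stdout."""
    child = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,    # create a pipe for the child's stdout
        stderr=subprocess.STDOUT,  # send the child's stderr into the same pipe
    )
    # communicate() reads until end-of-file and then waits for the child to exit
    out, _ = child.communicate()
    return child.returncode, out.decode("utf-8", "replace")


if __name__ == "__main__":
    # Stand-in child that prints one line and exits with code 2.
    code, text = run_and_capture(
        [sys.executable, "-c", "print('hello'); raise SystemExit(2)"]
    )
    for line in text.splitlines():
        print("child said:", line)
    print("return code:", code)
```

One caveat: communicate() keeps everything the child wrote in memory and only returns after the child has exited, so for a crawler that runs for a long time the output shows up in the log at the end of each run, not while it is running.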