pyspider + python2.7

  1. Upgrade pip
pip install --upgrade pip
  1. Install pyspider with pip
pip install pyspider
  1. Install PhantomJS: https://phantomjs.org/downloa...
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
yum -y install bzip2
tar -jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /opt/
mv /opt/phantomjs-2.1.1-linux-x86_64/ /opt/phantomjs

Create a symbolic link

ln -s /opt/phantomjs/bin/phantomjs /usr/bin/

Install dependencies

yum -y install fontconfig

Start PhantomJS to verify the installation

phantomjs
  1. pyspider fails on startup with an error
ValueError: Invalid configuration:
  - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.

In the installed packages, locate the pyspider package, open the webdav.py file in its webui directory, and change line 209.

Replace

'domaincontroller': NeedAuthController(app),

with

'http_authenticator':{
        'HTTPAuthenticator':NeedAuthController(app),
    },
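To find where webdav.py lives on a given machine, the package directory can be resolved from Python instead of searching by hand. The following is a minimal sketch; the helper name `package_file` is hypothetical, and it assumes pyspider is importable from the interpreter that runs it.

```python
import importlib
import os


def package_file(package_name, *relative_parts):
    """Return the path of a file inside an installed package.

    The package is imported to discover its directory, so it must be
    installed for the interpreter running this code.
    """
    module = importlib.import_module(package_name)
    package_dir = os.path.dirname(os.path.abspath(module.__file__))
    return os.path.join(package_dir, *relative_parts)


# Example (requires pyspider to be installed):
#     print(package_file("pyspider", "webui", "webdav.py"))
```

The printed path is the file to open before editing the line shown above.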
  1. Use a MySQL database
    Start pyspider
pyspider

By default, pyspider creates a data directory in the directory it is started from and stores its data there in SQLite databases.

[root@iZbp1gg50ddqbgxf1jpqwwZ opt]# cd data/
[root@iZbp1gg50ddqbgxf1jpqwwZ data]# ll
total 16
-rw-r--r-- 1 root root 3072 Jan 21 17:39 project.db
-rw-r--r-- 1 root root    0 Jan 21 17:39 result.db
-rw-r--r-- 1 root root    6 Jan 21 17:39 scheduler.1d
-rw-r--r-- 1 root root    6 Jan 21 17:39 scheduler.1h
-rw-r--r-- 1 root root    6 Jan 21 17:39 scheduler.all
-rw-r--r-- 1 root root    0 Jan 21 17:39 task.db
  1. Create the MySQL databases
pyspider_taskdb
pyspider_projectdb
pyspider_resultdb
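The three databases can be created with any MySQL client. As a rough sketch, the snippet below only generates the SQL statements so they can be reviewed or piped into a client; the `utf8` character set and the helper name `create_database_sql` are assumptions, not something the original setup prescribes.

```python
DATABASES = ["pyspider_taskdb", "pyspider_projectdb", "pyspider_resultdb"]


def create_database_sql(names, charset="utf8"):
    """Build one CREATE DATABASE statement per database name."""
    template = "CREATE DATABASE IF NOT EXISTS `{0}` DEFAULT CHARACTER SET {1};"
    return [template.format(name, charset) for name in names]


# Print the statements; nothing is executed against a server here.
for statement in create_database_sql(DATABASES):
    print(statement)
```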
  1. Configuration file
touch /usr/lib/python2.7/site-packages/pyspider/config.json
{
    "taskdb": "mysql+taskdb://root:123456@121.40.112.188:3306/pyspider_taskdb",
    "projectdb": "mysql+projectdb://root:123456@121.40.112.188:3306/pyspider_projectdb",
    "resultdb":"mysql+resultdb://root:123456@121.40.112.188:3306/pyspider_resultdb",
    "message_queue": "redis://root:123456@127.0.0.1:6379/db",
    "webui": {
        "port":5000,
        "username": "evans",
        "password": "123456",
        "need-auth": true
    }
}
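Connection URLs of the form `scheme://user:password@host:port/database` are easy to mistype, for example by misplacing the `@`. A small sketch like the one below splits each URL into its parts with the standard library's URL parser so that a wrong host or a missing password stands out. The function names and the sample values (`user`, `secret`, `db.example.com`) are hypothetical placeholders, and the sketch makes no claim about how pyspider itself parses these strings.

```python
import json

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def describe_url(url):
    """Split a connection URL into the parts that are easy to get wrong."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        "username": parsed.username,
        "password": parsed.password,
        "host": parsed.hostname,
        "port": parsed.port,
        "database": parsed.path.lstrip("/"),
    }


def check_config(text, keys=("taskdb", "projectdb", "resultdb", "message_queue")):
    """Parse config JSON text and describe every connection URL found in it."""
    config = json.loads(text)
    return dict((key, describe_url(config[key])) for key in keys if key in config)


# Placeholder configuration, not real credentials.
SAMPLE = '{"taskdb": "mysql+taskdb://user:secret@db.example.com:3306/pyspider_taskdb"}'
print(check_config(SAMPLE))
```

If the `@` is misplaced, the parsed `password` comes back as `None` and the `host` contains stray characters, which makes the typo visible before pyspider is started.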
  1. Install the required packages
pip install mysql-connector
pip install redis  # only needed if the configuration also uses Redis
  1. Start pyspider with the configuration file
pyspider -c config.json all
  1. Startup script
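#!/bin/sh
# Run from the directory that contains this script and config.json.
cd "$(dirname "$0")" || exit 1
# Start pyspider only if no pyspider process is already running.
if [ "$(ps -ef | grep 'pyspider' | grep -v 'grep' | wc -l)" -lt 1 ]; then
    nohup pyspider -c config.json all &
    echo "pyspider started"
fi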
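The script above only looks at the process list. A complementary check is whether the web UI port (5000 in the configuration above) actually accepts connections. The following is a minimal sketch using only the standard library; the function name `port_is_open` and the one-second timeout are assumptions.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


# Example: port_is_open("127.0.0.1", 5000) once pyspider has been started.
```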