[docker] Setting up CentOS 6.8 + Python 2.7 + selenium + Firefox

Current Docker container configuration:

Centos6.8
python2.6.6

Target Docker container configuration:

Centos6.8
python2.7
selenium 3.141.0
geckodriver 0.15
firefox 52.8.0
Pillow 6.1.0
pytesseract 0.2.7

Install dependencies

yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel libffi-devel gcc gcc-c++ make wget git unzip libjpeg-devel libpng-devel giflib-devel

Create a directory for the downloaded packages

mkdir -p /usr/local/download
cd /usr/local/download

Install Python 2.7

# Install Python 2.7
wget https://www.python.org/ftp/python/2.7.15/Python-2.7.15.tgz
tar -zxvf Python-2.7.15.tgz 
cd Python-2.7.15
./configure
make && make install
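mv /usr/bin/python /usr/bin/python_bak
ln -s /usr/local/bin/python2.7 /usr/bin/python
# Note: yum on CentOS 6 is written for the system Python 2.6. After this swap, change the first line of /usr/bin/yum to #!/usr/bin/python2.6 so that the yum commands later in this guide keep working.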

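# Install pip
# If the script below refuses to run on Python 2.7, the 2.7-compatible copy is at https://bootstrap.pypa.io/pip/2.7/get-pip.py
wget --no-check-certificate https://bootstrap.pypa.io/get-pip.py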
python get-pip.py
ln -s /usr/local/bin/pip /usr/bin/pip


# Configure the pip index (Douban mirror)
cd
mkdir .pip
cd .pip
vi pip.conf
# Write the following content:
[global]
index-url=http://pypi.douban.com/simple
trusted-host = pypi.douban.com
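
When the container is built non-interactively, opening vi is awkward. The sketch below generates the same pip.conf from Python instead. The helper names render_pip_conf and write_pip_conf are made up for illustration; the index URL and trusted host are the ones from the step above.

```python
import os


def render_pip_conf(index_url, trusted_host):
    """Return the text of a pip.conf with a single [global] section."""
    lines = [
        "[global]",
        "index-url = {0}".format(index_url),
        "trusted-host = {0}".format(trusted_host),
    ]
    return "\n".join(lines) + "\n"


def write_pip_conf(text, home=None):
    """Write text to <home>/.pip/pip.conf and return the file path."""
    home = home or os.path.expanduser("~")
    conf_dir = os.path.join(home, ".pip")
    if not os.path.isdir(conf_dir):
        os.makedirs(conf_dir)
    path = os.path.join(conf_dir, "pip.conf")
    with open(path, "w") as handle:
        handle.write(text)
    return path


# Preview the file content without touching the home directory.
print(render_pip_conf("http://pypi.douban.com/simple", "pypi.douban.com"))
```

Calling write_pip_conf(...) with the rendered text and no home argument writes the file to ~/.pip/pip.conf, the same location the manual steps use.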

Install tesseract

# Install leptonica first
cd /usr/local/download
wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
tar xvzf leptonica-1.72.tar.gz
cd leptonica-1.72/
./configure
make && make install

# Install tesseract
cd /usr/local/download
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
unzip 3.04.zip 
cd tesseract-3.04/
./configure
make && make install
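# Manually refresh the dynamic linker cache
ldconfig
# Install pytesseract with pip
pip install pytesseract
# Install the language data
Download the model file for the language you need from https://github.com/tesseract-ocr/tessdata
Since only phone numbers and English need to be recognized for now, the single file eng.traineddata is enough.
Move the model file to /usr/local/share/tessdata
Recognition is then ready to use.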
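
A misplaced model file is an easy mistake at this step, so a quick check before running any recognition may save time. The following is a minimal sketch; missing_languages is a hypothetical helper, and it assumes the tessdata directory named above.

```python
import os


def missing_languages(tessdata_dir, languages):
    """Return the language codes whose <code>.traineddata file is absent."""
    missing = []
    for code in languages:
        model = os.path.join(tessdata_dir, code + ".traineddata")
        if not os.path.isfile(model):
            missing.append(code)
    return missing


# Path taken from the step above; "eng" is the only model this guide installs.
# An empty list means every requested model file was found.
print(missing_languages("/usr/local/share/tessdata", ["eng"]))
```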

# Example
import pytesseract
from PIL import Image

image = Image.open('bb.png')
code = pytesseract.image_to_string(image)
print(code)
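
Since the stated goal is recognizing phone numbers, the raw string returned by image_to_string usually needs some cleanup, because OCR output may contain stray spaces or hyphens between digits. The sketch below is one possible post-processing step, not part of pytesseract. It assumes mainland-China mobile numbers (11 digits starting with 1), and extract_phone_numbers is a hypothetical helper name.

```python
import re

# Assumption: an 11-digit number starting with 1, with an optional space or
# hyphen before each following digit, and no digit directly before or after.
PHONE_PATTERN = re.compile(r"(?<!\d)1(?:[ \-]?\d){10}(?!\d)")


def extract_phone_numbers(ocr_text):
    """Return the phone numbers found in ocr_text, with separators removed."""
    return [re.sub(r"[ \-]", "", match) for match in PHONE_PATTERN.findall(ocr_text)]


print(extract_phone_numbers("Tel: 138 0013 8000\n"))  # prints ['13800138000']
```

In the example above, this would be called as extract_phone_numbers(code).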

Install selenium + firefox + geckodriver

Install selenium

pip install selenium

# Check the installed version
pip show selenium

Install geckodriver

cd /usr/local/download
wget https://github.com/mozilla/geckodriver/releases/download/v0.15.0/geckodriver-v0.15.0-linux64.tar.gz
tar xvzf geckodriver-*.tar.gz
rm -f /usr/bin/geckodriver
# The symlink target must be given as an absolute path
ln -s /usr/local/download/geckodriver /usr/bin/geckodriver
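
The reason for the absolute path is that a relative symlink target is resolved against the directory that contains the link, not the directory the command was run from. The sketch below reproduces this in a temporary directory; the directory and file names are stand-ins for /usr/local/download and /usr/bin, and is_dangling is a hypothetical helper.

```python
import os
import shutil
import tempfile


def is_dangling(link_path):
    """True if link_path is a symlink whose target does not exist."""
    return os.path.islink(link_path) and not os.path.exists(link_path)


workdir = tempfile.mkdtemp()
download_dir = os.path.join(workdir, "download")  # stands in for /usr/local/download
bin_dir = os.path.join(workdir, "bin")            # stands in for /usr/bin
os.makedirs(download_dir)
os.makedirs(bin_dir)
target = os.path.join(download_dir, "geckodriver")
open(target, "w").close()

# Relative target: looked up inside bin_dir, where no such file exists.
relative_link = os.path.join(bin_dir, "geckodriver_rel")
os.symlink("geckodriver", relative_link)

# Absolute target: resolves regardless of where the link lives.
absolute_link = os.path.join(bin_dir, "geckodriver_abs")
os.symlink(target, absolute_link)

rel_dangling = is_dangling(relative_link)
abs_dangling = is_dangling(absolute_link)
print(rel_dangling)  # prints True
print(abs_dangling)  # prints False
shutil.rmtree(workdir)
```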

Install firefox

cd /usr/local/download
wget http://www.rpmfind.net/linux/centos/6.10/os/x86_64/Packages/firefox-52.8.0-1.el6.centos.x86_64.rpm
yum install -y firefox-52.8.0-1.el6.centos.x86_64.rpm

Install Chinese fonts

# Create a font directory named chinese:
mkdir /usr/share/fonts/chinese

# Upload the fonts from the Windows system drive (c:\windows\fonts\) directly into /usr/share/fonts/chinese on the CentOS machine
chmod -R 755 /usr/share/fonts/chinese
yum -y install ttmkfdir
ttmkfdir -e /usr/share/X11/fonts/encodings/encodings.dir

# Edit the font directory list in fonts.conf and add the location of the Chinese fonts installed above:
vi /etc/fonts/fonts.conf
<dir>/usr/share/fonts/chinese</dir>

# Refresh the font cache so that no reboot is needed:
fc-cache

# Finally, check the font list again with fc-list:
fc-list

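Install xvfb

Linux has a handy tool called Xvfb: an X server that runs on hardware with no display and no physical input devices, so a web browser can run on a headless machine.

a. Install the required packages
yum install -y xdg-utils xorg-x11-server-Xvfb xorg-x11-xkb-utils

b. Install the Python bindings for xvfb
pip install xvfbwrapper pyvirtualdisplay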

Test case:

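#!/usr/bin/python
# -*- coding:utf-8 -*-
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# Start a virtual X display so Firefox can run without a monitor
display = Display(visible=0, size=(800, 600))
display.start()
try:
    binary = FirefoxBinary('/usr/bin/firefox')
    driver = webdriver.Firefox(firefox_binary=binary)
    try:
        driver.get('https://www.baidu.com')
        # Python 2: encode the unicode title before printing
        print(driver.title.encode('utf8'))
    finally:
        # Always close the browser, even if the page load fails
        driver.quit()
finally:
    # Always shut down the virtual display
    display.stop()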

Install the required packages with pip

# Install packages
pip install requests
pip install Pillow
pip install httplib2
pip install excel
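
None of the pip commands in this guide pin a version, so the installed versions may differ from the target configuration listed at the top. The sketch below compares the output of pip freeze against those targets. The helper names are hypothetical and the sample text is made up for illustration; in practice the text would come from running pip freeze in the container.

```python
# Target versions copied from the configuration list at the top of this guide.
TARGET_VERSIONS = {
    "selenium": "3.141.0",
    "Pillow": "6.1.0",
    "pytesseract": "0.2.7",
}


def parse_freeze(freeze_text):
    """Map lowercased package name to version from `pip freeze` style text."""
    installed = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        installed[name.lower()] = version
    return installed


def find_mismatches(targets, installed):
    """Return {name: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in targets.items():
        found = installed.get(name.lower())
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Made-up sample, not real output from this setup.
sample = "Pillow==6.1.0\nselenium==3.141.0\npytesseract==0.3.0\n"
print(find_mismatches(TARGET_VERSIONS, parse_freeze(sample)))
```

A mismatch can then be fixed by installing the exact version, for example pip install pytesseract==0.2.7.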
