Current Docker container configuration:
- CentOS 6.8
- Python 2.6.6
Target Docker container configuration:
- CentOS 6.8
- Python 2.7
- selenium 3.141.0
- geckodriver 0.15
- firefox 52.8.0
- Pillow 6.1.0
- pytesseract 0.2.7
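Once the container is built, the pip-installable entries in the target list can be compared against what is actually installed. The following is a minimal sketch, not part of the original setup: `TARGET_VERSIONS`, `installed_version` and `find_mismatches` are hypothetical names, and only the three Python packages from the list are covered (geckodriver and firefox are not pip packages).

```python
# Target versions copied from the list above (Python packages only).
TARGET_VERSIONS = {
    "selenium": "3.141.0",
    "Pillow": "6.1.0",
    "pytesseract": "0.2.7",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for Python 2.7: pkg_resources ships with setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(target, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in sorted(target.items()):
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(TARGET_VERSIONS).items()):
        print("%s: want %s, found %s" % (name, wanted, found))
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed; in the container the default is enough.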
Install build dependencies
yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel libffi-devel gcc gcc-c++ make wget git unzip libjpeg-devel libpng-devel giflib-devel
Create a directory for the downloaded packages
mkdir -p /usr/local/download
cd /usr/local/download
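Install Python 2.7
# Build and install Python 2.7
wget https://www.python.org/ftp/python/2.7.15/Python-2.7.15.tgz
tar -zxvf Python-2.7.15.tgz
cd Python-2.7.15
./configure
make && make install
mv /usr/bin/python /usr/bin/python_bak
ln -s /usr/local/bin/python2.7 /usr/bin/python
# yum on CentOS 6 depends on the system Python 2.6; point its shebang at python2.6 so the later yum steps keep working
sed -i '1s|^#!.*|#!/usr/bin/python2.6|' /usr/bin/yum
# Install pip (the default get-pip.py no longer supports Python 2.7, so use the 2.7-specific script)
wget --no-check-certificate https://bootstrap.pypa.io/pip/2.7/get-pip.py
python get-pip.py
ln -s /usr/local/bin/pip /usr/bin/pip
# Configure the pip index (Douban mirror)
mkdir -p ~/.pip
vi ~/.pip/pip.conf
# Write the following content:
[global]
index-url = http://pypi.douban.com/simple
trusted-host = pypi.douban.com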
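In a Docker build there is no interactive editor, so the `vi` step for pip.conf may be easier to script. Below is a minimal sketch under that assumption; `render_pip_conf` and `write_pip_conf` are hypothetical helper names, and the index URL and trusted host are the Douban values from the step above.

```python
import os


def render_pip_conf(index_url, trusted_host):
    """Build the text of a pip.conf with a single [global] section."""
    lines = [
        "[global]",
        "index-url = %s" % index_url,
        "trusted-host = %s" % trusted_host,
    ]
    return "\n".join(lines) + "\n"


def write_pip_conf(home, index_url, trusted_host):
    """Write <home>/.pip/pip.conf and return its path."""
    pip_dir = os.path.join(home, ".pip")
    if not os.path.isdir(pip_dir):
        os.makedirs(pip_dir)
    path = os.path.join(pip_dir, "pip.conf")
    with open(path, "w") as handle:
        handle.write(render_pip_conf(index_url, trusted_host))
    return path


if __name__ == "__main__":
    print(write_pip_conf(
        os.path.expanduser("~"),
        "http://pypi.douban.com/simple",
        "pypi.douban.com",
    ))
```

Running the script as the user who will call pip writes the same three lines that the manual step asks for.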
Install tesseract
# Install leptonica first
cd /usr/local/download
wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
tar xvzf leptonica-1.72.tar.gz
cd leptonica-1.72/
./configure
make && make install
# Install tesseract
cd /usr/local/download
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
unzip 3.04.zip
cd tesseract-3.04/
./configure
make && make install
# Refresh the dynamic linker cache manually
ldconfig
# Install pytesseract with pip
pip install pytesseract
# Install the language pack
Download the model file for the language you need from https://github.com/tesseract-ocr/tessdata. Since only mobile phone numbers and English need to be recognized here, the single file eng.traineddata is enough. Move the model file to /usr/local/share/tessdata and recognition is ready to use.
# Example
import pytesseract
from PIL import Image

image = Image.open('bb.png')
code = pytesseract.image_to_string(image)
print(code)
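Because the stated goal is reading mobile phone numbers, the raw string returned by `pytesseract.image_to_string` usually needs some cleanup before it is usable. The sketch below is one possible post-processing step and is not from the original setup: the helper name `extract_mobile_numbers`, the letter-to-digit table, and the assumption that a mobile number is 11 digits starting with 1 are all illustrative choices.

```python
import re

# Hypothetical table of letters that OCR tends to confuse with digits.
OCR_DIGIT_FIXES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}

# Assumed format: exactly 11 digits starting with 1, not embedded in a longer run.
MOBILE_PATTERN = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def extract_mobile_numbers(ocr_text):
    """Return all 11-digit mobile numbers found in OCR output."""
    # Drop spaces, tabs and hyphens that split a number into groups.
    cleaned = re.sub(r"[ \t\-]", "", ocr_text)
    # Replace look-alike letters with the digits they probably were.
    cleaned = "".join(OCR_DIGIT_FIXES.get(ch, ch) for ch in cleaned)
    return MOBILE_PATTERN.findall(cleaned)


if __name__ == "__main__":
    print(extract_mobile_numbers("Tel: 138-0013-8000"))
```

The letter substitution is applied to the whole string, so this helper is only suitable for pulling numbers out; English text should be read from the unmodified OCR result.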
Install selenium + firefox + geckodriver
Install selenium
pip install selenium
# Check the installed version
pip show selenium
Install geckodriver
cd /usr/local/download
wget https://github.com/mozilla/geckodriver/releases/download/v0.15.0/geckodriver-v0.15.0-linux64.tar.gz
tar xvzf geckodriver-*.tar.gz
rm -f /usr/bin/geckodriver
# The symlink target must be an absolute path
ln -s /usr/local/download/geckodriver /usr/bin/geckodriver
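The absolute-path requirement exists because a relative symlink target is resolved against the directory that holds the link, not the directory where `ln` was run. A small check such as the following can confirm that a link really leads to an executable file; `resolve_executable_link` is a hypothetical helper name, and the path in the usage line is the one from the step above.

```python
import os


def resolve_executable_link(link_path):
    """Return the real path behind link_path if it is an executable file, else None."""
    target = os.path.realpath(link_path)
    if os.path.isfile(target) and os.access(target, os.X_OK):
        return target
    return None


if __name__ == "__main__":
    # Prints the resolved path, or None if the link is missing or dangling.
    print(resolve_executable_link("/usr/bin/geckodriver"))
```

A `None` result after running the install step usually means the link was created with a relative target or the archive was extracted somewhere else.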
Install firefox
cd /usr/local/download
wget http://www.rpmfind.net/linux/centos/6.10/os/x86_64/Packages/firefox-52.8.0-1.el6.centos.x86_64.rpm
yum install -y firefox-52.8.0-1.el6.centos.x86_64.rpm
Install Chinese fonts
# Create the font directory "chinese":
mkdir /usr/share/fonts/chinese
# Upload the fonts from c:\windows\fonts\ on the Windows system drive directly into /usr/share/fonts/chinese on CentOS
chmod -R 755 /usr/share/fonts/chinese
yum -y install ttmkfdir
ttmkfdir -e /usr/share/X11/fonts/encodings/encodings.dir
# Edit the "Font directory list" in fonts.conf and add the location of the new Chinese fonts:
vi /etc/fonts/fonts.conf
<dir>/usr/share/fonts/chinese</dir>
# Refresh the in-memory font cache so that no reboot is needed:
fc-cache
# Finally, check the font list again with fc-list:
fc-list
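Reading through a long `fc-list` listing by eye is error-prone, so the final check can be narrowed to the new directory. The sketch below assumes the listing was produced with `fc-list : file family`, so that each line has the form `file: family`; `fonts_under` is a hypothetical helper name and the exact line format may differ between fontconfig versions.

```python
def fonts_under(fc_list_output, directory):
    """Return (file, description) pairs for fonts whose file lives under directory."""
    prefix = directory.rstrip("/") + "/"
    found = []
    for line in fc_list_output.splitlines():
        # Split "file: description" at the first ": " separator.
        path, sep, description = line.partition(": ")
        if sep and path.startswith(prefix):
            found.append((path, description.strip()))
    return found


if __name__ == "__main__":
    import subprocess

    raw = subprocess.check_output(["fc-list", ":", "file", "family"])
    listing = raw.decode("utf-8", "replace")
    for path, description in fonts_under(listing, "/usr/share/fonts/chinese"):
        print(path)
```

An empty result after `fc-cache` suggests that the `<dir>` entry was not picked up or that the uploaded files are not readable.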
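Install Xvfb
Linux has a very handy tool called Xvfb: an X server that can run on machines with no display hardware and no physical input devices.
a. Install the required packages
[cat@localhost ~]# yum install -y xdg-utils xorg-x11-server-Xvfb xorg-x11-xkb-utils
b. Install the Python bindings for Xvfb
[cat@localhost ~]# pip install xvfbwrapper pyvirtualdisplay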
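Before running a browser test it can help to confirm that every external binary installed so far is actually reachable on `PATH`. The following is a minimal sketch with hypothetical helper names (`find_on_path`, `missing_tools`); the tool list in the usage part reflects the programs installed in the steps above.

```python
import os


def find_on_path(name, search_path=None):
    """Return the first executable called name on the search path, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names, search_path=None):
    """Return the subset of names that cannot be found on the search path."""
    return [name for name in names if find_on_path(name, search_path) is None]


if __name__ == "__main__":
    print(missing_tools(["Xvfb", "firefox", "geckodriver", "tesseract"]))
```

An empty list means all four binaries were found; any name that is printed points back to the install step that needs another look.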
Test case:
#!/usr/bin/python
# -*- coding:utf-8 -*-
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# Start a virtual display so Firefox can run without a physical screen
display = Display(visible=0, size=(800, 600))
display.start()

binary = FirefoxBinary('/usr/bin/firefox')
driver = webdriver.Firefox(firefox_binary=binary)
driver.get('https://www.baidu.com')
print(driver.title.encode('utf8'))

# Close the browser, then stop the virtual display
driver.quit()
display.stop()
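If `driver.get` raises in the test above, neither `driver.quit()` nor `display.stop()` runs, which can leave Firefox and Xvfb processes behind in the container. One way to guard against that is a small context manager; this is a sketch, and `browser_session` together with its two factory arguments are hypothetical names rather than part of selenium or pyvirtualdisplay.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(start_display, start_driver):
    """Yield a driver and always quit it and stop the display afterwards.

    start_display: callable returning a started display object with stop().
    start_driver:  callable returning a driver object with quit().
    """
    display = start_display()
    try:
        driver = start_driver()
        try:
            yield driver
        finally:
            driver.quit()
    finally:
        display.stop()


# Possible usage with the objects from the test case (not executed here):
#
#   def start_display():
#       display = Display(visible=0, size=(800, 600))
#       display.start()
#       return display
#
#   def start_driver():
#       return webdriver.Firefox(firefox_binary=FirefoxBinary('/usr/bin/firefox'))
#
#   with browser_session(start_display, start_driver) as driver:
#       driver.get('https://www.baidu.com')
```

Passing factories instead of ready-made objects keeps the start-up inside the guarded region, so the display is stopped even when Firefox itself fails to launch.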
Install the required packages with pip
# Install packages
pip install requests
pip install Pillow
pip install httplib2
pip install excel