Fixing problems that come up when using Pillow, the Python image-processing module

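I was recently scraping a movie box-office site (url: http://58921.com/alltime). The total box-office figures on it are actually rendered as images, so I need to recognize the images as text to get the box-office data.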
 
The first solution that came to mind was tesseract 3 rather than 2; in my experience, 3 supports Chinese somewhat better than 2.
 
Then I started installing the related libraries with pip:
 
$ pip install Pillow
$ pip install pytesser3
$ pip install pytesseract
 
Step one, run:
 
$ pip install pillow
 
It fails with:
 
Collecting pillow
  Could not fetch URL https://pypi.python.org/simple/pillow/: There was a problem confirming the ssl certificate: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:661) - skipping
  Could not find a version that satisfies the requirement pillow (from versions: )
No matching distribution found for pillow
 
My first reaction was to add sudo and run sudo pip install pillow, but that failed with the same error.
 
 
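The real problem is that pip itself was too old: the TLSV1_ALERT_PROTOCOL_VERSION message means the server rejected the TLS protocol version that this pip offered. So I tried to upgrade pip with the following command: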
 
python -m pip install --upgrade pip
 
It fails with:
 
Could not fetch URL https://pypi.python.org/simple/pip/: There was a problem confirming the ssl certificate: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:661) - skipping
Requirement already up-to-date: pip in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
 
Still no luck!
 
So I tried another way to upgrade pip:
 
$ pip install -U pip
 
The same error again:
 
Could not fetch URL https://pypi.python.org/simple/pip/: There was a problem confirming the ssl certificate: [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:661) - skipping
Requirement already up-to-date: pip in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
 
Then yet another way to upgrade pip, which downloads the installer script directly instead of going through pip:
 
curl https://bootstrap.pypa.io/get-pip.py | python
 
Note the last part of the command: if you are on python3, pipe the script into python3 instead:
 
curl https://bootstrap.pypa.io/get-pip.py | python3
 
That finally worked!
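To see which TLS support a given interpreter has, one can inspect the OpenSSL build behind its ssl module. The following is only a diagnostic sketch, assuming the failure above really is a TLS-version mismatch; tls_report is a hypothetical helper name and is not part of pip.

```python
import ssl


def tls_report():
    """Return basic facts about the TLS support of the running interpreter."""
    return {
        # Human-readable OpenSSL version string of the linked library.
        "openssl_version": ssl.OPENSSL_VERSION,
        # First three components of the version tuple: major, minor, fix.
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO[:3],
        # The TLS 1.2 protocol constant is only defined when it is supported.
        "has_tls_1_2": hasattr(ssl, "PROTOCOL_TLSv1_2"),
    }


if __name__ == "__main__":
    for key, value in sorted(tls_report().items()):
        print("%s: %s" % (key, value))
```

If has_tls_1_2 comes out False, upgrading pip alone may not be enough and a newer Python build could be needed; that is a guess to verify on the machine in question.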
 
Then install pillow:
 
$ pip install pillow
 
As an aside, I recommend Pillow: PIL stopped being updated many years ago, and Pillow is a fork of it that has been maintained ever since.
 
With the up-to-date pip, the libraries listed above can now be installed in one go.
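After installing, it can help to confirm that the packages are importable from the interpreter that will run the script. A minimal sketch, assuming the import names PIL (for the Pillow package) and pytesseract; missing_modules is a hypothetical helper name:

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Pillow is installed as "pillow" but imported as "PIL".
    for name in missing_modules(["PIL", "pytesseract"]):
        print("not importable: " + name)
```

An empty result only shows that the Python packages are present; it says nothing about the tesseract binary itself.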
 

 
 
Later I wrote a test.py and found that calling pytesseract.image_to_string() crashed with the following traceback:
 
Traceback (most recent call last):
  File "/Users/baorunchen/Documents/code/repo/python/advanced/image_recognition_test.py", line 29, in <module>
    main()
  File "/Users/baorunchen/Documents/code/repo/python/advanced/image_recognition_test.py", line 26, in main
    run_log(pytesseract.image_to_string(im))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 193, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 140, in run_and_get_output
    run_tesseract(**kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 111, in run_tesseract
    proc = subprocess.Popen(command, stderr=subprocess.PIPE)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 390, in __init__
    errread, errwrite)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1024, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

 

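The cause: pytesseract runs the tesseract executable as a subprocess (see the subprocess.Popen frame in the traceback), and that executable could not be found. Either Tesseract-OCR is not installed, or it is installed but not on the PATH environment variable, since installing it does not add it to PATH by default.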
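The same OSError can be reproduced without pytesseract by trying to start any program that does not exist. This is an illustrative sketch, not pytesseract's own code; try_launch and the fake binary name are made up for the example.

```python
import errno
import subprocess
import sys


def try_launch(command):
    """Start `command`; return None on success, or the errno if it cannot be started."""
    try:
        proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        # A missing executable is reported as errno 2 (ENOENT),
        # "No such file or directory".
        return exc.errno
    proc.communicate()
    return None


if __name__ == "__main__":
    code = try_launch(["tesseract-binary-that-does-not-exist", "--version"])
    print("missing binary gives ENOENT: %s" % (code == errno.ENOENT))
    print("existing binary gives: %s" % try_launch([sys.executable, "-c", "pass"]))
```

So the traceback says nothing about the image: the failure happens before tesseract ever runs.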
 
Solution:
First, install tesseract on the Mac:
 
$ brew install tesseract
 
Another error:
 
touch: /usr/local/Homebrew/.git/FETCH_HEAD: Permission denied
touch: /usr/local/Homebrew/Library/Taps/caskroom/homebrew-cask/.git/FETCH_HEAD: Permission denied
touch: /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/.git/FETCH_HEAD: Permission denied
fatal: Unable to create '/usr/local/Homebrew/.git/index.lock': Permission denied
error: could not lock config file .git/config: Permission denied
==> Downloading https://homebrew.bintray.com/bottles/tesseract-3.05.01.high_sierra.bottle.tar.gz
Already downloaded: /Users/baorunchen/Library/Caches/Homebrew/tesseract-3.05.01.high_sierra.bottle.tar.gz
==> Pouring tesseract-3.05.01.high_sierra.bottle.tar.gz
Error: The `brew link` step did not complete successfully
The formula built, but is not symlinked into /usr/local
Could not symlink share/man/man1/ambiguous_words.1
/usr/local/share/man/man1 is not writable.
 
You can try again using:
  brew link tesseract
==> Summary
🍺  /usr/local/Cellar/tesseract/3.05.01: 79 files, 38.7MB
 
In between, I tried updating brew and then running brew install tesseract again, which did not help:
 
$ brew update
$ sudo brew update
$ brew upgrade
$ brew cleanup
$ brew install tesseract
 
So I followed the hint in the error output and ran:
 
$ brew link tesseract
 
which produced this error:
 
Linking /usr/local/Cellar/tesseract/3.05.01...
Error: Could not symlink share/man/man5/unicharambigs.5
/usr/local/share/man/man5 is not writable.
 
To fix the brew link failure, I looked at the message "/usr/local/share/man/man5 is not writable."
The directory is not writable, meaning the current user lacks permission on it, so I made the current user its owner:
$ sudo chown ${USER} /usr/local/share/man/man5
 
Then I ran brew link tesseract again and, for each further directory it reported as not writable, ran the corresponding chown command.
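Since brew link stops at the first directory it cannot write to, it can be quicker to list all the affected directories up front. A rough sketch: unwritable_dirs is a hypothetical helper, and the list of man sections beyond man1 and man5 (the two named in the errors above) is an assumption.

```python
import os


def unwritable_dirs(paths):
    """Return the existing directories in `paths` that the current user cannot write to."""
    return [p for p in paths if os.path.isdir(p) and not os.access(p, os.W_OK)]


if __name__ == "__main__":
    man_root = "/usr/local/share/man"
    # man1 and man5 appear in the brew errors; the other section numbers are a guess.
    candidates = [os.path.join(man_root, "man%d" % n) for n in range(1, 9)]
    for path in unwritable_dirs(candidates):
        print("not writable: " + path)
```

Each directory it prints is a candidate for the same sudo chown ${USER} command used above.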
 
 
Next, pytesseract needs to be told where the tesseract binary is, by adding this to the code:
 
pytesseract.pytesseract.tesseract_cmd = '<path-to-tesseract-bin>'
 
To find the path, run this on the command line:
 
$ which tesseract
 
Before brew link had succeeded, the output of this command would have been:
 
tesseract not found
 
Now that it has succeeded, the output is:
 
/usr/local/bin/tesseract

 

So add this to the code:
 
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
 
After that, the pytesseract.image_to_string() error should be gone.
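Instead of hard-coding the path, the lookup done by which tesseract can also be performed from Python. A minimal sketch under assumptions: find_tesseract is a hypothetical helper, and the fallback path is simply the one reported on this machine.

```python
import os

try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7


def find_tesseract(name="tesseract", fallbacks=("/usr/local/bin/tesseract",)):
    """Return a path to the tesseract binary, or None if none is found."""
    # First search PATH, the same way the shell's `which` does.
    found = which(name)
    if found:
        return found
    # Then try known install locations that may not be on PATH.
    for path in fallbacks:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found; install it or fix PATH first")
    else:
        # With pytesseract installed, this path would be assigned as in the text:
        # pytesseract.pytesseract.tesseract_cmd = path
        print("tesseract found at " + path)
```

Checking for None up front gives a clearer message than the bare OSError shown in the traceback earlier.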
 

The code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
# @version: python 2.7.13
# @author: baorunchen(runchen0518@gmail.com)
# @date: 2018/5/4
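import os
import sys
import time

from PIL import Image
import pytesseract

# Point pytesseract at the tesseract binary reported by `which tesseract`.
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'

pic_path = '/Users/baorunchen/Desktop/test.png'


def run_log(log):
    # Prefix each message with a timestamp; a single formatted string
    # prints the same way on Python 2.7 and Python 3.
    print('%s - %s' % (time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()), log))


def main():
    if not os.path.exists(pic_path):
        run_log('picture does not exist!')
        sys.exit(1)

    # Run OCR on the image and log the recognized text.
    im = Image.open(pic_path)
    run_log(pytesseract.image_to_string(im))


if __name__ == '__main__':
    main()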