OCR6: Custom Traineddata

Date: 2019-11-06

Tags: ocr6 ocr custom traineddata


Reference: https://groups.google.com/forum/#!msg/tesseract-ocr/MSYezIbckvs/kO1VoNKMDMQJ

Code example for Tesseract 4:

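import pytesseract
from PIL import Image

# 'teld' is the custom traineddata built below; chi_sim is the stock Simplified Chinese model.
# --psm 3: fully automatic page segmentation; --oem 1: LSTM engine only.
# Raw string so the Windows backslash in the path is not treated as an escape.
text = pytesseract.image_to_string(Image.open(r'src2\B1.jpg'), lang='teld+chi_sim', config='--psm 3 --oem 1')
print(text.replace('」', ''))  # drop a stray corner-bracket character from the output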

Merging corrected results into one recognition library

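When Tesseract-OCR is used in practice, a recognition library built on the first attempt will quite likely have an unsatisfactory recognition rate and has to be improved gradually afterwards. The steps below merge several corrected .box files into a single recognition library.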

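First, you need the sample images (.tif) and the position files (.box). As long as both files are present for each sample, they can be merged into one dictionary.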

Assume the following sample images and corrected box files already exist:

image.font.1.tif image.font.1.box
image.font.2.tif image.font.2.box
image.font.3.tif image.font.3.box
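Before training, it helps to confirm that every sample image really has a matching corrected box file. A minimal sketch, assuming all samples sit in one directory and follow the `image.font.N.tif` / `image.font.N.box` naming used here; `find_training_pairs` is a hypothetical helper name.

```python
import os


def find_training_pairs(directory):
    """Return (paired_bases, missing) for the .tif/.box files in directory.

    paired_bases: sorted base names that have both a .tif and a .box file.
    missing: sorted file names whose partner file is absent.
    """
    names = set(os.listdir(directory))
    tif_bases = {n[:-4] for n in names if n.endswith(".tif")}
    box_bases = {n[:-4] for n in names if n.endswith(".box")}
    paired = sorted(tif_bases & box_bases)
    missing = sorted(
        [b + ".box" for b in tif_bases - box_bases]
        + [b + ".tif" for b in box_bases - tif_bases]
    )
    return paired, missing


if __name__ == "__main__":
    pairs, missing = find_training_pairs(".")
    print("ready to train:", pairs)
    if missing:
        print("missing partner files:", missing)
```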

1. First, generate the corresponding .tr files

tesseract image.font.1.tif image.font.1 nobatch box.train
tesseract image.font.2.tif image.font.2 nobatch box.train
tesseract image.font.3.tif image.font.3 nobatch box.train
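With many samples, the commands above can be generated rather than typed by hand. A minimal sketch, assuming `tesseract` is on PATH; `box_train_commands` and `run_all` are hypothetical helper names, and `dry_run=True` only returns the commands without executing anything.

```python
import subprocess


def box_train_commands(bases):
    """Build one 'tesseract X.tif X nobatch box.train' argv list per sample base name."""
    return [["tesseract", base + ".tif", base, "nobatch", "box.train"] for base in bases]


def run_all(commands, dry_run=True):
    """Run the commands in order, stopping at the first failure; in dry-run mode just return them."""
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    bases = ["image.font.1", "image.font.2", "image.font.3"]
    for argv in run_all(box_train_commands(bases)):
        print(" ".join(argv))
```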

2. Extract the character set (unicharset)

unicharset_extractor image.font.1.box image.font.2.box image.font.3.box
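Each line of a Tesseract 3 box file starts with the character, followed by its bounding box (left, bottom, right, top) and the page number; `unicharset_extractor` gathers the distinct characters from these lines. The sketch below only illustrates that collection step; it does not write the real `unicharset` format, which also stores character properties, and `collect_box_chars` is a hypothetical name.

```python
def collect_box_chars(box_paths):
    """Return the sorted distinct symbols found in Tesseract 3 style box files.

    Each non-empty line looks like: <symbol> <left> <bottom> <right> <top> <page>
    """
    chars = set()
    for path in box_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Split from the right so only the five numeric fields are separated off
                parts = line.rstrip("\n").rsplit(None, 5)
                if len(parts) == 6:
                    chars.add(parts[0])
    return sorted(chars)
```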

3. Create the font properties file (font_properties)

echo font 0 0 0 0 0 >font_properties
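Each line of `font_properties` has the form `<fontname> <italic> <bold> <fixed> <serif> <fraktur>` with 0/1 flags, and the font name must match the font part of the training file names (`font` in `image.font.1`, `shz` in `teld.shz.exp0`). A minimal sketch that writes the file from the sample base names, assuming all flags are 0 as in the `echo` command; `write_font_properties` is a hypothetical name.

```python
def write_font_properties(bases, path="font_properties"):
    """Write one '<font> 0 0 0 0 0' line per distinct font in 'lang.font.N' base names."""
    fonts = []
    for base in bases:
        font = base.split(".")[1]  # 'lang.font.N' -> 'font'
        if font not in fonts:
            fonts.append(font)
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for font in fonts:
            # italic, bold, fixed, serif, fraktur all set to 0
            f.write(f"{font} 0 0 0 0 0\n")
    return fonts
```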

4. Run mftraining

mftraining -F font_properties -U unicharset image.font.1.tr image.font.2.tr image.font.3.tr

5. Aggregate all the .tr files with cntraining

cntraining image.font.1.tr image.font.2.tr image.font.3.tr
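Steps 4 and 5 take the same list of `.tr` files, one per sample. A minimal sketch that checks the `.tr` files from step 1 exist and then builds both argument lists, assuming the `font_properties` and `unicharset` file names from steps 2 and 3; `clustering_commands` is a hypothetical helper.

```python
import os


def clustering_commands(bases, directory="."):
    """Return the (mftraining, cntraining) argv lists for the given sample base names.

    Raises FileNotFoundError if any .tr file from the box.train step is missing.
    """
    tr_files = [base + ".tr" for base in bases]
    missing = [t for t in tr_files if not os.path.exists(os.path.join(directory, t))]
    if missing:
        raise FileNotFoundError("missing .tr files: " + ", ".join(missing))
    mftraining = ["mftraining", "-F", "font_properties", "-U", "unicharset"] + tr_files
    cntraining = ["cntraining"] + tr_files
    return mftraining, cntraining
```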

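6. Rename the following files by adding the language prefix "image." (for example, unicharset becomes image.unicharset)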

unicharset
inttemp
normproto
pffmtable
shapetable
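In the Windows command prompt the renaming is done with `rename`; a portable Python equivalent that adds the language prefix (`image` here, matching the `combine_tessdata image.` call) might look like this. `prefix_training_files` is a hypothetical name, and files that a given run did not produce are skipped rather than treated as errors.

```python
import os

# Outputs of unicharset_extractor, shapeclustering, mftraining and cntraining
TRAINING_FILES = ["unicharset", "inttemp", "normproto", "pffmtable", "shapetable"]


def prefix_training_files(lang, directory="."):
    """Rename e.g. 'inttemp' to '<lang>.inttemp'; return the new names that were created."""
    renamed = []
    for name in TRAINING_FILES:
        src = os.path.join(directory, name)
        if os.path.exists(src):
            dst = os.path.join(directory, f"{lang}.{name}")
            os.replace(src, dst)  # overwrites an older prefixed file if present
            renamed.append(f"{lang}.{name}")
    return renamed
```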

7. Combine all the files into one large traineddata file

combine_tessdata image.
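`combine_tessdata image.` packs the files that start with the `image.` prefix into `image.traineddata`, which is then copied into the `tessdata` directory and selected with `-l image` (or `lang='image'` in pytesseract). A minimal pre-check before that call can catch a missing component. The required list mirrors the files renamed in step 6 and is an assumption about this workflow, not the full set of components `combine_tessdata` accepts; `missing_components` is a hypothetical name.

```python
import os

# Components produced and renamed in steps 2-6 of this walkthrough
REQUIRED = ["unicharset", "inttemp", "normproto", "pffmtable", "shapetable"]


def missing_components(prefix, directory="."):
    """Return the '<prefix><component>' files that are not present yet.

    prefix includes the trailing dot, as passed to combine_tessdata (e.g. 'image.').
    """
    return [
        prefix + name
        for name in REQUIRED
        if not os.path.exists(os.path.join(directory, prefix + name))
    ]
```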

Example code:

rem Generate the box file
rem (alternative: tesseract teld.shz.exp0.tif teld.shz.exp0 -l chi_sim --psm 3 --oem 1 batch.nochop makebox)
tesseract teld.shz.exp0.tif teld.shz.exp0 -l chi_sim batch.nochop makebox
rem Create the font_properties file
echo shz 0 0 0 0 0 >font_properties
rem Generate the .tr training file
tesseract teld.shz.exp0.tif teld.shz.exp0 nobatch box.train
rem Generate the character set file
unicharset_extractor teld.shz.exp0.box
rem Generate the shape table
shapeclustering -F font_properties -U unicharset teld.shz.exp0.tr
rem Generate the clustered character feature files
mftraining -F font_properties -U unicharset teld.shz.exp0.tr
rem Generate the character normalization feature file
cntraining teld.shz.exp0.tr
rem Rename the files
rename normproto teld.normproto
rename inttemp teld.inttemp
rename pffmtable teld.pffmtable
rename shapetable teld.shapetable
rename unicharset teld.unicharset
rem Combine the training files
combine_tessdata teld.

References

https://yq.aliyun.com/articles/297912
