Preface:
Recently I have been learning some OCR basics, including object detection and natural language processing.
As it happens, there is a related competition from Digital China (on DataFountain):
https://www.datafountain.cn/competitions/334/details/rule
So I wanted to try it hands-on. In practice I found that I was not familiar with handling the data labels or with the full detection-and-recognition pipeline, and building everything from scratch by myself would still be quite difficult.
Fortunately, there are some previously open-sourced baselines to refer to, covering both detection and recognition, which really helps in understanding OCR.
1) The original baseline: AdvancedEAST + CRNN
https://github.com/Tianxiaomo/Cultural_Inheritance-Recognizing_Chinese_Calligraphy_in_Multiple_Scenarios
2) A newer baseline: EAST + ocr_densenet
https://github.com/DataFountainCode/huawei_code_share
There are also the original open-source EAST and AdvancedEAST implementations:
https://github.com/argman/EAST
https://github.com/huoyijie/AdvancedEAST
The CRNN source:
https://github.com/bgshih/crnn
as well as densenet and others; all of these are good learning resources:
https://github.com/yinchangchang/ocr_densenet
Below, let's first go through the whole EAST workflow:
Training sample format:
img_1.jpg
img_1.txt
img_2.jpg
img_2.txt
(These can be generated with convert_to_txt.py from the second baseline.)
That is, the training set contains the images together with each image's annotation file (four corner coordinates plus the text).
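For reference, each line of such a txt file is assumed here to follow the ICDAR2015-style convention of eight comma-separated corner coordinates followed by the transcription; the helper below is a minimal parsing sketch (the function name parse_annotation_line is hypothetical, not part of the EAST repo):

import numpy as np

def parse_annotation_line(line):
    # one line is assumed to look like "x1,y1,x2,y2,x3,y3,x4,y4,text"
    parts = line.strip().split(',')
    quad = np.array(parts[:8], dtype=np.float32).reshape(4, 2)  # 4 corner points
    text = ','.join(parts[8:])  # the transcription (may itself contain commas)
    return quad, text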
Training command:
python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=14 \
    --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ \
    --text_scale=512 --training_data_path=/data/ocr/icdar2015/ --geometry=RBOX \
    --learning_rate=0.0001 --num_readers=24 \
    --pretrained_model_path=/tmp/resnet_v1_50.ckpt
Once training is finished, load the trained model and run the test:
python eval.py --test_data_path=./tmp/test_image/ --gpu_list=0 \
    --checkpoint_path=./tmp/east_icdar2015_resnet_v1_50_rbox/ --output_dir=./tmp/output/
Bug fixes:
1) lanms fails to build. Modify lanms/Makefile, changing python3-config to python-config, then run make again. The two lines to edit are:
CXXFLAGS = -I include -std=c++11 -O3 $(shell python3-config --cflags)
LDFLAGS = $(shell python3-config --ldflags)
2) The following error appears when producing the test output:
Traceback (most recent call last):
  File "eval.py", line 194, in <module>
    tf.app.run()
  File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "eval.py", line 160, in main
    boxes, timer = detect(score_map=score, geo_map=geometry, timer=timer)
  File "eval.py", line 98, in detect
    boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
  File "/work/ocr/EAST/lanms/__init__.py", line 12, in merge_quadrangle_n9
    from .adaptor import merge_quadrangle_n9 as nms_impl
ImportError: dynamic module does not define module export function (PyInit_adaptor)
nms_locality.nms_locality() is a pure-Python implementation and is much slower than the C++ code, but if you just want to test, you can use it; the two methods should produce the same result.
After changing lanms.merge_quadrangle_n9() in eval.py to nms_locality.nms_locality(), the error is gone.
In short: calling the C++ implementation is broken here, so just use the Python implementation; it is only a bit slower and the results are the same.
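A minimal sketch of that workaround inside eval.py's detect(), assuming the pure-Python NMS lives in the repo's locality_aware_nms.py and is imported under the alias nms_locality:

import locality_aware_nms as nms_locality

# original call into the C++ lanms extension, which fails to import here:
# boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
# pure-Python replacement with the same inputs and outputs, just slower:
boxes = nms_locality.nms_locality(boxes.astype('float32'), nms_thres)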
Training procedure:
1) Convert the data, pairing each image with its label.
For example: image_list.txt
90kDICT32px/1/2/373_coley_14845.jpg coley
90kDICT32px/17/5/176_Nevadans_51437.jpg nevadans
Note: make sure that the images can be read from the path you specified (a quick check sketch follows this example), such as:
path/to/90kDICT32px/1/2/373_coley_14845.jpg
path/to/90kDICT32px/17/5/176_Nevadans_51437.jpg
.......
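To catch unreadable paths before conversion, a sanity check along these lines can help (a minimal sketch; ./data/train.txt and the ./data/ prefix mirror the command below, everything else is an assumption):

import os

with open('./data/train.txt', 'r', encoding='utf-8') as f:
    for line in f:
        rel_path = line.strip().split()[0]            # first column is the image path
        full_path = os.path.join('./data/', rel_path)
        if not os.path.exists(full_path):
            print('missing image:', full_path)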
Convert to tfrecords from the command line:
python tools/create_crnn_ctc_tfrecord.py \
--image_dir ./data/ --anno_file ./data/train.txt --data_dir ./tfrecords/ \
--validation_split_fraction 0.1
Problems:
1) First bug: TypeError: None has type NoneType, but expected one of: int, long
This is caused by undefined characters, i.e. characters that are not in the char map: the dictionary is incomplete. Add a separate code for out-of-dictionary characters, "<undefined>": 6736, and in the original code map unknown characters to it:
def _string_to_int(label):
    # convert string label to int list by char map
    char_map_dict = json.load(open(FLAGS.char_map_json_file, 'r'))
    int_list = []
    for c in label:
        int_list.append(char_map_dict.get(c, 6736))  # fall back to the new class 6736
    return int_list
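The char map itself can be patched in the same spirit; a minimal sketch, assuming the char map is a plain JSON dict and using a hypothetical file name char_map.json:

import json

with open('char_map.json', 'r', encoding='utf-8') as f:
    char_map = json.load(f)
char_map['<undefined>'] = 6736  # extra class for out-of-dictionary characters
with open('char_map.json', 'w', encoding='utf-8') as f:
    json.dump(char_map, f, ensure_ascii=False)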
2) Python 2 runs into a lot of encoding problems; switching to Python 3 is recommended.
def _bytes_feature(value):
    # in Python 3, a str must be encoded to bytes before it can go into a BytesList
    if type(value) is str and sys.version_info[0] > 2:
        value = value.encode('utf-8')  # convert string object to bytes
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
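For context, helpers like this are called when serializing each sample into a tf.train.Example; the sketch below shows the typical pattern, where the feature keys ('images', 'labels', 'imagenames') and the _int64_feature helper are assumptions rather than taken from the repo:

example = tf.train.Example(features=tf.train.Features(feature={
    'images': _bytes_feature(image_bytes),        # raw image bytes
    'labels': _int64_feature(label_int_list),     # char indices from _string_to_int
    'imagenames': _bytes_feature(image_name),     # original file name
}))
writer.write(example.SerializeToString())         # writer: a tf.python_io.TFRecordWriter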
When debugging the code, print intermediate results step by step to track down the cause of a problem, for example:
try:
    print(tf.train.Feature(int64_list=tf.train.Int64List(value=value)))
except:
    print(value)