Notes on an attempt at building an OCR program

Date: 2020-10-26

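Recently my work involved cross-checking some electronic documents against paper documents, so I wanted to photograph the paper files and compare the two as text. I remembered that I had previously used the Youdao Zhiyun (有道智雲) API for document translation. Looking at its OCR text-recognition APIs, Youdao offers several different OCR interfaces: handwriting, printed text, tables, whole-problem recognition, shopping receipts, ID cards, business cards, and so on. So this time I simply kept using the Youdao Zhiyun API to build a small demo and try these features out, both as practice and as preparation for features I may need later.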

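Preparing to call the API

First, you need to create an instance and an application on your personal page on Youdao Zhiyun, bind the application to the instance, and obtain the application's ID and secret key. For the details of registering and creating an application, see the article "分享一次批量文件翻譯的開發過程" (Sharing the development of a batch file translation tool).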

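The development process in detail

The code was developed as follows.

The demo is written in Python 3 and consists of three files: maindow.py, ocrprocesser.py, and ocrtools.py. To keep development simple, the interface uses tkinter, which ships with Python; it lets you choose the files to recognize and the recognition type, and it displays the results. ocrprocesser.py calls the API that matches the chosen type, runs the recognition, and returns the result. ocrtools.py wraps the Youdao OCR APIs, after some sorting out on my part, so that they can be called by category.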

The interface:

The interface code is below; it uses tkinter's grid to lay out the elements.

import tkinter as tk
from tkinter import ttk

# get_files, get_img_type, ocr_files, clean_text and img_type_dict
# are defined elsewhere in maindow.py.
root = tk.Tk()
root.title("netease youdao ocr test")
frm = tk.Frame(root)
frm.grid(padx='50', pady='50')

# Row 0: file picker button and the list of chosen files
btn_get_file = tk.Button(frm, text='Choose images to recognize', command=get_files)
btn_get_file.grid(row=0, column=0, padx='10', pady='20')
text1 = tk.Text(frm, width='40', height='5')
text1.grid(row=0, column=1)

# Row 1: recognition type selector
combox = ttk.Combobox(frm, textvariable=tk.StringVar(), width=38)
combox["values"] = img_type_dict
combox.current(0)
combox.bind("<<ComboboxSelected>>", get_img_type)
combox.grid(row=1, column=1)

# Row 2: recognition result
label = tk.Label(frm, text="Recognition result:")
label.grid(row=2, column=0)
text_result = tk.Text(frm, width='40', height='10')
text_result.grid(row=2, column=1)

# Row 3: action buttons
btn_sure = tk.Button(frm, text="Start recognition", command=ocr_files)
btn_sure.grid(row=3, column=1)
btn_clean = tk.Button(frm, text="Clear", command=clean_text)
btn_clean.grid(row=3, column=2)

root.mainloop()

The handler bound to btn_sure, ocr_files(), passes the file paths and the recognition type on to ocrprocesser:

import tkinter.messagebox  # makes tk.messagebox available

def ocr_files():
    if ocr_model.img_paths:
        ocr_result = ocr_model.ocr_files()
        text_result.insert(tk.END, ocr_result)
    else:
        tk.messagebox.showinfo("Notice", "No file selected")
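
The interface code fills the combobox from img_type_dict and binds get_img_type, neither of which is shown in this post. Here is a minimal sketch of how the label-to-type mapping could work, assuming the labels are listed in the same order as the type codes 0 to 5 that get_ocr_result() distinguishes. The names IMG_TYPE_LABELS and label_to_type are hypothetical and not taken from the original code:

```python
# Hypothetical labels, ordered so that the list index equals the
# type code used by the dispatch function (0 = handwriting ... 5 = problem).
IMG_TYPE_LABELS = [
    "handwriting",
    "printed text",
    "ID card",
    "business card",
    "table",
    "whole problem",
]


def label_to_type(label):
    """Return the numeric type code for a combobox label, or -1 if unknown."""
    try:
        return IMG_TYPE_LABELS.index(label)
    except ValueError:
        return -1


if __name__ == "__main__":
    print(label_to_type("table"))
```

In a tkinter callback the label would typically come from combox.get(), and the resulting code could then be stored on the processing object as its img_type.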

The main method in ocrprocesser is ocr_files(), which base64-encodes each image and then calls the wrapped API.

import base64

def ocr_files(self):
    # Collect one result per image; returning inside the loop would
    # stop after the first file.
    results = []
    for img_path in self.img_paths:
        # The API expects the image as a base64-encoded string.
        with open(img_path, 'rb') as f:
            img_code = base64.b64encode(f.read()).decode('utf-8')
        ocr_result = self.ocr_by_netease(img_code, self.img_type)
        print(ocr_result)
        results.append(ocr_result)
    return results
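
The post does not show the class that surrounds ocr_files(). The following is a rough sketch of how such a class might be organised, not the author's actual code: the class name OcrProcesser, the helper encode_image, and the injected recognize callable are all assumptions made for illustration. Injecting the recognizer lets the sketch run without network access.

```python
import base64
import os
import tempfile


def encode_image(img_path):
    """Read a file and return its contents as a base64 text string."""
    with open(img_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


class OcrProcesser:
    """Hypothetical holder for the selected files and recognition type."""

    def __init__(self, recognize):
        # `recognize` is any callable taking (img_code, img_type); in the
        # demo this role would be played by the wrapper in ocrtools.py.
        self.recognize = recognize
        self.img_paths = []
        self.img_type = 0

    def ocr_files(self):
        """Recognize every selected file and map file name -> result."""
        results = {}
        for img_path in self.img_paths:
            name = os.path.basename(img_path)
            results[name] = self.recognize(encode_image(img_path), self.img_type)
        return results


if __name__ == "__main__":
    # Stub recognizer so the sketch runs offline.
    def fake_recognize(img_code, img_type):
        return "type %d, %d base64 chars" % (img_type, len(img_code))

    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
        tmp.write(b"not really an image")
    model = OcrProcesser(fake_recognize)
    model.img_paths = [tmp.name]
    print(model.ocr_files())
    os.remove(tmp.name)
```

Keying the results by file name is a design choice of this sketch; it makes it easy to tell which output belongs to which image when several files are selected.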

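After reading through and organizing the Youdao API documentation, I found that it roughly splits into four API entry points: handwriting/printed-text recognition, ID card/business card recognition, table recognition, and whole-problem recognition. Each has a different URL, and the request parameters are not entirely the same either, so the demo first branches on the recognition type: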

# 0-hand write
# 1-print
# 2-ID card
# 3-name card
# 4-table
# 5-problem
def get_ocr_result(img_code, img_type):
    if img_type == 0 or img_type == 1:
        return ocr_common(img_code)
    elif img_type == 2 or img_type == 3:
        return ocr_card(img_code, img_type)
    elif img_type == 4:
        return ocr_table(img_code)
    elif img_type == 5:
        return ocr_problem(img_code)
    else:
        return "error:undefined type!"

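Then the data fields are assembled according to the parameters each interface requires, and the return value of each interface is lightly parsed and processed before being returned: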

def ocr_common(img_code):
    YOUDAO_URL = 'https://openapi.youdao.com/ocrapi'
    data = {}
    data['detectType'] = '10012'
    data['imageType'] = '1'
    data['langType'] = 'auto'
    data['img'] = img_code
    data['docType'] = 'json'
    data = get_sign_and_salt(data, img_code)
    response = do_request(YOUDAO_URL, data)['regions']
    result = []
    for r in response:
        for line in r['lines']:
            result.append(line['text'])
    return result


def ocr_card(img_code, img_type):
    YOUDAO_URL = 'https://openapi.youdao.com/ocr_structure'
    data = {}
    if img_type == 2:
        data['structureType'] = 'idcard'
    elif img_type == 3:
        data['structureType'] = 'namecard'
    data['q'] = img_code
    data['docType'] = 'json'
    data = get_sign_and_salt(data, img_code)
    return do_request(YOUDAO_URL, data)


def ocr_table(img_code):
    YOUDAO_URL = 'https://openapi.youdao.com/ocr_table'
    data = {}
    data['type'] = '1'
    data['q'] = img_code
    data['docType'] = 'json'
    data = get_sign_and_salt(data, img_code)
    return do_request(YOUDAO_URL, data)


def ocr_problem(img_code):
    YOUDAO_URL = 'https://openapi.youdao.com/ocr_formula'
    data = {}
    data['detectType'] = '10011'
    data['imageType'] = '1'
    data['img'] = img_code
    data['docType'] = 'json'
    data = get_sign_and_salt(data, img_code)
    response = do_request(YOUDAO_URL, data)['regions']
    result = []
    for r in response:
        for line in r['lines']:
            for l in line:
                result.append(l['text'])
    return result
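
The helper do_request() is called above but not shown in the post. A minimal sketch of what it might look like follows, assuming a form-encoded POST and a JSON response body. It uses only the standard library's urllib (the original may well use a different HTTP library), and build_request and extract_lines are hypothetical names added here for illustration. extract_lines repeats the parsing loop from ocr_common() but tolerates a response that has no 'regions' key.

```python
import json
import urllib.parse
import urllib.request


def build_request(url, data):
    """Build a form-encoded POST request from a dict of fields."""
    body = urllib.parse.urlencode(data).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return urllib.request.Request(url, data=body, headers=headers)


def do_request(url, data, timeout=10):
    """Send the request and decode the JSON body into a dict."""
    with urllib.request.urlopen(build_request(url, data), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_lines(response):
    """Collect line texts, tolerating a response without 'regions'."""
    texts = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            texts.append(line["text"])
    return texts


if __name__ == "__main__":
    # Hypothetical response shaped like what ocr_common() indexes into.
    sample = {"regions": [{"lines": [{"text": "hello"}, {"text": "world"}]}]}
    print(extract_lines(sample))
    print(extract_lines({}))
```

Using .get() with a default means a failed call yields an empty list rather than a KeyError, which may be friendlier for a GUI that simply displays whatever comes back.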

get_sign_and_salt() adds the required signature and related fields to data:

import time
import uuid

def get_sign_and_salt(data, img_code):
    data['signType'] = 'v3'
    curtime = str(int(time.time()))
    data['curtime'] = curtime
    salt = str(uuid.uuid1())
    # The signature covers the app key, the truncated input, the salt,
    # the timestamp and the app secret, in that order.
    signStr = APP_KEY + truncate(img_code) + salt + curtime + APP_SECRET
    sign = encrypt(signStr)
    data['appKey'] = APP_KEY
    data['salt'] = salt
    data['sign'] = sign
    return data
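
The helpers truncate() and encrypt() used in get_sign_and_salt() are not shown in the post. Below is a sketch of what they plausibly look like, assuming they follow the v3 signing scheme as I understand it from Youdao's published demo code: input longer than 20 characters is shortened to its first 10 characters, its length, and its last 10 characters, and the signature is a SHA-256 hex digest. Treat this as an assumption and check the current Youdao documentation before relying on it.

```python
import hashlib


def truncate(q):
    """Shorten long input: first 10 chars + total length + last 10 chars."""
    if q is None:
        return None
    size = len(q)
    return q if size <= 20 else q[:10] + str(size) + q[-10:]


def encrypt(sign_str):
    """Return the hex SHA-256 digest of the UTF-8 encoded string."""
    return hashlib.sha256(sign_str.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(truncate("short"))
    print(truncate("0123456789" * 3))
    print(len(encrypt("abc")))
```

Truncating the input keeps the string being signed small even when the base64 image is several megabytes long, while still tying the signature to the payload.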

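Results

Handwriting recognition result:

Printed text (as a programmer, I fed it some code to recognize):

Business card recognition. Here I found a business card template, and the accuracy looks decent:

ID card (also a template):

Table recognition (this extremely long JSON, >_< emmm......):

Whole-problem recognition (formula recognition is handled as well; the resulting JSON is rather long and not very intuitive to read, so I won't paste it here):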

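Summary

Overall, the API is quite powerful and supports all of these document types. However, the vision algorithm engineers did not build in a classification step, so you have to call a separate interface for each kind of image yourself, and the interfaces cannot be mixed at all. For example, during development I submitted a business card image to the API as an ID card, and it returned "Items not found!". That is a bit of a hassle for developers calling the API. Of course, it also improves recognition accuracy to some extent, and my personal guess is that it makes per-interface billing easier too :P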
總的而言，接口功能仍是很強大的，各類都支持。就是視覺算法工程師沒有作分類功能，須要本身分別對每一類的圖像進行分接口調用，並且接口徹底不可混用，好比在開發過程當中我將名片圖片看成身份證提交給api，結果返回了「Items not found!」，對於調用api的開發者來說有點麻煩，固然這樣也在必定程度上提升了識別準確率，並且我的猜想應該也是爲了方便分接口計費 : P。