Text Recognition with Tencent OCR

Date: 2019-11-16



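Using Tencent's intelligent text recognition (OCR) to recognize text in images

A while ago one of my projects needed a feature that converts images to text, and after some consideration I chose Tencent Cloud's text recognition OCR. I ran into a few real pitfalls during the integration, so I am writing them down here before I forget them. I also hope this helps anyone who needs the same thing.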

OCR results

For reference, see the Tencent Cloud website: Text Recognition (OCR).

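Preparing Tencent Cloud OCR

Register an account

I registered and logged in directly with my QQ account. You can also follow the official Tencent Cloud tutorial to register: Register for Tencent Cloud.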

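Create a key

When you create a new key, a dialog may pop up warning that this is insecure and suggesting that you create a sub-user. That is up to you: create a sub-user if you want one, otherwise just click to continue. Then, in the left-hand menu, choose Cloud API Keys -> API Key Management and click Create Key. Note down the resulting APPID, SecretId, and SecretKey, and substitute them wherever the project needs them.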
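
One way to avoid pasting the three values straight into source code is to read them from environment variables. The following is only a minimal sketch: the class name OcrCredentials and the variable names TENCENT_OCR_APPID, TENCENT_OCR_SECRET_ID, and TENCENT_OCR_SECRET_KEY are made up for illustration and are not part of any Tencent SDK.

```java
import java.util.Map;

// Hypothetical holder for the three values recorded from the console.
final class OcrCredentials {
    final String appId;
    final String secretId;
    final String secretKey;

    private OcrCredentials(String appId, String secretId, String secretKey) {
        this.appId = appId;
        this.secretId = secretId;
        this.secretKey = secretKey;
    }

    // Reads the credentials from a map such as System.getenv().
    // The variable names are assumptions chosen for this sketch.
    static OcrCredentials fromEnv(Map<String, String> env) {
        return new OcrCredentials(
                require(env, "TENCENT_OCR_APPID"),
                require(env, "TENCENT_OCR_SECRET_ID"),
                require(env, "TENCENT_OCR_SECRET_KEY"));
    }

    // Fails fast with a clear message when a value is missing or blank.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }
}
```

Calling OcrCredentials.fromEnv(System.getenv()) at startup would then supply the values that the later code expects as constants.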

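Create a Bucket with 萬象優圖

In the Tencent Cloud menu, choose 萬象優圖 (link), click Bucket Management, then click Bind Bucket on the page.
- A prompt says that the service needs a role to be created
- Click Authorize
- Then choose Agree to Authorize
- You will then be asked to verify your identity; scan the code with WeChat, or choose one of the alternative verification methods
Click Bind Bucket on the page again.
- For the creation method, choose New
- Leave the project unchanged and use Default Project
- Pick any name you like as long as it follows the naming rules; there are no other restrictions. Remember this name, because the project will need it later
- The remaining options can be left unchanged
Remember the name of the bucket you created and substitute it wherever the project needs it.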

Operation guide

If anything in the instructions above is unclear, you can also consult the operation guide on the Tencent Cloud website.

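Implementation

Generating the signature

For details, see the explanation on the Tencent Cloud website: Authentication Signature. I am using Java here, so I used the Java signature sample directly. Copy the code given on the website into a Java file, and call that file's appSign method whenever a signature is needed.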
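
To give a feel for what an HMAC-based signature of this kind involves, here is a rough sketch. It is not the official sample: the class name SignSketch and the plaintext field layout (a, b, k, t, e, r) are my assumptions for illustration and may not match what the official appSign produces, so the code on the website remains authoritative.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only; the real signature comes from the official Sign.appSign.
final class SignSketch {
    static String sign(String appId, String bucket, String secretId, String secretKey,
                       long nowSeconds, long ttlSeconds, int nonce) {
        // Assumed plaintext layout: ids, issue time, expiry time, random nonce.
        String plain = "a=" + appId + "&b=" + bucket + "&k=" + secretId
                + "&t=" + nowSeconds + "&e=" + (nowSeconds + ttlSeconds) + "&r=" + nonce;
        byte[] plainBytes = plain.getBytes(StandardCharsets.UTF_8);
        try {
            // HMAC-SHA1 of the plaintext, keyed with the secret key.
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(plainBytes);

            // Digest followed by the plaintext, then Base64-encoded.
            byte[] signed = new byte[digest.length + plainBytes.length];
            System.arraycopy(digest, 0, signed, 0, digest.length);
            System.arraycopy(plainBytes, 0, signed, digest.length, plainBytes.length);
            return Base64.getEncoder().encodeToString(signed);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA1 is unavailable", e);
        }
    }
}
```

Passing the timestamp and nonce in as parameters keeps the sketch deterministic, which makes it easy to compare against the output of the official code.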

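Configuring the network request and calling the OCR API

This was the step I found most troublesome at the time, because assembling the request for this API takes some effort. Note that the current implementation recognizes local files.

The official documentation is here: OCR - General Printed Text Recognition. If you run into errors, you can also look up the corresponding status code there to find the cause.

Method that configures the network connection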

/**
 * Configures the connection and writes the multipart/form-data request body.
 * Needs java.io.*, java.net.HttpURLConnection, java.net.URL and
 * java.nio.charset.StandardCharsets. URL, HOST, BOUNDARY, LINE_END, APPID and
 * BUCKET are class constants that are not shown here.
 * @throws Exception
 */
private static HttpURLConnection handlerConnection(String path, String imageName) throws Exception {
    URL url = new URL(URL);
    // Obtain the HttpURLConnection object
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("POST");    // use the POST method
    connection.setDoOutput(true);           // allow writing a request body
    connection.setDoInput(true);            // allow reading the response
    connection.setUseCaches(false);         // disable caching

    // Set the request headers
    connection.setRequestProperty("Connection", "Keep-Alive");
    connection.setRequestProperty("Charset", "UTF-8");
    connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + BOUNDARY);
    connection.setRequestProperty("authorization", sign());
    connection.setRequestProperty("host", HOST);
    System.out.println("Request headers set");

    // Get the connection's output stream
    DataOutputStream outputStream = new DataOutputStream(connection.getOutputStream());

    StringBuilder strBufparam = new StringBuilder();
    strBufparam.append(LINE_END);
    // Build the key/value form fields
    String inputPartHeader1 = "--" + BOUNDARY + LINE_END + "Content-Disposition: form-data; name=\"appid\"" + LINE_END + LINE_END + APPID + LINE_END;
    String inputPartHeader2 = "--" + BOUNDARY + LINE_END + "Content-Disposition: form-data; name=\"bucket\"" + LINE_END + LINE_END + BUCKET + LINE_END;
    strBufparam.append(inputPartHeader1);
    strBufparam.append(inputPartHeader2);
    // Write both fields in one go
    outputStream.write(strBufparam.toString().getBytes(StandardCharsets.UTF_8));

    // Write the header of the image part
    String imagePartHeader = "--" + BOUNDARY + LINE_END +
            "Content-Disposition: form-data; name=\"image\"; filename=\"" + imageName + "\"" + LINE_END +
            "Content-Type: image/jpeg" + LINE_END + LINE_END;
    outputStream.write(imagePartHeader.getBytes(StandardCharsets.UTF_8));
    // Stream the image file into the request body; the file is closed even on failure
    String imagePath = path + File.separator + imageName;
    try (InputStream fileInputStream = getImgIns(imagePath)) {
        byte[] buffer = new byte[1024 * 2];
        int length;
        while ((length = fileInputStream.read(buffer)) != -1) {
            outputStream.write(buffer, 0, length);
        }
    }
    outputStream.flush();

    // Write the closing delimiter: a line break, then "--" + BOUNDARY + "--"
    byte[] endData = (LINE_END + "--" + BOUNDARY + "--" + LINE_END).getBytes(StandardCharsets.UTF_8);
    outputStream.write(endData);
    outputStream.flush();
    return connection;
}
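
Since the multipart framing is the fiddly part, it can help to build the body in memory first and inspect it before sending anything. The following is a small sketch of that idea; the class MultipartSketch and its build method are hypothetical names of mine, and the field names simply mirror the ones used above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Builds a multipart/form-data body in memory so the framing can be inspected.
final class MultipartSketch {
    private static final String CRLF = "\r\n";

    static byte[] build(String boundary, Map<String, String> fields,
                        String fileField, String fileName, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Each text field: delimiter line, header, blank line, value.
        for (Map.Entry<String, String> e : fields.entrySet()) {
            write(out, "--" + boundary + CRLF
                    + "Content-Disposition: form-data; name=\"" + e.getKey() + "\"" + CRLF
                    + CRLF + e.getValue() + CRLF);
        }
        // File part: same framing plus a filename and a content type.
        write(out, "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: image/jpeg" + CRLF + CRLF);
        out.write(fileBytes, 0, fileBytes.length);
        // Closing delimiter: CRLF, two dashes, the boundary, two dashes.
        write(out, CRLF + "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }
}
```

The resulting array can be written to connection.getOutputStream() in a single call, and its length is known up front if a Content-Length is wanted. For large images the streaming approach shown earlier uses less memory.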

Some of the helper methods used

/**
 * Opens an input stream for the given file path.
 * @throws FileNotFoundException
 */
private static InputStream getImgIns(String imagePath) throws FileNotFoundException {
    return new FileInputStream(new File(imagePath));
}

/**
 * Reads the whole input stream into a string.
 * @param is the stream to read; it is closed before returning
 * @return the stream content, decoded as UTF-8 (assumed response encoding)
 * @throws IOException
 */
public static String readInputStream(InputStream is) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int length;
    byte[] buffer = new byte[1024];
    while ((length = is.read(buffer)) != -1) {
        baos.write(buffer, 0, length);
    }
    is.close();
    baos.close();
    return baos.toString("UTF-8");
}

/**
 * Signing helper: calls the appSign method in the Sign file to generate the signature.
 * @return the generated signature, or null if signing failed
 */
public static String sign() {
    long expired = 10000;
    try {
        return Sign.appSign(APPID, SECRET_ID, SECRET_KEY, BUCKET, expired);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}
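
One detail worth knowing when reading the response: HttpURLConnection.getErrorStream() returns null when the server sent no error body, so a reader that tolerates null is safer. A minimal sketch follows; StreamTextSketch and readUtf8 are hypothetical names, and UTF-8 is my assumption about the response encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamTextSketch {
    // Reads the stream fully as UTF-8 text; a null stream yields an empty string.
    static String readUtf8(InputStream in) {
        if (in == null) {
            return "";
        }
        // try-with-resources closes the stream even if reading fails.
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = source.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Decoding with an explicit charset also avoids depending on the platform default, which matters when the recognized text contains Chinese characters.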

Running the image recognition

/**
 * Uploads an image for recognition.
 * @param path path of the folder that contains the image
 * @param imageName file name of the image
 */
public void uploadImage(final String path, final String imageName) {
    new Thread() {
        @Override
        public void run() {
            try {
                // Configure the HttpURLConnection object
                HttpURLConnection connection = handlerConnection(path, imageName);
                // Connect
                connection.connect();
                // Read the response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    String result = readInputStream(connection.getInputStream()); // convert the stream to a string
                    System.out.println("Request succeeded: " + result);
                } else {
                    // getErrorStream() is null when the server sent no error body
                    InputStream errorStream = connection.getErrorStream();
                    String errorMsg = errorStream == null ? "" : readInputStream(errorStream);
                    System.out.println("Request failed: " + errorMsg);
                }
            } catch (Exception e) {
                e.printStackTrace();
                System.out.println("Network request failed with an exception: " + e.getMessage());
            }
        }
    }.start();
}
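
The method above only prints the result from a background thread. If the caller needs the recognized text back, one option is to run the blocking call on an executor and return a future. This is just a sketch of that alternative, not what the code above does; AsyncRecognitionSketch and recognizeAsync are hypothetical names, and the supplier stands in for whatever performs the request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

final class AsyncRecognitionSketch {
    // Runs the blocking recognition call on the given executor and hands the
    // response text back through a future instead of only printing it.
    // An exception thrown by the call completes the future exceptionally.
    static CompletableFuture<String> recognizeAsync(Supplier<String> blockingCall,
                                                    Executor executor) {
        return CompletableFuture.supplyAsync(blockingCall, executor);
    }
}
```

A caller could then chain thenAccept to handle the text once it arrives, or call join() where blocking is acceptable. Remember to shut down the executor when it is no longer needed.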