OCR with OpenCV 3's kNN Algorithm - Using Python

http://docs.opencv.org/master/d8/d4b/tutorial_py_knn_opencv.html


Goal

In this chapter:

  • We will use our knowledge of kNN to build a basic OCR application.

  • We will try it with the digits and alphabets data that comes with OpenCV.

OCR of Hand-written Digits

Our goal is to build an application which can read handwritten digits. For this we need some train_data and test_data. OpenCV comes with an image digits.png (in the folder opencv/samples/data/) which contains 5000 handwritten digits (500 of each digit). Each digit is a 20x20 image. So our first step is to split this image into 5000 individual digits. For each digit, we flatten it into a single row of 400 pixels. That is our feature set, i.e. the intensity values of all the pixels. It is the simplest feature set we can create. We use the first 250 samples of each digit as train_data and the next 250 samples as test_data. So let's prepare them first.

import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('digits.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

# Now we split the image to 5000 cells, each 20x20 size
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]

# Make it into a Numpy array. Its size will be (50,100,20,20)
x = np.array(cells)

# Now we prepare train_data and test_data.
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)

# Create labels for train and test data
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = train_labels.copy()

# Initiate kNN, train it on the training data, then test it with the test data for k=5
knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, train_labels)
ret,result,neighbours,dist = knn.findNearest(test,k=5)

# Now we check the accuracy of classification
# For that, compare the result with test_labels and check which are wrong
matches = result==test_labels
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print(accuracy)
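
The value k=5 above is a free parameter. If you want to see how it affects the result, a minimal sketch like the one below (not part of the original tutorial, and reusing the train, test, train_labels and test_labels arrays prepared above) lets you compare a few values before settling on one:

# Train once, then query with different k values to compare accuracy.
knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, train_labels)
for k_value in (1, 3, 5, 7, 9):
    ret, result, neighbours, dist = knn.findNearest(test, k=k_value)
    acc = np.count_nonzero(result == test_labels) * 100.0 / result.size
    print(k_value, acc)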

So our basic OCR app is ready. This particular example gave me an accuracy of 91%. One option to improve accuracy is to add more data for training, especially the misclassified samples. Rather than preparing this training data every time the application starts, it is better to save it, so that next time we can read the data directly from a file and start classifying. You can do this with the help of Numpy functions like np.savetxt, np.savez, np.load etc. Please check their docs for more details.

# save the data
np.savez('knn_data.npz',train=train, train_labels=train_labels)

# Now load the data
with np.load('knn_data.npz') as data:
    print(data.files)
    train = data['train']
    train_labels = data['train_labels']

On my system, this takes around 4.4 MB of memory. Since we are using intensity values (uint8 data) as features, it is better to convert the data to np.uint8 first and then save it; it then takes only 1.1 MB. While loading, you can convert it back to float32.
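
A minimal sketch of that idea (not in the original tutorial), reusing the train and train_labels arrays from above; the uint8 cast is lossless here because the features are 0-255 pixel intensities:

# Save the features as uint8 to keep the file small (pixel values fit in 0-255)
np.savez('knn_data.npz', train=train.astype(np.uint8), train_labels=train_labels)

# Later: load and convert back to float32 before handing it to kNN
with np.load('knn_data.npz') as data:
    train = data['train'].astype(np.float32)
    train_labels = data['train_labels']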

OCR of English Alphabets

Next we will do the same for the English alphabet, but with a slight change in the data and feature set. Here, instead of images, OpenCV comes with a data file, letter-recognition.data, in the opencv/samples/cpp/ folder. If you open it, you will see 20000 lines which may, at first sight, look like garbage. Actually, in each row the first column is a letter, which is our label, and the 16 numbers following it are its features. These features are obtained from the UCI Machine Learning Repository; you can find their details there.

There are 20000 samples available, so we take the first 10000 as training samples and the remaining 10000 as test samples. We also need to change the letters to ASCII character codes, because we can't work with letters directly.

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the data, converters convert the letter to a number
data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
                    converters= {0: lambda ch: ord(ch)-ord('A')})

# split the data to two, 10000 each for train and test
train, test = np.vsplit(data,2)

# split trainData and testData to features and responses
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])

# Initiate the kNN, classify, measure accuracy.
knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)
ret, result, neighbours, dist = knn.findNearest(testData, k=5)

correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print(accuracy)

It gives me an accuracy of 93.22%. Again, if you want to increase the accuracy, you can iteratively add the misclassified samples back into the training data.
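
One way to read that suggestion in code: collect the test samples the classifier got wrong and fold them, with their true labels, back into the training set before retraining. A minimal sketch (not part of the original tutorial), reusing the trainData, responses, testData, labels and result arrays from the block above; note that to measure any real improvement you would need fresh test data, since these samples now sit in the training set:

# Boolean mask of the misclassified test samples
wrong = result.ravel() != labels.ravel()

# Append the misclassified samples and their true labels to the training data
trainData = np.vstack((trainData, testData[wrong]))
responses = np.vstack((responses, labels[wrong]))

# Retrain kNN on the enlarged training set
knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)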
