AI in Practice: First Team Assignment

Date: 2019-11-22



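| Item | Details |
|---|---|
| Course | AI in Practice 2019 |
| Assignment | AI2019 course, assignment 4 (team) |
| Team members | 李大, 許駿鵬, 陳澤寅, 鄒鎮洪, 宋知遇, 藺立萱 |
| Purpose of this assignment | Become familiar with how mini-batches are implemented and what they are for |
| References | Google CNN tutorial, TensorFlow CNN tutorial, Kivy documentation, Kivy painting-board tutorial |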

Team name: AI Squad (人工智能小分隊)

NABCD

N (Need)

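Handwriting recognition is a classic computer-vision project that tests how practical machine-vision algorithms are on the most basic type of character: digits. By writing a small interactive handwritten-digit recognition app, this project helps classmates practice building a small AI demo hands-on, improve their coding and team-programming skills, and get a feel for basic machine-vision algorithms.
However, handwritten-digit recognition has long since reached very high accuracy, which has created demand for recognizing more complex characters and character combinations, such as continuous input, complex characters and expressions. Recognizing mathematical expressions is considered a basic yet representative extension: it lets people enter math expressions simply and quickly instead of writing them in a tedious math markup language. Its main use is in writing papers, where it can be an important aid for researchers. Existing competitors include Mathpix (image to math code) and the Word equation editor (handwritten character input), but few programs accept a complete expression in a single input. This shows both that there is demand for such software and that it is worth developing.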

A (Approach)

Build a handwritten-digit recognition app.

Extend the app to support continuous input with continuous recognition, and to recognize arithmetic expressions.

B (Benefit)

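It lets people enter math expressions simply and quickly without writing tedious math markup. Its main use is in writing papers, where it can be an important aid for researchers.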

C (Competition)

There are currently two main kinds of competing products: image-to-formula tools and handwritten-formula tools.

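Image to formula: represented by Mathpix. These tools take a screenshot of a formula in an existing document, recognize the formula in the image and convert it to math code. They suit workflows where formulas are edited as code and make it easy to reuse formulas obtained elsewhere, but they do not help with composing new formulas.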

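Handwritten formulas: represented by Microsoft's Math Input Panel. These tools capture the formula the user draws on an input panel, recognize it and convert it to formatted formula text (not code). They suit complex formulas, composing new formulas, and situations where formulas cannot be edited as code; their drawbacks are relatively inefficient recognition and frequent misrecognition.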

Our main competitors are in the second category.

D (Delivery)

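Early stage: recognize single digits.
Middle stage: recognize continuously entered characters.
Late stage: extend the set of recognizable characters (including basic arithmetic symbols) and recognize moderately simple expressions.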

Team Members and Roles

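陳澤寅: model construction and algorithm implementation, documentation
李大: algorithm implementation and parameter tuning, documentation
鄒鎮洪: algorithm implementation and parameter tuning, requirements analysis
宋知遇: building the neural network, data collection
藺立萱: neural-network algorithms and UI design
許駿鵬: data collection and data preprocessing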

Project Timeline and Expectations

We hope that by the end of the semester we can recognize and evaluate handwritten expressions such as \(a/b+c/d\).

Model Implementation

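We implemented a CNN in TensorFlow for digit recognition.

Compared with fully connected layers, convolutional layers share their weights across all positions of the image: the same small kernel is applied everywhere, so a feature can be detected wherever the target appears in the image, and the layer needs far fewer parameters, which makes training more efficient. This is why convolutional networks are the standard architecture for image recognition with neural networks.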
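
To put a number on that parameter saving, here is a back-of-the-envelope comparison for the first layer of our model (28×28×1 input, 32 kernels of 5×5, 'same' padding, so a 28×28×32 output). A convolution needs one 5×5 kernel per output channel, reused at every position, while a fully connected layer producing the same output needs a weight for every input-output pair. The helper functions `conv_params` and `dense_params` are our own illustrative names, not TensorFlow APIs.

```python
# Count trainable parameters (weights + biases) of a convolution layer versus a
# fully connected layer that maps a 28x28x1 image to an output of the same size
# as the first conv layer of our model (28x28x32). Helper names are illustrative.

def conv_params(kernel, in_ch, out_ch):
    # One kernel x kernel filter per (input channel, output channel) pair,
    # reused at every image position, plus one bias per output channel
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    # One weight for every (input unit, output unit) pair, plus one bias per output
    return in_units * out_units + out_units


conv = conv_params(kernel=5, in_ch=1, out_ch=32)
dense = dense_params(in_units=28 * 28 * 1, out_units=28 * 28 * 32)
print(conv)   # 832
print(dense)  # 19694080
```

The convolution needs roughly 24,000 times fewer parameters here, and its count does not grow with the image size at all.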
Since there are plenty of CNN tutorials online, we give only a brief overview:
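- A CNN uses convolution and pooling to turn the 2D image into progressively smaller feature maps, flattens them into a 1D vector, and feeds that vector to fully connected layers that output a probability for each digit.
- A convolution kernel is an m × m matrix of learned weights (5 × 5 in our model).
- The stride is the step size with which the kernel slides over the image.
- Padding controls whether the image is padded at its borders: 'same' pads with zeros so that, with stride 1, the output keeps the input's spatial size; 'valid' does not pad, so the output shrinks.
- ReLU, f(x) = max(0, x), is an activation function like sigmoid; because it does not saturate for positive inputs, training usually converges faster with it.
- AdamOptimizer is a widely used gradient-descent optimizer that adapts the step size of each parameter using running averages of the gradient and of its square, instead of applying one fixed step size to all weights.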
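
The list above can be made concrete with a single-channel sketch in plain Python of what kernel size, stride and padding do. It follows TensorFlow's convention for 'same' padding (output size ceil(n / stride), with any odd extra padding on the bottom/right) and, like `tf.layers.conv2d`, it computes a cross-correlation. The helpers `pad_amounts`, `conv2d` and `max_pool` are our own hypothetical names; the real layers are vectorized and handle batches and channels.

```python
import math


def pad_amounts(n, m, stride):
    """Zero padding (before, after) that TensorFlow's 'same' mode uses along one axis."""
    out = math.ceil(n / stride)
    total = max((out - 1) * stride + m - n, 0)
    return total // 2, total - total // 2


def conv2d(image, kernel, stride=1, padding="valid"):
    m = len(kernel)  # the kernel is an m x m matrix of weights
    if padding == "same":
        top, bottom = pad_amounts(len(image), m, stride)
        left, right = pad_amounts(len(image[0]), m, stride)
        width = left + len(image[0]) + right
        image = ([[0.0] * width for _ in range(top)]
                 + [[0.0] * left + list(row) + [0.0] * right for row in image]
                 + [[0.0] * width for _ in range(bottom)])
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(0, rows - m + 1, stride):      # slide the window down by `stride`
        out_row = []
        for j in range(0, cols - m + 1, stride):  # slide the window right by `stride`
            out_row.append(sum(image[i + a][j + b] * kernel[a][b]
                               for a in range(m) for b in range(m)))
        out.append(out_row)
    return out


def max_pool(image, size=2, stride=2):
    """'valid' max pooling: keep the largest value in each size x size window."""
    rows, cols = len(image), len(image[0])
    return [[max(image[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, cols - size + 1, stride)]
            for i in range(0, rows - size + 1, stride)]


image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]  # 6x6 test image
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # identity kernel: copies the centre pixel

same = conv2d(image, kernel, stride=1, padding="same")
valid = conv2d(image, kernel, stride=1, padding="valid")
strided = conv2d(image, kernel, stride=2, padding="same")
print(len(same), len(same[0]))        # 6 6
print(len(valid), len(valid[0]))      # 4 4
print(len(strided), len(strided[0]))  # 3 3
print(max_pool(image))  # [[7.0, 9.0, 11.0], [19.0, 21.0, 23.0], [31.0, 33.0, 35.0]]
```

With the identity kernel, 'same' padding reproduces the image exactly, 'valid' drops the one-pixel border, and stride 2 keeps every second position.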
We train the model on the MNIST dataset.
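
With 28×28 MNIST inputs, the two 'same'-padded convolutions keep the spatial size and each 2×2, stride-2 max pooling halves it, so the feature map shrinks from 28 to 14 to 7, and the flattened vector has 7 × 7 × 64 = 3136 entries, the size hard-coded in `tf.reshape(pool2, [-1, 7 * 7 * 64])`. A small sketch of that arithmetic (the helper names are illustrative):

```python
import math


def conv_same(size, stride=1):
    # 'same' padding: output size is ceil(input / stride), independent of kernel size
    return math.ceil(size / stride)


def pool_valid(size, pool=2, stride=2):
    # tf.layers.max_pooling2d defaults to 'valid' padding: floor((n - pool) / stride) + 1
    return (size - pool) // stride + 1


# Trace the spatial size of one MNIST image through the model's layers
side, channels = 28, 1                 # MNIST input: 28x28 grey-scale
side, channels = conv_same(side), 32   # conv1: 32 kernels of 5x5  -> 28x28x32
side = pool_valid(side)                # pool1: 2x2, stride 2      -> 14x14x32
side, channels = conv_same(side), 64   # conv2: 64 kernels of 5x5  -> 14x14x64
side = pool_valid(side)                # pool2: 2x2, stride 2      -> 7x7x64
flat = side * side * channels          # length of the flattened vector
print(side, channels, flat)            # 7 64 3136
```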

CNN.py

```python
# coding: utf-8
import tensorflow as tf  # written against the TensorFlow 1.x API (tf.layers, tf.estimator)


def conv_net(input_x_dict, reuse, is_training):
    with tf.variable_scope('ConvNet', reuse=reuse):
        # TF Estimator passes the input as a dict, so that models can take multiple inputs
        input_x = input_x_dict['images']

        # Input layer [28*28*1]
        # The first dimension is -1 so that TensorFlow infers the batch size,
        # which is not known in advance
        input_x_images = tf.reshape(input_x, [-1, 28, 28, 1])

        # Convolution 1: 32 kernels of 5x5; 'same' padding keeps the size -> [?, 28, 28, 32]
        conv1 = tf.layers.conv2d(
            inputs=input_x_images,
            filters=32,
            kernel_size=[5, 5],
            strides=1,
            padding='same',
            activation=tf.nn.relu
        )

        # Max pooling 1: 2x2 window with stride 2 halves the size -> [?, 14, 14, 32]
        pool1 = tf.layers.max_pooling2d(
            inputs=conv1,
            pool_size=[2, 2],
            strides=2
        )

        # Convolution 2: 64 kernels of 5x5 -> [?, 14, 14, 64]
        conv2 = tf.layers.conv2d(
            inputs=pool1,
            filters=64,
            kernel_size=[5, 5],
            strides=1,
            padding='same',
            activation=tf.nn.relu
        )

        # Max pooling 2 -> [?, 7, 7, 64]
        pool2 = tf.layers.max_pooling2d(
            inputs=conv2,
            pool_size=[2, 2],
            strides=2
        )

        # Flatten into one vector per image -> [?, 3136]
        flat = tf.reshape(pool2, [-1, 7 * 7 * 64])

        # Fully connected layer with 1024 units
        dense = tf.layers.dense(
            inputs=flat,
            units=1024,
            activation=tf.nn.relu
        )

        # Dropout layer (tf.layers.dropout):
        #   inputs:   input tensor
        #   rate:     fraction of units to drop
        #   training: drop units only in training mode; pass inputs through otherwise
        dropout = tf.layers.dropout(
            inputs=dense,
            rate=0.5,
            training=is_training
        )

        # Output layer: a plain dense layer without activation (it produces logits)
        logits = tf.layers.dense(
            inputs=dropout,
            units=10
        )
        # Output shape [?, 10]

        return logits


def model_fn(features, labels, mode):
    # Dropout behaves differently in training and in evaluation/prediction, so we build
    # two networks that share the same weights (the second one uses reuse=True)
    logits_train = conv_net(features, reuse=False, is_training=True)  # network for training
    logits_test = conv_net(features, reuse=True, is_training=False)  # network for evaluation and prediction

    # Predictions
    pred_classes = tf.argmax(logits_test, axis=1)
    pred_probas = tf.nn.softmax(logits_test)

    # In prediction mode, return early
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {
            'class_ids': pred_classes,
            'probabilities': pred_probas
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    # Loss: use the dropout network while training and the deterministic one when evaluating
    logits_for_loss = logits_train if mode == tf.estimator.ModeKeys.TRAIN else logits_test
    loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits_for_loss)

    if mode == tf.estimator.ModeKeys.TRAIN:
        learning_rate = 0.001
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    # Accuracy
    acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, axis=1), predictions=pred_classes)
    # Recall. Note: tf.metrics.recall casts labels and predictions to bool, so with
    # class ids it measures recall of "digit != 0", not an average per-class recall
    rec_op = tf.metrics.recall(labels=tf.argmax(labels, axis=1), predictions=pred_classes)
    eval_metrics = {
        'accuracy': acc_op,
        'recall': rec_op
    }

    # Summaries for TensorBoard
    tf.summary.scalar('accuracy', acc_op[1])
    tf.summary.scalar('recall', rec_op[1])

    # Evaluate the model
    if mode == tf.estimator.ModeKeys.EVAL:
        # TF Estimators must return an EstimatorSpec that specifies
        # the ops for training, evaluation, etc.
        estim_specs = tf.estimator.EstimatorSpec(
            mode=mode,
            loss=loss,
            eval_metric_ops=eval_metrics
        )
        return estim_specs
```
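
`model_fn` trains with `tf.train.AdamOptimizer(learning_rate=0.001)`. The sketch below reimplements the textbook Adam update (Kingma and Ba) for a single scalar weight, using TensorFlow 1.x's default `beta1=0.9`, `beta2=0.999` and `epsilon=1e-8`, to show how each step is scaled by running estimates of the gradient's mean and squared magnitude. It is an illustration on a toy quadratic with a larger learning rate, not TensorFlow's own implementation; `adam_minimize` is a hypothetical helper.

```python
import math


def adam_minimize(grad, w, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with the textbook Adam update, given its gradient."""
    m = 0.0  # running average of the gradient (first moment)
    v = 0.0  # running average of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # correct the bias from starting at zero
        v_hat = v / (1 - beta2 ** t)
        # The step is divided by the RMS of recent gradients, so each
        # parameter gets its own effective step size
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy problem: f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
# A larger learning rate than the model's 0.001 keeps the toy run short.
w = adam_minimize(lambda w: 2 * (w - 3), w=0.0, lr=0.1, steps=1000)
print(w)
```

On the first step the update is almost exactly `lr` in size whatever the gradient's magnitude, which is why Adam's learning rate behaves more like a step size than a gradient multiplier.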

trainer.py

```python
# coding: utf-8
import os
import shutil

import tensorflow as tf  # TensorFlow 1.x
from tensorflow.examples.tutorials.mnist import input_data

import CNN

# Hyperparameters
num_step = 2000        # number of training steps (one mini-batch per step)
train_batch_size = 50  # mini-batch size for training
test_batch_size = 50   # batch size for evaluation (the whole test set is still used)

# If a trained model already exists, delete it and train a new one from scratch
if os.path.exists('saved_model'):
    shutil.rmtree('saved_model')

# Show TensorFlow's log messages
tf.logging.set_verbosity(tf.logging.INFO)

# Load MNIST with one-hot labels, as softmax_cross_entropy in CNN.model_fn expects
mnist = input_data.read_data_sets('mnist_data/', one_hot=True)
train_x = mnist.train.images
train_y = mnist.train.labels
test_x = mnist.test.images
test_y = mnist.test.labels

# ============= Training =============
# Build the Estimator
model = tf.estimator.Estimator(CNN.model_fn, model_dir=r'saved_model/')

# Input function for training: endless (num_epochs=None), shuffled mini-batches
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': train_x}, y=train_y,
    batch_size=train_batch_size, num_epochs=None, shuffle=True
)

# Start training
model.train(train_input_fn, steps=num_step)

# ============= Evaluation =============
# Input function for evaluation: one pass over the test set, in order
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': test_x}, y=test_y,
    batch_size=test_batch_size, shuffle=False
)

# Evaluate with the Estimator's evaluate method and print the metrics
eval_result = model.evaluate(test_input_fn)
print(eval_result)
```
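
Since the goal of this assignment is to understand mini-batches: with `batch_size=50`, `num_epochs=None` and `shuffle=True`, `numpy_input_fn` keeps feeding the model shuffled batches of 50 examples until `steps` is reached. The generator below is a simplified plain-Python model of that behavior (reshuffling once per pass), not how TensorFlow implements it internally; `minibatches` is a hypothetical helper. It also works out that 2000 steps × 50 = 100,000 examples, about 1.8 passes over the 55,000-image training split that `input_data.read_data_sets` returns by default.

```python
import random


def minibatches(xs, ys, batch_size, seed=0):
    """Yield shuffled (x_batch, y_batch) pairs forever, reshuffling after every pass.

    A simplified model of numpy_input_fn(..., num_epochs=None, shuffle=True).
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    while True:                   # num_epochs=None: never runs out by itself
        rng.shuffle(indices)      # shuffle=True: a new order on every pass
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield [xs[i] for i in batch], [ys[i] for i in batch]


# Toy data: 10 examples with batch size 4 give batches of 4, 4 and 2 per pass,
# and each example stays paired with its own label
gen = minibatches(list(range(10)), list(range(10)), batch_size=4)
for _ in range(3):
    x_batch, y_batch = next(gen)
    print(len(x_batch), x_batch == y_batch)

# How much data the training run in trainer.py sees
num_step, train_batch_size = 2000, 50
train_size = 55000  # MNIST training split returned by input_data.read_data_sets by default
samples_seen = num_step * train_batch_size
print(samples_seen, round(samples_seen / train_size, 2))  # 100000 1.82
```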

predictor.py

```python
# coding: utf-8
import numpy as np
import tensorflow as tf  # TensorFlow 1.x
from PIL import Image

import CNN


class CNNPredictor:
    # Set up the Estimator on the trained CNN model saved by trainer.py
    def __init__(self):
        self.model = tf.estimator.Estimator(CNN.model_fn, model_dir=r'saved_model/')
        print('Model initialized')

    # Turn the image file into a 1 x 784 float array in [0, 1], like an MNIST sample
    def process_img(self, filepath):
        img = Image.open(filepath)  # open the file
        img = img.resize((28, 28))  # MNIST images are 28x28
        img = img.convert('L')  # convert to a grey-scale image
        imgarr = np.array(img, dtype=np.float32)
        imgarr = imgarr.reshape([1, 28*28])/255.0  # flatten and scale to [0, 1]
        return imgarr

    # Run the prediction and return the result dict ('class_ids', 'probabilities')
    def get_predictions(self, filepath):
        imgarr = self.process_img(filepath)
        predict_input_fn = tf.estimator.inputs.numpy_input_fn(
            x={'images': imgarr}, batch_size=1, shuffle=False
        )
        predictions = list(self.model.predict(predict_input_fn))
        return predictions[0]
```
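
`get_predictions` returns the first element of `model.predict`, i.e. the dict built in the PREDICT branch of `model_fn`: `class_ids` is the argmax of the logits and `probabilities` is their softmax. A plain-Python sketch of that mapping for one image; `predictions_from_logits` is a hypothetical helper and the logits are made-up numbers.

```python
import math


def predictions_from_logits(logits):
    """Turn one row of logits into the dict model_fn returns in PREDICT mode."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probabilities = [e / total for e in exps]   # softmax: non-negative, sums to 1
    class_id = max(range(len(logits)), key=lambda k: logits[k])  # argmax
    return {'class_ids': class_id, 'probabilities': probabilities}


logits = [0.1, 0.2, 3.0, 0.0, -1.0, 0.5, 0.3, 0.0, 1.0, 0.2]  # hypothetical network output
pred = predictions_from_logits(logits)
print(pred['class_ids'])  # 2
```

Because softmax is monotonic, taking the argmax of the logits gives the same digit as taking the argmax of the probabilities.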

UI Implementation

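We build the UI with Kivy.
The core drawing logic tracks the touch position on the canvas and draws a line that follows the pointer as it moves.
- It uses three events: on_touch_down (finger pressed down), on_touch_move (touch point moves) and on_touch_up (finger lifted).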
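
The three events receive the same `touch` object for one stroke, and Kivy gives each touch a user-data dict, `touch.ud`, which is how the line started in `on_touch_down` is found again in `on_touch_move`. The Kivy-free sketch below simulates that flow so it can be run without a window; `FakeTouch` and `StrokeRecorder` are hypothetical stand-ins, not Kivy classes.

```python
class FakeTouch:
    """Stand-in for a Kivy touch: a position plus a per-touch user-data dict."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.ud = {}


class StrokeRecorder:
    """Records strokes the way the painting widget does, without drawing anything."""
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.strokes = []   # one flat [x0, y0, x1, y1, ...] list per stroke
        self.finished = 0   # number of completed strokes

    def inside(self, touch):
        return 0 <= touch.x <= self.width and 0 <= touch.y <= self.height

    def on_touch_down(self, touch):
        if self.inside(touch):
            touch.ud['line'] = [touch.x, touch.y]  # start a new polyline
            self.strokes.append(touch.ud['line'])

    def on_touch_move(self, touch):
        if self.inside(touch) and 'line' in touch.ud:
            touch.ud['line'] += [touch.x, touch.y]  # extend the current polyline in place

    def on_touch_up(self, touch):
        if self.inside(touch) and 'line' in touch.ud:
            self.finished += 1  # the real widget exports r.png and predicts here


board = StrokeRecorder(width=100, height=100)
t = FakeTouch(10, 10)
board.on_touch_down(t)
for x, y in [(20, 15), (30, 20)]:
    t.x, t.y = x, y        # the same touch object moves
    board.on_touch_move(t)
board.on_touch_up(t)
print(board.strokes)  # [[10, 10, 20, 15, 30, 20]]
```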

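```python
from kivy.graphics import Color, Line
from kivy.uix.widget import Widget


class PaintWidget(Widget):
    color = (1, 1, 1, 1)  # white strokes; Kivy color components are floats in [0, 1]
    thick = 13            # line width

    def __init__(self, root, **kwargs):
        super().__init__(**kwargs)
        self.parent_widget = root  # the Recognizer that owns this board

    def on_touch_down(self, touch):
        # Ignore touches outside the painting board (e.g. on the Clear button)
        if not self.collide_point(*touch.pos):
            return
        with self.canvas:
            Color(*self.color, mode='rgba')
            # Start a new line at the touch point; touch.ud holds per-touch user data
            touch.ud['line'] = Line(points=(touch.x, touch.y), width=self.thick)

    def on_touch_move(self, touch):
        # Extend the current stroke, but only if it started on the board
        if not self.collide_point(*touch.pos) or 'line' not in touch.ud:
            return
        touch.ud['line'].points += [touch.x, touch.y]

    def on_touch_up(self, touch):
        if not self.collide_point(*touch.pos) or 'line' not in touch.ud:
            return
        # Save the board as an image and ask the Recognizer to classify it
        self.export_to_png('r.png')
        self.parent_widget.do_predictions()
```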

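The rest of the UI code is plain Kivy usage.
The drawing is saved as r.png and passed to the predictor for recognition; the predicted digit and the class probabilities are shown in the window.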

```python
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.button import Button
from kivy.uix.label import Label

from predictor import CNNPredictor  # assumes the predictor code lives in predictor.py

# PaintWidget is assumed to be defined in this module; CNN_Handwritten_Digit_RecognizerApp
# is the app's App subclass (with a font_name attribute), which the post does not show.


class Recognizer(BoxLayout):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        self.predictor = CNNPredictor()  # initialize the CNN from the saved trained model

        self.number = -1  # stores the predicted digit

        self.orientation = 'horizontal'  # UI layout
        self.draw_window()

    # Declare the application's UI components and add them to the window
    def draw_window(self):
        # Clear button. Its definition is missing from the original post;
        # the text and size_hint here are assumptions.
        self.clear_button = Button(text='Clear', size_hint=(1, 4 / 45))
        # Painting board
        self.painter = PaintWidget(self, size_hint=(1, 8 / 9))
        # Label for hint text
        self.hint_label = Label(font_name=CNN_Handwritten_Digit_RecognizerApp.font_name, size_hint=(1, 1 / 45))
        # Label for the predicted digit
        self.result_label = Label(font_size=200, size_hint=(1, 1 / 3))
        # Label for extra information (the class probabilities)
        self.info_board = Label(font_size=24, size_hint=(1, 26 / 45))

        # Two vertical columns: board and hint on the left, results and button on the right
        first_column = BoxLayout(orientation='vertical', size_hint=(2 / 3, 1))
        second_column = BoxLayout(orientation='vertical', size_hint=(1 / 3, 1))
        # Add the widgets to the window
        first_column.add_widget(self.painter)
        first_column.add_widget(self.hint_label)
        second_column.add_widget(self.result_label)
        second_column.add_widget(self.info_board)
        second_column.add_widget(self.clear_button)
        self.add_widget(first_column)
        self.add_widget(second_column)

        # Event binding: call clear_paint when the Clear button is released
        self.clear_button.bind(on_release=self.clear_paint)

        self.clear_paint()  # initialize the state of the app

    # Clear the painting board and reset the state of the app
    def clear_paint(self, obj=None):
        self.painter.canvas.clear()
        self.number = -1
        self.result_label.text = '^-^'
        self.hint_label.text = 'Please draw a digit on the board~'
        self.info_board.text = 'Info Board'

    # Extract the information from the predictions and display it in the window
    def show_info(self, predictions):
        self.number = predictions['class_ids']
        self.result_label.text = str(self.number)
        self.hint_label.text = 'The predicted digit is displayed.'
        probabilities = predictions['probabilities']
        template = '''Probabilities
        0: %.4f%%
        1: %.4f%%
        2: %.4f%%
        3: %.4f%%
        4: %.4f%%
        5: %.4f%%
        6: %.4f%%
        7: %.4f%%
        8: %.4f%%
        9: %.4f%%'''
        self.info_board.text = template % tuple(probabilities * 100.0)

    # Run the CNN predictor on the saved drawing and display the result via show_info
    def do_predictions(self):
        pre = self.predictor.get_predictions('r.png')
        self.show_info(pre)
```