TensorFlow Fast-Food Tutorial (1): Handwriting Recognition in 30 Lines of Code

Abstract: TensorFlow introductory tutorial, part 1

Last year I bought a few books on TensorFlow, only to find this year that some of the sample code uses APIs that are already obsolete. So maintaining a TensorFlow tutorial that stays up to date seems worthwhile; that is the motivation behind this series.
This fast-food tutorial series aims to keep the barrier to entry as low as possible: cover less, but cover it thoroughly.
To show you something exciting right away, rather than keeping you stuck in a long slog of background material, we follow the lead of some online tutorials and jump straight into an MNIST handwriting recognition example in TensorFlow. The fundamentals will come later, step by step.

TensorFlow Installation Crash Course

Since Python is a cross-platform language, installing TensorFlow is relatively easy on every system. We will get to GPU acceleration later.

Installing TensorFlow on Linux

Take Ubuntu 16.04 as an example. First install python3 and pip3 (pip is Python's package manager):

sudo apt install python3
sudo apt install python3-pip

Then TensorFlow can be installed via pip3:

pip3 install tensorflow --upgrade

Installing TensorFlow on macOS

We recommend using Homebrew to install Python:

brew install python3

After installing python3, install TensorFlow via pip3 as before:

pip3 install tensorflow --upgrade

Installing TensorFlow on Windows

On Windows, we recommend installing TensorFlow via Anaconda. The download is at: https://www.anaconda.com/download/#windows

Then open the Anaconda Prompt and enter:

conda create -n tensorflow pip
activate tensorflow
pip install --ignore-installed --upgrade tensorflow

That's it; TensorFlow is installed.
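
To double-check the install (a quick sanity check of my own, not from the original text), print the version string:

python -c "import tensorflow as tf; print(tf.__version__)"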

Let's quickly run an example to make sure it works:

import tensorflow as tf
a = tf.constant(1)
b = tf.constant(2)

c = a * b

sess = tf.Session()

print(sess.run(c))

The output is 2.
As the name suggests, TensorFlow is computation organized as flows of tensors.
A computation needs a Session to run. If you print(c) instead, you get:

Tensor("mul_1:0", shape=(), dtype=int32)

In other words, c is a Tensor produced by a multiplication op; to get its value, you must execute it with Session.run().
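
Incidentally, a Session opened in a with block becomes the default session, so a tensor's eval() method is shorthand for running it. A minimal sketch of the same hello-world using eval():

import tensorflow as tf

a = tf.constant(1)
b = tf.constant(2)
c = a * b  # the * operator here is shorthand for tf.multiply

with tf.Session() as sess:
    print(sess.run(c))  # 2
    print(c.eval())     # also 2: eval() runs c in the default session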

A Shortcut In: Linear Regression

Let's first look at the simplest machine learning model: a linear regression example.
The linear regression model here is just a single multiplication:

tf.multiply(X, w)

We then optimize it by calling TensorFlow's gradient descent function, tf.train.GradientDescentOptimizer.
Let's look at the example code: only 30-odd lines, with quite clear logic. The example comes from this excellent repository on GitHub: https://github.com/nlintz/TensorFlow-Tutorials; it is not my original work.

import tensorflow as tf
import numpy as np

trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33 # random values scattered around the line y = 2x

X = tf.placeholder("float") 
Y = tf.placeholder("float")

def model(X, w):
    return tf.multiply(X, w) # X*w, element-wise; as simple as it gets

w = tf.Variable(0.0, name="weights") 
y_model = model(X, w)

cost = tf.square(Y - y_model) # squared error as the optimization objective

train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost) # gradient descent optimization

# now create a Session and get to work!
with tf.Session() as sess:
    # global variables must be initialized first; TensorFlow requires it
    tf.global_variables_initializer().run()

    for i in range(100):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})

    print(sess.run(w))

In the end we get a value close to 2; on my run it came out as 1.9183811.
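
As an aside, minimize() is just a convenience that computes the gradient and applies the update w <- w - learning_rate * gradient. Here is a hand-rolled sketch of the same loop (my own illustration using tf.gradients and tf.assign, not part of the original example):

import tensorflow as tf
import numpy as np

trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33

X = tf.placeholder("float")
Y = tf.placeholder("float")
w = tf.Variable(0.0, name="weights")
cost = tf.square(Y - tf.multiply(X, w))

grad = tf.gradients(cost, [w])[0]         # d(cost)/dw
train_op = tf.assign(w, w - 0.01 * grad)  # the update that minimize() performs for us

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(100):
        for (x, y) in zip(trX, trY):
            sess.run(train_op, feed_dict={X: x, Y: y})
    print(sess.run(w))  # again converges to roughly 2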

Tackling Handwriting Recognition, Several Ways

Linear regression isn't satisfying enough, so let's go straight to the main event: handwriting recognition.

MNIST

We use the MNIST dataset from Professor Yann LeCun, one of the three giants of deep learning. MNIST images are 28x28 pixels, and each one is labeled with the digit it should represent.
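
If you want to peek at the data yourself, here is a quick sketch (my own addition; the shapes shown are what this loader produces):

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.images.shape)  # (55000, 784): each 28x28 image flattened to 784 floats
print(mnist.train.labels.shape)  # (55000, 10): one-hot encoded digit labels
print(mnist.test.images.shape)   # (10000, 784): the held-out test set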

A Linear Model: Logistic Regression

First, without a second thought, let's just throw a linear model at the classification problem.
Counting comments and blank lines, roughly 30 lines is all it takes to attack a problem as hard as handwriting recognition! Here is the code:

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

def model(X, w):
    return tf.matmul(X, w) # the model is still just a matrix multiplication

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

X = tf.placeholder("float", [None, 784])
Y = tf.placeholder("float", [None, 10])

w = init_weights([784, 10]) 
py_x = model(X, w)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y)) # compute the loss
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost) # construct optimizer
predict_op = tf.argmax(py_x, 1) 

with tf.Session() as sess:
    tf.global_variables_initializer().run()

    for i in range(100):
        for start, end in zip(range(0, len(trX), 128), range(128, len(trX)+1, 128)):
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})
        print(i, np.mean(np.argmax(teY, axis=1) ==
                         sess.run(predict_op, feed_dict={X: teX})))

After 100 rounds of training, our accuracy comes out at 92.36%.
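
The evaluation line above compares np.argmax of the one-hot labels against predict_op. The same comparison can also be expressed in-graph; here is a tiny self-contained sketch (my own illustration, with made-up logits):

import tensorflow as tf

logits = tf.constant([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]])  # two fake predictions
labels = tf.constant([[0., 1., 0.], [1., 0., 0.]])         # their one-hot labels

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, "float"))

with tf.Session() as sess:
    print(sess.run(accuracy))  # 1.0: both rows are classified correctly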

A No-Frills Shallow Neural Network

Having tried the simplest linear model, let's swap in a classic neural network for the same task. The network looks like the figure below.

[Figure: layer1, a network with one hidden layer]

Again without overthinking it, we add one hidden layer and use the most traditional activation function, sigmoid. The core logic is still matrix multiplication, with no trick anywhere:

h = tf.nn.sigmoid(tf.matmul(X, w_h))
return tf.matmul(h, w_o)

The complete code follows. It's still only 40-odd lines, nothing long:

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

# initialize all connection weights randomly
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

def model(X, w_h, w_o):
    h = tf.nn.sigmoid(tf.matmul(X, w_h)) 
    return tf.matmul(h, w_o) 

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

X = tf.placeholder("float", [None, 784])
Y = tf.placeholder("float", [None, 10])

w_h = init_weights([784, 625])
w_o = init_weights([625, 10])

py_x = model(X, w_h, w_o)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y)) # compute the loss
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost) # construct an optimizer
predict_op = tf.argmax(py_x, 1)
 
with tf.Session() as sess:
    tf.global_variables_initializer().run()

    for i in range(100):
        for start, end in zip(range(0, len(trX), 128), range(128, len(trX)+1, 128)):
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})
        print(i, np.mean(np.argmax(teY, axis=1) ==
                         sess.run(predict_op, feed_dict={X: teX})))

On the first round of this run, accuracy was only 69.11%, but the second round already rose to 82.29%. The final result is 95.41%, better than logistic regression!
Note that the two core lines of our model do nothing more than brute-force a fully connected hidden layer; there is no technique in them at all. The gain comes entirely from the neural network's modeling capacity.

The Deep Learning Era: ReLU and Dropout Show Their Power

The previous attempt was a bit low-tech. This is the deep learning era, and we've learned a few things since: for instance, swap the sigmoid activation for ReLU, and apply Dropout. So let's keep the network shallow but write a slightly more modern model:

X = tf.nn.dropout(X, p_keep_input)
h = tf.nn.relu(tf.matmul(X, w_h))

h = tf.nn.dropout(h, p_keep_hidden)
h2 = tf.nn.relu(tf.matmul(h, w_h2))

h2 = tf.nn.dropout(h2, p_keep_hidden)

return tf.matmul(h2, w_o)

Apart from the ReLU and Dropout tricks (and one extra hidden layer, for two in total), the expressive power hasn't grown much. This still hardly counts as deep learning.
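
To see what tf.nn.dropout actually does, here is a tiny sketch (my own illustration; the zero pattern is random, so your output will differ). During training, each unit is kept with probability keep_prob, and kept units are scaled by 1/keep_prob so the expected activation is unchanged:

import tensorflow as tf

x = tf.ones([1, 8])
dropped = tf.nn.dropout(x, 0.5)  # keep each unit with prob 0.5, scale survivors by 1/0.5

with tf.Session() as sess:
    print(sess.run(dropped))  # e.g. [[2. 0. 2. 2. 0. 0. 2. 0.]] -- the pattern is random

This is also why the evaluation below feeds 1.0 for both keep probabilities: at test time nothing is dropped.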

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

def model(X, w_h, w_h2, w_o, p_keep_input, p_keep_hidden): 
    X = tf.nn.dropout(X, p_keep_input)
    h = tf.nn.relu(tf.matmul(X, w_h))

    h = tf.nn.dropout(h, p_keep_hidden)
    h2 = tf.nn.relu(tf.matmul(h, w_h2))

    h2 = tf.nn.dropout(h2, p_keep_hidden)

    return tf.matmul(h2, w_o)

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

X = tf.placeholder("float", [None, 784])
Y = tf.placeholder("float", [None, 10])

w_h = init_weights([784, 625])
w_h2 = init_weights([625, 625])
w_o = init_weights([625, 10])

p_keep_input = tf.placeholder("float")
p_keep_hidden = tf.placeholder("float")
py_x = model(X, w_h, w_h2, w_o, p_keep_input, p_keep_hidden)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost)
predict_op = tf.argmax(py_x, 1)

with tf.Session() as sess:
    # you need to initialize all variables
    tf.global_variables_initializer().run()

    for i in range(100):
        for start, end in zip(range(0, len(trX), 128), range(128, len(trX)+1, 128)):
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end],
                                          p_keep_input: 0.8, p_keep_hidden: 0.5})
        print(i, np.mean(np.argmax(teY, axis=1) ==
                         sess.run(predict_op, feed_dict={X: teX,
                                                         p_keep_input: 1.0,
                                                         p_keep_hidden: 1.0})))

Looking at the results, accuracy tops 96% by the second round and then hovers around 98.4%. Merely switching to ReLU and adding Dropout lifted accuracy from 95% to above 98%.

Enter the Convolutional Neural Network

Now the real deep learning workhorse takes the stage: the CNN, or convolutional neural network. Compared with the brute-force models above, this one really is more complex, involving convolutional layers and pooling layers. We will need to cover those in detail later.
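
Before the full program, here is a minimal sketch of how one conv + pool pair transforms tensor shapes (my own illustration; TensorFlow reports these static shapes without running a session):

import tensorflow as tf

img = tf.placeholder("float", [None, 28, 28, 1])               # NHWC input
k = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))  # 3x3 kernels, 32 filters

conv = tf.nn.conv2d(img, k, strides=[1, 1, 1, 1], padding='SAME')  # 'SAME' keeps 28x28
pool = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='SAME')        # 2x2 pooling halves it to 14x14

print(conv.get_shape())  # (?, 28, 28, 32)
print(pool.get_shape())  # (?, 14, 14, 32)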

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

batch_size = 128
test_size = 256

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

def model(X, w, w2, w3, w4, w_o, p_keep_conv, p_keep_hidden):
    l1a = tf.nn.relu(tf.nn.conv2d(X, w,                       # l1a shape=(?, 28, 28, 32)
                        strides=[1, 1, 1, 1], padding='SAME'))
    l1 = tf.nn.max_pool(l1a, ksize=[1, 2, 2, 1],              # l1 shape=(?, 14, 14, 32)
                        strides=[1, 2, 2, 1], padding='SAME')
    l1 = tf.nn.dropout(l1, p_keep_conv)

    l2a = tf.nn.relu(tf.nn.conv2d(l1, w2,                     # l2a shape=(?, 14, 14, 64)
                        strides=[1, 1, 1, 1], padding='SAME'))
    l2 = tf.nn.max_pool(l2a, ksize=[1, 2, 2, 1],              # l2 shape=(?, 7, 7, 64)
                        strides=[1, 2, 2, 1], padding='SAME')
    l2 = tf.nn.dropout(l2, p_keep_conv)

    l3a = tf.nn.relu(tf.nn.conv2d(l2, w3,                     # l3a shape=(?, 7, 7, 128)
                        strides=[1, 1, 1, 1], padding='SAME'))
    l3 = tf.nn.max_pool(l3a, ksize=[1, 2, 2, 1],              # l3 shape=(?, 4, 4, 128)
                        strides=[1, 2, 2, 1], padding='SAME')
    l3 = tf.reshape(l3, [-1, w4.get_shape().as_list()[0]])    # reshape to (?, 2048)
    l3 = tf.nn.dropout(l3, p_keep_conv)

    l4 = tf.nn.relu(tf.matmul(l3, w4))
    l4 = tf.nn.dropout(l4, p_keep_hidden)

    pyx = tf.matmul(l4, w_o)
    return pyx

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels
trX = trX.reshape(-1, 28, 28, 1)  # 28x28x1 input img
teX = teX.reshape(-1, 28, 28, 1)  # 28x28x1 input img

X = tf.placeholder("float", [None, 28, 28, 1])
Y = tf.placeholder("float", [None, 10])

w = init_weights([3, 3, 1, 32])       # 3x3x1 conv, 32 outputs
w2 = init_weights([3, 3, 32, 64])     # 3x3x32 conv, 64 outputs
w3 = init_weights([3, 3, 64, 128])    # 3x3x64 conv, 128 outputs
w4 = init_weights([128 * 4 * 4, 625]) # FC 128 * 4 * 4 inputs, 625 outputs
w_o = init_weights([625, 10])         # FC 625 inputs, 10 outputs (labels)

p_keep_conv = tf.placeholder("float")
p_keep_hidden = tf.placeholder("float")
py_x = model(X, w, w2, w3, w4, w_o, p_keep_conv, p_keep_hidden)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost)
predict_op = tf.argmax(py_x, 1)

with tf.Session() as sess:
    # you need to initialize all variables
    tf.global_variables_initializer().run()

    for i in range(100):
        training_batch = zip(range(0, len(trX), batch_size),
                             range(batch_size, len(trX)+1, batch_size))
        for start, end in training_batch:
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end],
                                          p_keep_conv: 0.8, p_keep_hidden: 0.5})

        test_indices = np.arange(len(teX)) # Get A Test Batch
        np.random.shuffle(test_indices)
        test_indices = test_indices[0:test_size]

        print(i, np.mean(np.argmax(teY[test_indices], axis=1) ==
                         sess.run(predict_op, feed_dict={X: teX[test_indices],
                                                         p_keep_conv: 1.0,
                                                         p_keep_hidden: 1.0})))

Let's look at the numbers from this run:

0 0.95703125
1 0.9921875
2 0.9921875
3 0.98046875
4 0.97265625
5 0.98828125
6 0.99609375

By round 6 the score already reached 99.6%, a big improvement over the 98.4% of the ReLU-and-Dropout network, and gains get harder the closer you are to 100%.
At round 16, it even scored a perfect 100%. (Note that each round is evaluated on a random batch of only 256 test images, per test_size above, so 1.0 means every image in that batch was correct, not the full test set.)

7 0.99609375
8 0.99609375
9 0.98828125
10 0.98828125
11 0.9921875
12 0.98046875
13 0.99609375
14 0.9921875
15 0.99609375
16 1.0

To sum up: with TensorFlow and machine learning tools, a few dozen lines of code are enough to solve a problem on the level of handwriting recognition, and to this degree of accuracy.
