TensorFlow入門學習(讓機器/算法幫助咱們做出選擇)

時間 2019-11-11

標籤 tensorflow 入門學習機器算法幫助咱們做出選擇简体版

原文原文鏈接

cataloguehtml

0. 我的理解
1. 基本使用
2. MNIST(multiclass classification)入門
3. 深刻MNIST
4. 卷積神經網絡：CIFAR-10 數據集分類
5. 單詞的向量表示(Vector Representations of Words)
6. 循環神經網絡(RNN)、LSTM(Long-Short Term Memory, LSTM)
7. 用深度學習網絡搭建一個聊天機器人

0. 我的理解node

在學習的最開始，我在這裏寫一個我的對deep leanring和神經網絡的粗略理解，不對的地方請多指教python

1. deep learning神經網絡本質是在lean什麼，我以爲是在learn一個一組參數，或者說是選擇模式，也就是咱們常說的分類器，這個分類器多是一個高維度分類器，由一組參數組成
2. 拿圖像驗證碼識別來講，這裏的參數就是指圖像區域中的權重分佈狀況(數字1和數字2的權重像素空間分佈是不一樣的)，若是咱們選定圖像的像素空間(例如32 * 32) + RGB色彩通道(3)做爲輸入特徵(本質上這就是特徵工程)，這些特徵會被tensorflow當成神經元，並在每一層對這些神經元進行組合，並計算出結果，而下一層神經網絡的神經元，會把這一層的輸出再進行組合，組合時，根據上一次預測的準確性，會自動經過back propogation給每一個組合不一樣的weight(比重)，這個過程會一直進行，直到調整出一個最佳擬合的weight，這個weight每每就是最貼近真實圖像的像素空間權重
3. . 世界上的全部事物，均可以抽象爲一個高維度矩陣，這個過程在不一樣的領域會有不一樣的提取抽象方式，即特徵工程，值得注意的是，擁有對應領域的專業知識，很是有助於特徵工程的實施
4. 咱們將特定領域的、須要分類/識別的對象抽象爲高維度矩陣後，進入deep learn算法模型中就成爲了神經元(節點)，deep learn模型接下來要作的事稱之爲"擬合"
5. 要完成分類和識別，deep learn的目標是找到一個擬合矩陣(具象來講就是"高維度-1分類切面")，要達到這個目的，須要3個元素
　　1) 擬合函數(activation 激活函數): 用於生成擬合切面
　　2) 偏差函數(Loss Function): 用於在網絡計算的過程當中計算當前擬合參數獲得的擬合切面離最優值的距離，以便隨時調整參數
　　3) 神經網絡結構: deep learn深度學習和普通神經網絡的區別就在於"層數"的不一樣，深度神經網絡每每有3層以上(輸入層、隱層、輸出層)，當層數增長後，在每一層選擇怎樣的組合的交叉結構就成了一個很難的事情，當前尚未完善的理論支撐能精確地計算出什麼樣的網絡結構能輸出最好的結果，一般的作法是根據不認同的業務場景去不斷嘗試不一樣的網絡結構，直到"試出"一個相對較好的網絡結構，而後再在這個網絡結構的基礎之上進行參數調整

值得注意的是，神經網絡最大的魔力在於，就在於即便咱們沒法準確地提取出各類各樣不少的特徵，而只要給與足夠多層的神經網絡和神經元，神經網絡本身會組合出有用的特徵。之因此能夠作到這點，我麼能夠來看一個實驗c++

http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8,8,8,8,8,8&seed=0.33671&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

對於輸入來講，咱們只給出2個維度的特徵x一、x2，而選擇1個6層的，每層有8個神經元的神經網絡，在每層網絡中，維度被擴展到了8，神經網絡會自動在訓練過程當中，尋找對它有價值的維度，並給與必定的weight權重，根據loss函數來不斷遞歸降低，直到找到一個最好的擬合權重參數。DL大大下降了特徵工程的難度git

0x1: 神經網絡到底理解了什麼github

其一，神經網絡理解了如何將輸入空間解耦爲分層次的卷積濾波器組
其二，神經網絡理解了從一系列濾波器的組合到一系列特定標籤的機率映射。神經網絡學習到的東西徹底達不到人類的「看見」的意義，從科學的的角度講，這固然也不意味着咱們已經解決了計算機視覺的問題

有些人說，卷積神經網絡學習到的對輸入空間的分層次解耦模擬了人類視覺皮層的行爲。這種說法可能對也可能不對，但目前未知咱們尚未比較強的證據來認可或否定它。固然，有些人能夠指望人類的視覺皮層就是以相似的方式學東西的，某種程度上講，這是對咱們視覺世界的天然解耦（就像傅里葉變換是對週期聲音信號的一種解耦同樣天然）【這裏是說，就像聲音信號的傅里葉變換表達了不一樣頻率的聲音信號這種很天然很物理的理解同樣，咱們可能會認爲咱們對視覺信息的識別就是分層來完成的，圓的是輪子，有四個輪子的是汽車，造型炫酷的汽車是跑車，像這樣】。可是，人類對視覺信號的濾波、分層次、處理的本質極可能和咱們弱雞的卷積網絡徹底不是一回事。視覺皮層不是卷積的，儘管它們也分層，但那些層具備皮質列的結構，而這些結構的真正目的目前還不得而知，這種結構在咱們的人工神經網絡中尚未出現（儘管喬大帝Geoff Hinton正在在這個方面努力）。此外，人類有比給靜態圖像分類的感知器多得多的視覺感知器，這些感知器是連續而主動的，不是靜態而被動的，這些感覺器還被如眼動等多種機制複雜控制web

Relevant Link:算法

https://groups.google.com/a/tensorflow.org/forum/#!forum/discuss
https://stackoverflow.com/questions/tagged/tensorflow
http://www.tensorfly.cn/tfdoc/resources/overview.html
https://www.zhihu.com/question/41667903
http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8,8,8,8,8,8&seed=0.33671&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

1. 基本使用shell

0x1: 綜述express

1. 使用圖 (graph) 來表示計算任務.
2. 在被稱之爲 會話 (Session) 的上下文 (context) 中執行圖.
3. 使用 tensor 表示數據.
4. 經過 變量 (Variable) 維護狀態.
5. 使用 feed 和 fetch 能夠爲任意的操做(arbitrary operation) 賦值或者從其中獲取數據.

TensorFlow 是一個編程系統, 使用圖來表示計算任務. 圖中的節點被稱之爲 op (operation 的縮寫). 一個 op 得到 0 個或多個 Tensor, 執行計算, 產生 0 個或多個 Tensor. 每一個 Tensor 是一個類型化的多維數組. 例如, 你能夠將一小組圖像集表示爲一個四維浮點數數組, 這四個維度分別是 [batch, height, width, channels].
一個 TensorFlow 圖描述了計算的過程. 爲了進行計算, 圖必須在會話裏被啓動. 會話將圖的 op 分發到諸如 CPU 或 GPU 之類的設備上, 同時提供執行 op 的方法. 這些方法執行後, 將產生的 tensor 返回. 在 Python 語言中, 返回的 tensor 是 numpy ndarray 對象; 在 C 和 C++ 語言中, 返回的 tensor 是 tensorflow::Tensor 實例.

0x2: 計算圖

TensorFlow 程序一般被組織成一個構建階段和一個執行階段. 在構建階段, op 的執行步驟被描述成一個圖. 在執行階段, 使用會話執行執行圖中的 op.
例如, 一般在構建階段建立一個圖來表示和訓練神經網絡, 而後在執行階段反覆執行圖中的訓練 op.

1. 構建圖(將待分類對象抽象爲高維矩陣)

構建圖的第一步, 是建立源 op (source op). 源 op 不須要任何輸入, 例如常量 (Constant). 源 op 的輸出被傳遞給其它 op 作運算.
Python 庫中, op 構造器的返回值表明被構造出的 op 的輸出, 這些返回值能夠傳遞給其它 op 構造器做爲輸入.

# -*- coding:utf-8 -*-

import tensorflow as tf


if __name__ == "__main__":
    # 建立一個常量 op, 產生一個 1x2 矩陣. 這個 op 被做爲一個節點
    # 加到默認圖中.
    #
    # 構造器的返回值表明該常量 op 的返回值.
    matrix1 = tf.constant([[3., 3.]])

    # 建立另一個常量 op, 產生一個 2x1 矩陣.
    matrix2 = tf.constant([[2.],[2.]])

    # 建立一個矩陣乘法 matmul op , 把 'matrix1' 和 'matrix2' 做爲輸入.
    # 返回值 'product' 表明矩陣乘法的結果.
    product = tf.matmul(matrix1, matrix2)

默認圖如今有三個節點, 兩個 constant() op, 和一個matmul() op. 爲了真正進行矩陣相乘運算, 並獲得矩陣乘法的結果, 必須在會話裏啓動這個圖.

2. 在一個會話中啓動圖

構造階段完成後, 才能啓動圖. 啓動圖的第一步是建立一個 Session 對象, 若是無任何建立參數, 會話構造器將啓動默認圖.

# -*- coding:utf-8 -*-

import tensorflow as tf

if __name__ == "__main__":
    # 建立一個常量 op, 產生一個 1x2 矩陣. 這個 op 被做爲一個節點
    # 加到默認圖中.
    #
    # 構造器的返回值表明該常量 op 的返回值.
    matrix1 = tf.constant([[3., 3.]])

    # 建立另一個常量 op, 產生一個 2x1 矩陣.
    matrix2 = tf.constant([[2.],[2.]])

    # 建立一個矩陣乘法 matmul op , 把 'matrix1' 和 'matrix2' 做爲輸入.
    # 返回值 'product' 表明矩陣乘法的結果.
    product = tf.matmul(matrix1, matrix2)

    # 默認圖如今有三個節點, 兩個 constant() op, 和一個matmul() op. 爲了真正進行矩陣相乘運算, 並獲得矩陣乘法的 結果, 你必須在會話裏啓動這個圖.

    # 啓動默認圖.
    sess = tf.Session()

    # 調用 sess 的 'run()' 方法來執行矩陣乘法 op, 傳入 'product' 做爲該方法的參數.
    # 上面提到, 'product' 表明了矩陣乘法 op 的輸出, 傳入它是向方法代表, 咱們但願取回
    # 矩陣乘法 op 的輸出.
    #
    # 整個執行過程是自動化的, 會話負責傳遞 op 所需的所有輸入. op 一般是併發執行的.
    #
    # 函數調用 'run(product)' 觸發了圖中三個 op (兩個常量 op 和一個矩陣乘法 op) 的執行.
    #
    # 返回值 'result' 是一個 numpy `ndarray` 對象.
    result = sess.run(product)
    print result
    # ==> [[ 12.]]

    # 任務完成, 關閉會話.
    sess.close()

在實現上, TensorFlow 將圖形定義轉換成分佈式執行的操做, 以充分利用可用的計算資源(如 CPU 或 GPU). 通常你不須要顯式指定使用 CPU 仍是 GPU, TensorFlow 能自動檢測. 若是檢測到 GPU, TensorFlow 會盡量地利用找到的第一個 GPU 來執行操做

0x3: Tensor

TensorFlow 程序使用 tensor 數據結構來表明全部的數據, 計算圖中, 操做間傳遞的數據都是 tensor. 你能夠把 TensorFlow tensor 看做是一個 n 維的數組或列表. 一個 tensor 包含一個靜態類型 rank, 和一個 shape.

0x4: 變量

變量維護圖執行過程當中的狀態信息. 下面的例子演示瞭如何使用變量實現一個簡單的計數器

# -*- coding:utf-8 -*-

import tensorflow as tf

if __name__ == "__main__":
    # 建立一個變量, 初始化爲標量 0.
    state = tf.Variable(0, name="counter")

    # 建立一個 op, 其做用是使 state 增長 1
    one = tf.constant(1)
    new_value = tf.add(state, one)
    update = tf.assign(state, new_value)

    # 啓動圖後, 變量必須先通過`初始化` (init) op 初始化,
    # 首先必須增長一個`初始化` op 到圖中.
    init_op = tf.initialize_all_variables()

    # 啓動圖, 運行 op
    with tf.Session() as sess:
      # 運行 'init' op
      sess.run(init_op)
      # 打印 'state' 的初始值
      print sess.run(state)
      # 運行 op, 更新 'state', 並打印 'state'
      for _ in range(3):
        sess.run(update)
        print sess.run(state)

代碼中 assign() 操做是圖所描繪的表達式的一部分, 正如 add() 操做同樣. 因此在調用 run() 執行表達式以前, 它並不會真正執行賦值操做.
一般會將一個統計模型中的參數表示爲一組變量. 例如, 你能夠將一個神經網絡的權重做爲某個變量存儲在一個 tensor 中. 在訓練過程當中, 經過重複運行訓練圖, 更新這個 tensor.

0x5: Fetch

爲了取回操做的輸出內容, 能夠在使用 Session 對象的 run() 調用執行圖時, 傳入一些 tensor, 這些 tensor
會幫助你取回結果. 在以前的例子裏, 咱們只取回了單個節點 state, 可是你也能夠取回多個 tensor:

# -*- coding:utf-8 -*-

import tensorflow as tf

if __name__ == "__main__":
    # 啓動默認圖.
    sess = tf.Session()

    input1 = tf.constant(3.0)
    input2 = tf.constant(2.0)
    input3 = tf.constant(5.0)
    intermed = tf.add(input2, input3)
    mul = tf.multiply(input1, intermed)

    with tf.Session():
      result = sess.run([mul, intermed])
      print result

0x6: Feed

上述示例在計算圖中引入了 tensor, 以常量或變量的形式存儲. TensorFlow 還提供了 feed 機制, 該機制能夠臨時替代圖中的任意操做中的 tensor 能夠對圖中任何操做提交補丁, 直接插入一個 tensor.
feed 使用一個 tensor 值臨時替換一個操做的輸出結果. 你能夠提供 feed 數據做爲 run() 調用的參數. feed 只在調用它的方法內有效, 方法結束, feed 就會消失. 最多見的用例是將某些特殊的操做指定爲 "feed" 操做, 標記的方法是使用 tf.placeholder() 爲這些操做建立佔位符.

# -*- coding:utf-8 -*-

import tensorflow as tf

if __name__ == "__main__":
    input1 = tf.placeholder(tf.types.float32)
    input2 = tf.placeholder(tf.types.float32)
    output = tf.multiply(input1, input2)

    with tf.Session() as sess:
        print sess.run([output])
        print sess.run([output], feed_dict={input1:[7.], input2:[2.]})

0x7: batch

深度學習的優化算法，說白了就是梯度降低。每次的參數更新有兩種方式

1. 第一種，遍歷所有數據集算一次損失函數，而後算函數對各個參數的梯度，更新梯度。這種方法每更新一次參數都要把數據集裏的全部樣本都看一遍，計算量開銷大，計算速度慢，不支持在線學習，這稱爲Batch gradient descent，批梯度降低。
2. 另外一種，每看一個數據就算一下損失函數，而後求梯度更新參數，這個稱爲隨機梯度降低，stochastic gradient descent。這個方法速度比較快，可是收斂性能不太好，可能在最優勢附近晃來晃去，hit不到最優勢。兩次參數的更新也有可能互相抵消掉，形成目標函數震盪的比較劇烈

爲了克服兩種方法的缺點，如今通常採用的是一種折中手段，mini-batch gradient decent，小批的梯度降低，這種方法把數據分爲若干個批，按批來更新參數，這樣，一個批中的一組數據共同決定了本次梯度的方向，降低起來就不容易跑偏，減小了隨機性。另外一方面由於批的樣本數與整個數據集相比小了不少，計算量也不是很大
基本上如今的梯度降低都是基於mini-batch的，模塊中常常會出現batch_size，就是指這個。
咱們在代碼中常見的優化器SGD是stochastic gradient descent的縮寫，但不表明是一個樣本就更新一回，仍是基於mini-batch的

Relevant Link:

http://www.tensorfly.cn/tfdoc/get_started/os_setup.html
http://keras-cn.readthedocs.io/en/latest/getting_started/concepts/#batch

2. MNIST(multiclass classification)入門

0x1: MNIST數據集

https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/examples/tutorials/mnist/input_data.py#

下載下來的數據集被分紅兩部分：60000行的訓練數據集（mnist.train）和10000行的測試數據集（mnist.test）。這樣的切分很重要，在機器學習模型設計時必須有一個單獨的測試數據集不用於訓練而是用來評估這個模型的性能，從而更加容易把設計的模型推廣到其餘數據集上（泛化）。
正如前面提到的同樣，每個MNIST數據單元有兩部分組成：一張包含手寫數字的圖片和一個對應的標籤(監督學習中，正確打標的樣本特別重要)。咱們把這些圖片設爲「xs」，把這些標籤設爲「ys」。訓練數據集和測試數據集都包含xs和ys，好比訓練數據集的圖片是 mnist.train.images ，訓練數據集的標籤是 mnist.train.labels。
每一張圖片包含28像素X28像素。咱們能夠用一個數字數組來表示這張圖片：

咱們把這個數組展開成一個向量，長度是 28x28 = 784。如何展開這個數組（數字間的順序）不重要，只要保持各個圖片採用相同的方式展開。從這個角度來看，MNIST數據集的圖片就是在784維向量空間裏面的點, 而且擁有比較複雜的結構 (提醒: 此類數據的可視化是計算密集型的)。
展平圖片的數字數組會丟失圖片的二維結構信息。這顯然是不理想的，最優秀的計算機視覺方法會挖掘並利用這些結構信息，但在當前鎖學習的簡單數學模型，softmax迴歸(softmax regression)，不會利用這些結構信息。
所以，在MNIST訓練數據集中，mnist.train.images 是一個形狀爲 [60000, 784] 的張量，第一個維度數字用來索引圖片，第二個維度數字用來索引每張圖片中的像素點。在此張量裏的每個元素，都表示某張圖片裏的某個像素的強度值，值介於0和1之間(黑白圖片)。

相對應的MNIST數據集的標籤是介於0到9的數字，用來描述給定圖片裏表示的數字。爲了用於這個教程，咱們使標籤數據是"one-hot vectors"。一個one-hot向量除了某一位的數字是1之外其他各維度數字都是0。因此在此教程中，數字n將表示成一個只有在第n維度（從0開始）數字爲1的10維向量。好比，標籤0將表示成([1,0,0,0,0,0,0,0,0,0,0])。所以， mnist.train.labels 是一個 [60000, 10] 的數字矩陣。

0x2: Softmax迴歸

咱們知道MNIST的每一張圖片都表示一個數字，從0到9。咱們但願獲得給定圖片表明每一個數字的機率。好比說，咱們的模型可能推測一張包含9的圖片表明數字9的機率是80%可是判斷它是8的機率是5%（由於8和9都有上半部分的小圓），而後給予它表明其餘數字的機率更小的值。
這是一個使用softmax迴歸（softmax regression）模型的經典案例。softmax模型能夠用來給不一樣的對象分配機率。即便在以後，咱們訓練更加精細的模型時，最後一步也須要用softmax來分配機率。
softmax迴歸（softmax regression）分兩步

1. 第一步

爲了獲得一張給定圖片屬於某個特定數字類的證據（evidence），咱們對圖片像素值進行加權求和。若是這個像素具備很強的證聽說明這張圖片不屬於該類，那麼相應的權值爲負數，相反若是這個像素擁有有利的證據支持這張圖片屬於這個類，那麼權值是正數。
下面的圖片顯示了一個模型學習到的圖片上每一個像素對於特定數字類的權值。紅色表明負數權值，藍色表明正數權值。

咱們也須要加入一個額外的偏置量（bias），由於輸入每每會帶有一些無關的干擾量。所以對於給定的輸入圖片 x 它表明的是數字 i 的證據能夠表示爲

其中表明權重，表明數字 i 類的偏置量，j 表明給定圖片 x 的像素索引用於像素求和。而後用softmax函數能夠把這些證據轉換成機率 y：

這裏的softmax能夠當作是一個激勵（activation）函數或者連接（link）函數，把咱們定義的線性函數的輸出轉換成咱們想要的格式，也就是關於10個數字類的機率分佈。所以，給定一張圖片，它對於每個數字的吻合度能夠被softmax函數轉換成爲一個機率值。softmax函數能夠定義爲

展開等式右邊的子式，能夠獲得：

可是更多的時候把softmax模型函數定義爲前一種形式：把輸入值當成冪指數求值，再正則化這些結果值。這個冪運算表示，更大的證據對應更大的假設模型（hypothesis）裏面的乘數權重值。反之，擁有更少的證據意味着在假設模型裏面擁有更小的乘數係數。假設模型裏的權值不能夠是0值或者負值。Softmax而後會正則化這些權重值，使它們的總和等於1，以此構造一個有效的機率分佈。

對於softmax迴歸模型能夠用下面的圖解釋，對於輸入的xs加權求和，再分別加上一個偏置量，最後再輸入到softmax函數中：

若是把它寫成一個等式，咱們能夠獲得：

咱們也能夠用向量表示這個計算過程：用矩陣乘法和向量相加。這有助於提升計算效率。（也是一種更有效的思考方式）

更進一步，能夠寫成更加緊湊的方式

驗證碼識別體現了模式識別的一個很樸素的思想，就是人的認字過程是經歷了一個"學習過程"，在看過了不少人、各類寫法、各類字體的字後，人腦中對某個字應該"長的樣子"造成了一個權重認知模型，無論怎麼潦草，只要基本形態在那裏，人就能認出來。把這個認知過程抽象爲數學概念，本質上就是特定像素區域給與較高的權重，根據像素權重劃分出區域，只要大致在這個區域中，就應該有更大的機率是這個字

0x3: 實現迴歸模型

y = tf.nn.softmax(tf.matmul(x,W) + b)

TensorFlow不只僅可使softmax迴歸模型計算變得特別簡單，它也用這種很是靈活的方式來描述其餘各類數值計算，從機器學習模型對物理學模擬仿真模型。一旦被定義好以後，咱們的模型就能夠在不一樣的設備上運行：計算機的CPU，GPU，甚至是手機

0x4: 訓練模型

爲了訓練咱們的模型，咱們首先須要定義一個指標來評估這個模型是好的。其實，在機器學習，咱們一般定義指標來表示一個模型是壞的，這個指標稱爲成本（cost）或損失（loss），而後儘可能最小化這個指標。可是，這兩種方式是相同的。
一個很是常見的，很是漂亮的成本函數是「交叉熵」（cross-entropy）。交叉熵產生於信息論裏面的信息壓縮編碼技術，可是它後來演變成爲從博弈論到機器學習等其餘領域裏的重要技術手段。它的定義以下：

y 是咱們預測的機率分佈, y' 是實際的分佈（咱們輸入的one-hot vector)。比較粗糙的理解是，交叉熵是用來衡量咱們的預測用於描述真相的低效性。即若是咱們的描述越不許確，則不肯定性就越高，熵值就越大

TensorFlow擁有一張描述你各個計算單元的圖，它能夠自動地使用反向傳播算法(backpropagation algorithm)來有效地肯定你的變量是如何影響你想要最小化的那個成本值的。而後，TensorFlow會用你選擇的優化算法來不斷地修改變量以下降成本。

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

在這裏，咱們要求TensorFlow用梯度降低算法（gradient descent algorithm）以0.01的學習速率最小化交叉熵。梯度降低算法（gradient descent algorithm）是一個簡單的學習過程，TensorFlow只需將每一個變量一點點地往使成本不斷下降的方向移動

TensorFlow在這裏實際上所作的是，它會在後臺給描述你的計算的那張圖裏面增長一系列新的計算操做單元用於實現反向傳播算法和梯度降低算法。而後，它返回給你的只是一個單一的操做，當運行這個操做時，它用梯度降低算法訓練你的模型，微調你的變量，不斷減小成本。

0x5: 評估咱們的模型

首先讓咱們找出那些預測正確的標籤。tf.argmax 是一個很是有用的函數，它能給出某個tensor對象在某一維上的其數據(softmax預測出了一個相似[1,0,0,0,0,0,0,0,0]矩陣，對應爲1的那個就是它預測出的最大機率的數字)最大值所在的索引值(對應的數字)。因爲標籤向量是由0,1組成，所以最大值1所在的索引位置就是類別標籤，好比tf.argmax(y,1)返回的是模型對於任一輸入x預測到的標籤值，而 tf.argmax(y_,1) 表明正確的標籤，咱們能夠用 tf.equal 來檢測咱們的預測是否真實標籤匹配(索引位置同樣表示匹配)

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

這行代碼會給咱們一組布爾值。爲了肯定正確預測項的比例，咱們能夠把布爾值轉換成浮點數，而後取平均值。例如，[True, False, True, True] 會變成 [1,0,1,1] ，取平均值後獲得 0.75.

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

最後，咱們計算所學習到的模型在測試數據集上面的正確率。

print sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})

0x6: mnist_softmax.py

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""A very simple MNIST classifier.
See extensive documentation at
http://tensorflow.org/tutorials/mnist/beginners/index.md
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys

import input_data
import tensorflow as tf

FLAGS = None


def main(_):
  # Import data
  mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

  # Create the model
  x = tf.placeholder(tf.float32, [None, 784])
  W = tf.Variable(tf.zeros([784, 10]))
  b = tf.Variable(tf.zeros([10]))
  y = tf.matmul(x, W) + b

  # Define loss and optimizer
  y_ = tf.placeholder(tf.float32, [None, 10])

  # The raw formulation of cross-entropy,
  #
  #   tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),
  #                                 reduction_indices=[1]))
  #
  # can be numerically unstable.
  #
  # So here we use tf.nn.softmax_cross_entropy_with_logits on the raw
  # outputs of 'y', and then average across the batch.
  cross_entropy = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
  train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

  sess = tf.InteractiveSession()
  tf.global_variables_initializer().run()
  # Train
  for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

  # Test trained model
  correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                      y_: mnist.test.labels}))

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--data_dir', type=str, default='MNIST_data/',
                      help='Directory for storing input data')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Relevant Link:

http://yann.lecun.com/exdb/mnist/
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/tutorials/mnist
https://github.com/aymericdamien/TensorFlow-Examples/tree/master/examples
https://www.tensorflow.org/get_started/mnist/pros
https://www.tensorflow.org/get_started/mnist/beginners

3. 深刻MNIST

0x1: 構建一個多層卷積網絡(多層深度神經網絡)

1. 權重初始化

爲了建立這個模型，咱們須要建立大量的權重和偏置項。這個模型中的權重在初始化時應該加入少許的噪聲來打破對稱性以及避免0梯度。因爲咱們使用的是ReLU神經元，所以比較好的作法是用一個較小的正數來初始化偏置項，以免神經元節點輸出恆爲0的問題（dead neurons）。爲了避免在創建模型的時候反覆作初始化操做，咱們定義兩個函數用於初始化

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

2. 卷積和池化(將低維特徵擴展到高維空間)

TensorFlow在卷積和池化上有很強的靈活性。咱們怎麼處理邊界？步長應該設多大？在這個實例裏，咱們會一直使用vanilla版本。咱們的卷積使用1步長（stride size），0邊距（padding size）的模板，保證輸出和輸入是同一個大小。咱們的池化用簡單傳統的2x2大小的模板作max pooling。爲了代碼更簡潔，咱們把這部分抽象成一個函數。

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

3. 第一層卷積

如今咱們能夠開始實現第一層了。它由一個卷積接一個max pooling完成。卷積在每一個5x5的patch中算出32個特徵。卷積的權重張量形狀是[5, 5, 1, 32]，前兩個維度是patch的大小，接着是輸入的通道數目，最後是輸出的通道數目。而對於每個輸出通道都有一個對應的偏置量。

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

爲了用這一層，咱們把x變成一個4d向量，其第二、第3維對應圖片的寬、高，最後一維表明圖片的顏色通道數(由於是灰度圖因此這裏的通道數爲1，若是是rgb彩色圖，則爲3)

x_image = tf.reshape(x, [-1,28,28,1])

We then convolve x_image with the weight tensor, add the bias, apply the ReLU function, and finally max pool. 咱們把x_image和權值向量進行卷積，加上偏置項，而後應用ReLU激活函數，最後進行max pooling。

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

4. 第二層卷積

爲了構建一個更深的網絡，咱們會把幾個相似的層堆疊起來。第二層中，每一個5x5的patch會獲得64個特徵(上一層的輸出是32個特徵，做爲下一層的輸入)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

5. 密集鏈接層

如今，圖片尺寸減少到7x7，咱們加入一個有1024個神經元的全鏈接層，用於處理整個圖片。咱們把池化層輸出的張量reshape成一些向量，乘上權重矩陣，加上偏置，而後對其使用ReLU

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

6. Dropout

爲了減小過擬合，咱們在輸出層以前加入dropout。咱們用一個placeholder來表明一個神經元的輸出在dropout中保持不變的機率。這樣咱們能夠在訓練過程當中啓用dropout，在測試過程當中關閉dropout。 TensorFlow的tf.nn.dropout操做除了能夠屏蔽神經元的輸出外，還會自動處理神經元輸出值的scale。因此用dropout的時候能夠不用考慮scale。

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

7. 輸出層

最後，咱們添加一個softmax層，就像前面的單層softmax regression同樣

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

注意這裏和softmax regression的區別在於，softmax regression的輸入維度是圖像像素的768維，而該網絡的輸入是卷積後的1024高維空間，後者抽象度更好

8. 訓練和評估模型

爲了進行訓練和評估，咱們使用與以前簡單的單層SoftMax神經網絡模型幾乎相同的一套代碼，只是咱們會用更加複雜的ADAM優化器來作梯度最速降低，在feed_dict中加入額外的參數keep_prob來控制dropout比例。而後每100次迭代輸出一第二天志

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  if i%100 == 0:
    train_accuracy = accuracy.eval(feed_dict={
        x:batch[0], y_: batch[1], keep_prob: 1.0})
    print "step %d, training accuracy %g"%(i, train_accuracy)
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print "test accuracy %g"%accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

9. tensorflow-deep_convolution.py

import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)

import tensorflow as tf


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


sess = tf.InteractiveSession()

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

x_image = tf.reshape(x, [-1, 28, 28, 1])

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))


tf.summary.scalar('Training error', cross_entropy)
tf.summary.scalar('Training accuracy', accuracy)
tf.summary.scalar('sparsity', tf.nn.zero_fraction(h_fc1))

sess.run(tf.global_variables_initializer())


merged_summary_op = tf.summary.merge_all()
print merged_summary_op
summary_writer = tf.summary.FileWriter('./mnist_logs', sess.graph)

for i in range(20000):
    batch = mnist.train.next_batch(50)
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={
            x: batch[0], y_: batch[1], keep_prob: 1.0})
        print "step %d, training accuracy %g" % (i, train_accuracy)
        summary_str = sess.run(merged_summary_op, feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})
        summary_writer.add_summary(summary_str, i)



print "test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})

0x2: 前饋神經網絡（feed-forward neural network）：full connected MINST

1. 構建圖表（Build the Graph）

在爲數據建立佔位符以後，就能夠運行mnist.py文件，通過三階段的模式函數操做：inference()， loss()，和training()。圖表就構建完成了

1.inference() —— 儘量地構建好圖表，知足促使神經網絡向前反饋並作出預測的要求
2.loss() —— 往inference圖表中添加生成損失（loss）所須要的操做（ops）
3.training() —— 往損失圖表中添加計算並應用梯度（gradients）所需的操做

推理（Inference）

inference()函數會盡量地構建圖表，作到返回包含了預測結果（output prediction）的Tensor。
它接受圖像佔位符爲輸入，在此基礎上藉助ReLu(Rectified Linear Units)激活函數，構建一對徹底鏈接層（layers），以及一個有着十個節點（node）、指明瞭輸出logtis模型的線性層。
每一層都建立於一個惟一的tf.name_scope之下，建立於該做用域之下的全部元素都將帶有其前綴。

with tf.name_scope('hidden1') as scope:

在定義的做用域中，每一層所使用的權重和誤差都在tf.Variable實例中生成，而且包含了各自指望的shape

weights = tf.Variable(
    tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                        stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
    name='weights')
biases = tf.Variable(tf.zeros([hidden1_units]),
                     name='biases')

例如，當這些層是在hidden1做用域下生成時，賦予權重變量的獨特名稱將會是"hidden1/weights"。
每一個變量在構建時，都會得到初始化操做（initializer ops）。
在這種最多見的狀況下，經過tf.truncated_normal函數初始化權重變量，給賦予的shape則是一個二維tensor，其中第一個維度表明該層中權重變量所鏈接（connect from）的單元數量，第二個維度表明該層中權重變量所鏈接到的（connect to）單元數量。對於名叫hidden1的第一層，相應的維度則是[IMAGE_PIXELS, hidden1_units](顯然，第一層的輸入是圖像像素維度)，由於權重變量將圖像輸入鏈接到了hidden1層。tf.truncated_normal初始函數將根據所獲得的均值和標準差，生成一個隨機分佈。
而後，經過tf.zeros函數初始化誤差變量（biases），確保全部誤差的起始值都是0，而它們的shape則是其在該層中所接到的（connect to）單元數量。
圖表的三個主要操做，分別是兩個tf.nn.relu操做，它們中嵌入了隱藏層所需的tf.matmul；以及logits模型所需的另一個tf.matmul。三者依次生成，各自的tf.Variable實例則與輸入佔位符或下一層的輸出tensor所鏈接

hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
logits = tf.matmul(hidden2, weights) + biases

最後，程序會返回包含了輸出結果的logitsTensor

損失（Loss）

loss()函數經過添加所需的損失操做，進一步構建圖表。
首先，labels_placeholer中的值，將被編碼爲一個含有1-hot values的Tensor。例如，若是類標識符爲「3」，那麼該值就會被轉換爲：

[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

code

batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
concated = tf.concat(1, [indices, labels])
onehot_labels = tf.sparse_to_dense(
    concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)

以後，又添加一個tf.nn.softmax_cross_entropy_with_logits操做，用來比較inference()函數與1-hot標籤所輸出的logits Tensor。

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
                                                        onehot_labels,
                                                        name='xentropy')

而後，使用tf.reduce_mean函數，計算batch維度（第一維度）下交叉熵（cross entropy）的平均值，將將該值做爲總損失。

loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

最後，程序會返回包含了損失值的Tensor。
注意：交叉熵是信息理論中的概念，可讓咱們描述若是基於已有事實，相信神經網絡所作的推測最壞會致使什麼結果

訓練

training()函數添加了經過梯度降低（gradient descent）將損失最小化所需的操做。
首先，該函數從loss()函數中獲取損失Tensor，將其交給tf.scalar_summary，後者在與SummaryWriter（見下文）配合使用時，能夠向事件文件（events file）中生成彙總值（summary values）。在實驗中，每次寫入彙總值時，它都會釋放損失Tensor的當前值（snapshot value）

tf.scalar_summary(loss.op.name, loss)

接下來，咱們實例化一個tf.train.GradientDescentOptimizer，負責按照所要求的學習效率（learning rate）應用梯度降低法（gradients）。

optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)

以後，咱們生成一個變量用於保存全局訓練步驟（global training step）的數值，並使用minimize()函數更新系統中的三角權重（triangle weights）、增長全局步驟的操做。根據慣例，這個操做被稱爲 train_op，是TensorFlow會話（session）誘發一個完整訓練步驟所必須運行的操做

global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)

最後，程序返回包含了訓練操做（training op）輸出結果的Tensor

2. 訓練模型

一旦圖表構建完畢，就經過fully_connected_feed.py文件中的用戶代碼進行循環地迭代式訓練和評估

3. 訓練循環

完成會話中變量的初始化以後，就能夠開始訓練了。
訓練的每一步都是經過用戶代碼控制，而能實現有效訓練的最簡單循環就是：

for step in xrange(max_steps):
    sess.run(train_op)

向圖表提供反饋(根據偏差逐級傳遞)

執行每一步時，咱們的代碼會生成一個反饋字典（feed dictionary），其中包含對應步驟中訓練所要使用的例子，這些例子的哈希鍵就是其所表明的佔位符操做。
fill_feed_dict函數會查詢給定的DataSet，索要下一批次batch_size的圖像和標籤，與佔位符相匹配的Tensor則會包含下一批次的圖像和標籤。

images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)

而後，以佔位符爲哈希鍵，建立一個Python字典對象，鍵值則是其表明的反饋Tensor

feed_dict = {
    images_placeholder: images_feed,
    labels_placeholder: labels_feed,
}

這個字典隨後做爲feed_dict參數，傳入sess.run()函數中，爲這一步的訓練提供輸入樣例

這裏簡單理解一下前向反饋

前饋就是信號向前傳遞的意思。BP網絡的前饋表現爲輸入信號從輸入層（輸入層不參加計算）開始，每一層的神經元計算出該層各神經元的輸出並向下一層傳遞直到輸出層計算出網絡的輸出結果，前饋只是用於計算出網絡的輸出，不對網絡的參數進行調整。偏差反向傳播用於訓練時網絡權值和閾值的調整。網絡前向傳播計算出來的結果與實際的結果存在偏差，在離線訓練時，這時網絡採用批量訓練方法計算出整個樣本數據的總偏差，而後從輸出層開始向前推，通常採用梯度降低法逐層求出每一層神經元的閾值和權值的調增量，循環迭代到網絡參數符合要求中止

檢查狀態

在運行sess.run函數時，要在代碼中明確其須要獲取的兩個值：[train_op, loss]

for step in xrange(FLAGS.max_steps):
    feed_dict = fill_feed_dict(data_sets.train,
                               images_placeholder,
                               labels_placeholder)
    _, loss_value = sess.run([train_op, loss],
                             feed_dict=feed_dict)

由於要獲取這兩個值，sess.run()會返回一個有兩個元素的元組。其中每個Tensor對象，對應了返回的元組中的numpy數組，而這些數組中包含了當前這步訓練中對應Tensor的值。因爲train_op並不會產生輸出，其在返回的元祖中的對應元素就是None，因此會被拋棄。可是，若是模型在訓練中出現誤差，loss Tensor的值可能會變成NaN，因此咱們要獲取它的值，並記錄下來。
假設訓練一切正常，沒有出現NaN，訓練循環會每隔100個訓練步驟，就打印一行簡單的狀態文本，告知用戶當前的訓練狀態

if step % 100 == 0:
    print 'Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration)

狀態可視化

爲了釋放TensorBoard所使用的事件文件（events file），全部的即時數據（在這裏只有一個）都要在圖表構建階段合併至一個操做（op）中

summary_op = tf.merge_all_summaries()

在建立好會話（session）以後，能夠實例化一個tf.train.SummaryWriter，用於寫入包含了圖表自己和即時數據具體值的事件文件

summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
                                        graph_def=sess.graph_def)

最後，每次運行summary_op時，都會往事件文件中寫入最新的即時數據，函數的輸出會傳入事件文件讀寫器（writer）的add_summary()函數。

summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)

事件文件寫入完畢以後，能夠就訓練文件夾打開一個TensorBoard，查看即時數據的狀況

保存檢查點（checkpoint）

爲了獲得能夠用來後續恢復模型以進一步訓練或評估的檢查點文件（checkpoint file），咱們實例化一個tf.train.Saver

saver = tf.train.Saver()

在訓練循環中，將按期調用saver.save()方法，向訓練文件夾中寫入包含了當前全部可訓練變量值得檢查點文件

saver.save(sess, FLAGS.train_dir, global_step=step)

這樣，咱們之後就可使用saver.restore()方法，重載模型的參數，繼續訓練

saver.restore(sess, FLAGS.train_dir)

4. 評估模型

每隔一千個訓練步驟，咱們的代碼會嘗試使用訓練數據集與測試數據集，對模型進行評估。do_eval函數會被調用三次，分別使用訓練數據集、驗證數據集合測試數據集

print 'Training Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.train)
print 'Validation Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.validation)
print 'Test Data Eval:'
do_eval(sess,
        eval_correct,
        images_placeholder,
        labels_placeholder,
        data_sets.test)

5. fully_connected_feed.py

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Trains and Evaluates the MNIST network using a feed dictionary."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# pylint: disable=missing-docstring
import argparse
import os.path
import sys
import time

from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.examples.tutorials.mnist import mnist

# Basic model parameters as external flags.
FLAGS = None


def placeholder_inputs(batch_size):
  """Generate placeholder variables to represent the input tensors.
  These placeholders are used as inputs by the rest of the model building
  code and will be fed from the downloaded data in the .run() loop, below.
  Args:
    batch_size: The batch size will be baked into both placeholders.
  Returns:
    images_placeholder: Images placeholder.
    labels_placeholder: Labels placeholder.
  """
  # Note that the shapes of the placeholders match the shapes of the full
  # image and label tensors, except the first dimension is now batch_size
  # rather than the full size of the train or test data sets.
  images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                         mnist.IMAGE_PIXELS))
  labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
  return images_placeholder, labels_placeholder


def fill_feed_dict(data_set, images_pl, labels_pl):
  """Fills the feed_dict for training the given step.
  A feed_dict takes the form of:
  feed_dict = {
      <placeholder>: <tensor of values to be passed for placeholder>,
      ....
  }
  Args:
    data_set: The set of images and labels, from input_data.read_data_sets()
    images_pl: The images placeholder, from placeholder_inputs().
    labels_pl: The labels placeholder, from placeholder_inputs().
  Returns:
    feed_dict: The feed dictionary mapping from placeholders to values.
  """
  # Create the feed_dict for the placeholders filled with the next
  # `batch size` examples.
  images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
                                                 FLAGS.fake_data)
  feed_dict = {
      images_pl: images_feed,
      labels_pl: labels_feed,
  }
  return feed_dict


def do_eval(sess,
            eval_correct,
            images_placeholder,
            labels_placeholder,
            data_set):
  """Runs one evaluation against the full epoch of data.
  Args:
    sess: The session in which the model has been trained.
    eval_correct: The Tensor that returns the number of correct predictions.
    images_placeholder: The images placeholder.
    labels_placeholder: The labels placeholder.
    data_set: The set of images and labels to evaluate, from
      input_data.read_data_sets().
  """
  # And run one epoch of eval.
  true_count = 0  # Counts the number of correct predictions.
  steps_per_epoch = data_set.num_examples // FLAGS.batch_size
  num_examples = steps_per_epoch * FLAGS.batch_size
  for step in xrange(steps_per_epoch):
    feed_dict = fill_feed_dict(data_set,
                               images_placeholder,
                               labels_placeholder)
    true_count += sess.run(eval_correct, feed_dict=feed_dict)
  precision = float(true_count) / num_examples
  print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' %
        (num_examples, true_count, precision))


def run_training():
  """Train MNIST for a number of steps."""
  # Get the sets of images and labels for training, validation, and
  # test on MNIST.
  data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

  # Tell TensorFlow that the model will be built into the default Graph.
  with tf.Graph().as_default():
    # Generate placeholders for the images and labels.
    images_placeholder, labels_placeholder = placeholder_inputs(
        FLAGS.batch_size)

    # Build a Graph that computes predictions from the inference model.
    logits = mnist.inference(images_placeholder,
                             FLAGS.hidden1,
                             FLAGS.hidden2)

    # Add to the Graph the Ops for loss calculation.
    loss = mnist.loss(logits, labels_placeholder)

    # Add to the Graph the Ops that calculate and apply gradients.
    train_op = mnist.training(loss, FLAGS.learning_rate)

    # Add the Op to compare the logits to the labels during evaluation.
    eval_correct = mnist.evaluation(logits, labels_placeholder)

    # Build the summary Tensor based on the TF collection of Summaries.
    summary = tf.summary.merge_all()

    # Add the variable initializer Op.
    init = tf.global_variables_initializer()

    # Create a saver for writing training checkpoints.
    saver = tf.train.Saver()

    # Create a session for running Ops on the Graph.
    sess = tf.Session()

    # Instantiate a SummaryWriter to output summaries and the Graph.
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

    # And then after everything is built:

    # Run the Op to initialize the variables.
    sess.run(init)

    # Start the training loop.
    for step in xrange(FLAGS.max_steps):
      start_time = time.time()

      # Fill a feed dictionary with the actual set of images and labels
      # for this particular training step.
      feed_dict = fill_feed_dict(data_sets.train,
                                 images_placeholder,
                                 labels_placeholder)

      # Run one step of the model.  The return values are the activations
      # from the `train_op` (which is discarded) and the `loss` Op.  To
      # inspect the values of your Ops or variables, you may include them
      # in the list passed to sess.run() and the value tensors will be
      # returned in the tuple from the call.
      _, loss_value = sess.run([train_op, loss],
                               feed_dict=feed_dict)

      duration = time.time() - start_time

      # Write the summaries and print an overview fairly often.
      if step % 100 == 0:
        # Print status to stdout.
        print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
        # Update the events file.
        summary_str = sess.run(summary, feed_dict=feed_dict)
        summary_writer.add_summary(summary_str, step)
        summary_writer.flush()

      # Save a checkpoint and evaluate the model periodically.
      if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
        checkpoint_file = os.path.join(FLAGS.log_dir, 'model.ckpt')
        saver.save(sess, checkpoint_file, global_step=step)
        # Evaluate against the training set.
        print('Training Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.train)
        # Evaluate against the validation set.
        print('Validation Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.validation)
        # Evaluate against the test set.
        print('Test Data Eval:')
        do_eval(sess,
                eval_correct,
                images_placeholder,
                labels_placeholder,
                data_sets.test)


def main(_):
  if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)
  tf.gfile.MakeDirs(FLAGS.log_dir)
  run_training()


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--learning_rate',
      type=float,
      default=0.01,
      help='Initial learning rate.'
  )
  parser.add_argument(
      '--max_steps',
      type=int,
      default=20000,
      help='Number of steps to run trainer.'
  )
  parser.add_argument(
      '--hidden1',
      type=int,
      default=128,
      help='Number of units in hidden layer 1.'
  )
  parser.add_argument(
      '--hidden2',
      type=int,
      default=32,
      help='Number of units in hidden layer 2.'
  )
  parser.add_argument(
      '--batch_size',
      type=int,
      default=100,
      help='Batch size.  Must divide evenly into the dataset sizes.'
  )
  parser.add_argument(
      '--input_data_dir',
      type=str,
      default='MNIST_data/',
      help='Directory to put the input data.'
  )
  parser.add_argument(
      '--log_dir',
      type=str,
      default='./mnist_logs',
      help='Directory to put the log data.'
  )
  parser.add_argument(
      '--fake_data',
      default=False,
      help='If true, uses fake data for unit testing.',
      action='store_true'
  )

  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Relevant Link:

http://www.tensorfly.cn/tfdoc/tutorials/mnist_pros.html
http://www.tensorfly.cn/tfdoc/tutorials/mnist_tf.html

4. 卷積神經網絡：CIFAR-10 數據集分類（將像素空間通卷積擴展到高維空間，輸入CNN進行計算）

對CIFAR-10 數據集的分類是機器學習中一個公開的基準測試問題，其任務是對一組32x32RGB的圖像進行分類，這些圖像涵蓋了10個類別：

飛機， 汽車， 鳥， 貓， 鹿， 狗， 青蛙， 馬， 船以及卡車

0x1: 模型架構

本教程中的模型是一個多層架構，由卷積層和非線性層(nonlinearities)交替屢次排列後構成。這些層最終經過全連通層對接到softmax分類器上

1. 模型輸入

輸入模型是經過 inputs() 和distorted_inputs()函數創建起來的，這2個函數會從CIFAR-10二進制文件中讀取圖片文件，因爲每一個圖片的存儲字節數是固定的，所以可使用tf.FixedLengthRecordReader函數

圖片文件的處理流程以下

圖片會被統一裁剪到24x24像素大小，裁剪中央區域用於評估或隨機裁剪用於訓練；
圖片會進行近似的白化處理，使得模型對圖片的動態範圍變化不敏感(讓識別模型對圖像的亮度等因素不敏感)

對於訓練，咱們另外採起了一系列隨機變換的方法來人爲的增長數據集的大小

對圖像進行隨機的左右翻轉；
隨機變換圖像的亮度；
隨機變換圖像的對比度；

2. 模型預測

模型的預測流程由inference()構造，該函數會添加必要的操做步驟用於計算預測值的 logits，其對應的模型組織方式以下所示：

conv1    實現卷積 以及 rectified linear activation.
pool1    max pooling.
norm1    局部響應歸一化.
conv2    卷積 and rectified linear activation.
norm2    局部響應歸一化.
pool2    max pooling.
local3    基於修正線性激活的全鏈接層.
local4    基於修正線性激活的全鏈接層.
softmax_linear    進行線性變換以輸出 logits.

0x2: 模型訓練

訓練一個可進行N維分類的網絡的經常使用方法是使用多項式邏輯迴歸,又被叫作softmax 迴歸。Softmax 迴歸在網絡的輸出層上附加了一個softmax nonlinearity，而且計算歸一化的預測值和label的1-hot encoding的交叉熵。在正則化過程當中，咱們會對全部學習變量應用權重衰減損失(和手寫文字識別相似，圖像識別的本質就是對應某個形狀的物理對應該區域的權重相應較高，這也是人識別圖像甚至畸形圖像的本質道理)。模型的目標函數是求交叉熵損失和全部權重衰減項的和，loss()函數的返回值就是這個值。

train() 函數會添加一些操做使得目標函數最小化，這些操做包括計算梯度、更新學習變量(GradientDescentOptimizer)。train() 函數最終會返回一個用以對一批圖像執行全部計算的操做步驟，以便訓練並更新模型。

0x3: 開始執行並訓練模型

cifar10_train.py輸出的終端信息中提供了關於模型如何訓練的一些信息，好比

損失是真的在減少仍是看到的只是噪聲數據？
爲模型提供的圖片是否合適？
梯度、激活、權重的值是否合理？
當前的學習率是多少？

相比於總損失，在訓練過程當中的單項損失尤爲值得人們的注意。可是因爲訓練中使用的數據批量比較小，損失值中夾雜了至關多的噪聲。在實踐過程當中，咱們也發現相比於原始值，損失值的移動平均值顯得更爲有意義

cifar10.py

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Builds the CIFAR-10 network.

Summary of available functions:

 # Compute input images and labels for training. If you would like to run
 # evaluations, use inputs() instead.
 inputs, labels = distorted_inputs()

 # Compute inference on the model inputs to make a prediction.
 predictions = inference(inputs)

 # Compute the total loss of the prediction with respect to the labels.
 loss = loss(predictions, labels)

 # Create a graph to run one step of training with respect to the loss.
 train_op = train(loss, global_step)
"""
# pylint: disable=missing-docstring
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import gzip
import os
import re
import sys
import tarfile

from six.moves import urllib
import tensorflow as tf

import cifar10_input

FLAGS = tf.app.flags.FLAGS

# Basic model parameters.
tf.app.flags.DEFINE_integer('batch_size', 128,
                            """Number of images to process in a batch.""")
tf.app.flags.DEFINE_string('data_dir', './cifar10_data',
                           """Path to the CIFAR-10 data directory.""")

# Global constants describing the CIFAR-10 data set.
IMAGE_SIZE = cifar10_input.IMAGE_SIZE
NUM_CLASSES = cifar10_input.NUM_CLASSES
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL


# Constants describing the training process.
MOVING_AVERAGE_DECAY = 0.9999     # The decay to use for the moving average.
NUM_EPOCHS_PER_DECAY = 350.0      # Epochs after which learning rate decays.
LEARNING_RATE_DECAY_FACTOR = 0.1  # Learning rate decay factor.
INITIAL_LEARNING_RATE = 0.1       # Initial learning rate.

# If a model is trained with multiple GPU's prefix all Op names with tower_name
# to differentiate the operations. Note that this prefix is removed from the
# names of the summaries when visualizing a model.
TOWER_NAME = 'tower'

DATA_URL = 'http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'


def _activation_summary(x):
  """Helper to create summaries for activations.

  Creates a summary that provides a histogram of activations.
  Creates a summary that measure the sparsity of activations.

  Args:
    x: Tensor
  Returns:
    nothing
  """
  # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
  # session. This helps the clarity of presentation on tensorboard.
  tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name)
  tf.summary.histogram(tensor_name + '/activations', x)
  tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(x))


def _variable_on_cpu(name, shape, initializer):
  """Helper to create a Variable stored on CPU memory.

  Args:
    name: name of the variable
    shape: list of ints
    initializer: initializer for Variable

  Returns:
    Variable Tensor
  """
  with tf.device('/cpu:0'):
    var = tf.get_variable(name, shape, initializer=initializer)
  return var


def _variable_with_weight_decay(name, shape, stddev, wd):
  """Helper to create an initialized Variable with weight decay.

  Note that the Variable is initialized with a truncated normal distribution.
  A weight decay is added only if one is specified.

  Args:
    name: name of the variable
    shape: list of ints
    stddev: standard deviation of a truncated Gaussian
    wd: add L2Loss weight decay multiplied by this float. If None, weight
        decay is not added for this Variable.

  Returns:
    Variable Tensor
  """
  var = _variable_on_cpu(name, shape,
                         tf.truncated_normal_initializer(stddev=stddev))
  if wd:
    weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
    tf.add_to_collection('losses', weight_decay)
  return var


def distorted_inputs():
  """Construct distorted input for CIFAR training using the Reader ops.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.

  Raises:
    ValueError: If no data_dir
  """
  if not FLAGS.data_dir:
    raise ValueError('Please supply a data_dir')
  data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
  return cifar10_input.distorted_inputs(data_dir=data_dir,
                                        batch_size=FLAGS.batch_size)


def inputs(eval_data):
  """Construct input for CIFAR evaluation using the Reader ops.

  Args:
    eval_data: bool, indicating if one should use the train or eval data set.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.

  Raises:
    ValueError: If no data_dir
  """
  if not FLAGS.data_dir:
    raise ValueError('Please supply a data_dir')
  data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
  return cifar10_input.inputs(eval_data=eval_data, data_dir=data_dir,
                              batch_size=FLAGS.batch_size)


def inference(images):
  """Build the CIFAR-10 model.

  Args:
    images: Images returned from distorted_inputs() or inputs().

  Returns:
    Logits.
  """
  # We instantiate all variables using tf.get_variable() instead of
  # tf.Variable() in order to share variables across multiple GPU training runs.
  # If we only ran this model on a single GPU, we could simplify this function
  # by replacing all instances of tf.get_variable() with tf.Variable().
  #
  # conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
                                         stddev=1e-4, wd=0.0)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv1)

  # pool1
  pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='SAME', name='pool1')
  # norm1
  norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm1')

  # conv2
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64],
                                         stddev=1e-4, wd=0.0)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    bias = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv2)

  # norm2
  norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm2')
  # pool2
  pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1], padding='SAME', name='pool2')

  # local3
  with tf.variable_scope('local3') as scope:
    # Move everything into depth so we can perform a single matrix multiply.
    dim = 1
    for d in pool2.get_shape()[1:].as_list():
      dim *= d
    reshape = tf.reshape(pool2, [FLAGS.batch_size, dim])

    weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
    _activation_summary(local3)

  # local4
  with tf.variable_scope('local4') as scope:
    weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
    _activation_summary(local4)

  # softmax, i.e. softmax(WX + b)
  with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)

  return softmax_linear


def loss(logits, labels):
  """Add L2Loss to all the trainable variables.

  Add summary for for "Loss" and "Loss/avg".
  Args:
    logits: Logits from inference().
    labels: Labels from distorted_inputs or inputs(). 1-D tensor
            of shape [batch_size]

  Returns:
    Loss tensor of type float.
  """
  # Calculate the average cross entropy loss across the batch.
  labels = tf.cast(labels, tf.int64)
  cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
      logits=logits, labels=labels, name='cross_entropy_per_example')
  cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
  tf.add_to_collection('losses', cross_entropy_mean)

  # The total loss is defined as the cross entropy loss plus all of the weight
  # decay terms (L2 loss).
  return tf.add_n(tf.get_collection('losses'), name='total_loss')


def _add_loss_summaries(total_loss):
  """Add summaries for losses in CIFAR-10 model.

  Generates moving average for all losses and associated summaries for
  visualizing the performance of the network.

  Args:
    total_loss: Total loss from loss().
  Returns:
    loss_averages_op: op for generating moving averages of losses.
  """
  # Compute the moving average of all individual losses and the total loss.
  loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
  losses = tf.get_collection('losses')
  loss_averages_op = loss_averages.apply(losses + [total_loss])

  # Attach a scalar summary to all individual losses and the total loss; do the
  # same for the averaged version of the losses.
  for l in losses + [total_loss]:
    # Name each loss as '(raw)' and name the moving average version of the loss
    # as the original loss name.
    tf.summary.scalar(l.op.name +' (raw)', l)
    tf.summary.scalar(l.op.name, loss_averages.average(l))

  return loss_averages_op


def train(total_loss, global_step):
  """Train CIFAR-10 model.

  Create an optimizer and apply to all trainable variables. Add moving
  average for all trainable variables.

  Args:
    total_loss: Total loss from loss().
    global_step: Integer Variable counting the number of training steps
      processed.
  Returns:
    train_op: op for training.
  """
  # Variables that affect learning rate.
  num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
  decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)

  # Decay the learning rate exponentially based on the number of steps.
  lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                  global_step,
                                  decay_steps,
                                  LEARNING_RATE_DECAY_FACTOR,
                                  staircase=True)
  tf.summary.scalar('learning_rate', lr)

  # Generate moving averages of all losses and associated summaries.
  loss_averages_op = _add_loss_summaries(total_loss)

  # Compute gradients.
  with tf.control_dependencies([loss_averages_op]):
    opt = tf.train.GradientDescentOptimizer(lr)
    grads = opt.compute_gradients(total_loss)

  # Apply gradients.
  apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

  # Add histograms for trainable variables.
  for var in tf.trainable_variables():
      tf.summary.histogram(var.op.name, var)

  # Add histograms for gradients.
  for grad, var in grads:
    if grad is not None:
        tf.summary.histogram(var.op.name + '/gradients', grad)

  # Track the moving averages of all trainable variables.
  variable_averages = tf.train.ExponentialMovingAverage(
      MOVING_AVERAGE_DECAY, global_step)
  variables_averages_op = variable_averages.apply(tf.trainable_variables())

  with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
    train_op = tf.no_op(name='train')

  return train_op


def maybe_download_and_extract():
  """Download and extract the tarball from Alex's website."""
  dest_directory = FLAGS.data_dir
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = DATA_URL.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  if not os.path.exists(filepath):
    def _progress(count, block_size, total_size):
      sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
          float(count * block_size) / float(total_size) * 100.0))
      sys.stdout.flush()
    filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath,
                                             reporthook=_progress)
    print()
    statinfo = os.stat(filepath)
    print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
    tarfile.open(filepath, 'r:gz').extractall(dest_directory)

cifar10_train.py

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""A binary to train CIFAR-10 using a single GPU.

Accuracy:
cifar10_train.py achieves ~86% accuracy after 100K steps (256 epochs of
data) as judged by cifar10_eval.py.

Speed: With batch_size 128.

System        | Step Time (sec/batch)  |     Accuracy
------------------------------------------------------------------
1 Tesla K20m  | 0.35-0.60              | ~86% at 60K steps  (5 hours)
1 Tesla K40m  | 0.25-0.35              | ~86% at 100K steps (4 hours)

Usage:
Please see the tutorial and website for how to download the CIFAR-10
data set, compile the program and train the model.

http://tensorflow.org/tutorials/deep_cnn/
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import os.path
import time

import numpy as np
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

import cifar10

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('train_dir', './cifar10_train',
                           """Directory where to write event logs """
                           """and checkpoint.""")
tf.app.flags.DEFINE_integer('max_steps', 1000000,
                            """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")


def train():
  """Train CIFAR-10 for a number of steps."""
  with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False)

    # Get images and labels for CIFAR-10.
    images, labels = cifar10.distorted_inputs()

    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)

    # Calculate loss.
    loss = cifar10.loss(logits, labels)

    # Build a Graph that trains the model with one batch of examples and
    # updates the model parameters.
    train_op = cifar10.train(loss, global_step)

    # Create a saver.
    saver = tf.train.Saver(tf.global_variables())

    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.summary.merge_all()

    # Build an initialization operation to run below.
    init = tf.global_variables_initializer()

    # Start running operations on the Graph.
    sess = tf.Session(config=tf.ConfigProto(
        log_device_placement=FLAGS.log_device_placement))
    sess.run(init)

    # Start the queue runners.
    tf.train.start_queue_runners(sess=sess)

    summary_writer = tf.summary.FileWriter(FLAGS.train_dir,
                                            graph=sess.graph)

    for step in xrange(FLAGS.max_steps):
      start_time = time.time()
      _, loss_value = sess.run([train_op, loss])
      duration = time.time() - start_time

      assert not np.isnan(loss_value), 'Model diverged with loss = NaN'

      if step % 10 == 0:
        num_examples_per_step = FLAGS.batch_size
        examples_per_sec = num_examples_per_step / duration
        sec_per_batch = float(duration)

        format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                      'sec/batch)')
        print (format_str % (datetime.now(), step, loss_value,
                             examples_per_sec, sec_per_batch))

      if step % 100 == 0:
        summary_str = sess.run(summary_op)
        summary_writer.add_summary(summary_str, step)

      # Save the model checkpoint periodically.
      if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
        checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
        saver.save(sess, checkpoint_path, global_step=step)


def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
  tf.gfile.MakeDirs(FLAGS.train_dir)
  train()


if __name__ == '__main__':
  tf.app.run()

cifar10_input.py

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Routine for decoding the CIFAR-10 binary file format."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

# Process images of this size. Note that this differs from the original CIFAR
# image size of 32 x 32. If one alters this number, then the entire model
# architecture will change and any model would need to be retrained.
IMAGE_SIZE = 24

# Global constants describing the CIFAR-10 data set.
NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000


def read_cifar10(filename_queue):
  """Reads and parses examples from CIFAR10 data files.

  Recommendation: if you want N-way read parallelism, call this function
  N times.  This will give you N independent Readers reading different
  files & positions within those files, which will give better mixing of
  examples.

  Args:
    filename_queue: A queue of strings with the filenames to read from.

  Returns:
    An object representing a single example, with the following fields:
      height: number of rows in the result (32)
      width: number of columns in the result (32)
      depth: number of color channels in the result (3)
      key: a scalar string Tensor describing the filename & record number
        for this example.
      label: an int32 Tensor with the label in the range 0..9.
      uint8image: a [height, width, depth] uint8 Tensor with the image data
  """

  class CIFAR10Record(object):
    pass
  result = CIFAR10Record()

  # Dimensions of the images in the CIFAR-10 dataset.
  # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
  # input format.
  label_bytes = 1  # 2 for CIFAR-100
  result.height = 32
  result.width = 32
  result.depth = 3
  image_bytes = result.height * result.width * result.depth
  # Every record consists of a label followed by the image, with a
  # fixed number of bytes for each.
  record_bytes = label_bytes + image_bytes

  # Read a record, getting filenames from the filename_queue.  No
  # header or footer in the CIFAR-10 format, so we leave header_bytes
  # and footer_bytes at their default of 0.
  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  result.key, value = reader.read(filename_queue)

  # Convert from a string to a vector of uint8 that is record_bytes long.
  record_bytes = tf.decode_raw(value, tf.uint8)

  # The first bytes represent the label, which we convert from uint8->int32.
  result.label = tf.cast(
      tf.slice(record_bytes, [0], [label_bytes]), tf.int32)

  # The remaining bytes after the label represent the image, which we reshape
  # from [depth * height * width] to [depth, height, width].
  depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
                           [result.depth, result.height, result.width])
  # Convert from [depth, height, width] to [height, width, depth].
  result.uint8image = tf.transpose(depth_major, [1, 2, 0])

  return result


def _generate_image_and_label_batch(image, label, min_queue_examples,
                                    batch_size):
  """Construct a queued batch of images and labels.

  Args:
    image: 3-D Tensor of [height, width, 3] of type.float32.
    label: 1-D Tensor of type.int32
    min_queue_examples: int32, minimum number of samples to retain
      in the queue that provides of batches of examples.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, height, width, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  # Create a queue that shuffles the examples, and then
  # read 'batch_size' images + labels from the example queue.
  num_preprocess_threads = 16
  images, label_batch = tf.train.shuffle_batch(
      [image, label],
      batch_size=batch_size,
      num_threads=num_preprocess_threads,
      capacity=min_queue_examples + 3 * batch_size,
      min_after_dequeue=min_queue_examples)

  # Display the training images in the visualizer.
  tf.summary.image('images', images)

  return images, tf.reshape(label_batch, [batch_size])


def distorted_inputs(data_dir, batch_size):
  """Construct distorted input for CIFAR training using the Reader ops.

  Args:
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
               for i in xrange(1, 6)]
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  read_input = read_cifar10(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for training the network. Note the many random
  # distortions applied to the image.

  # Randomly crop a [height, width] section of the image.
  distorted_image = tf.random_crop(reshaped_image, [height, width, 3])

  # Randomly flip the image horizontally.
  distorted_image = tf.image.random_flip_left_right(distorted_image)

  # Because these operations are not commutative, consider randomizing
  # randomize the order their operation.
  distorted_image = tf.image.random_brightness(distorted_image,
                                               max_delta=63)
  distorted_image = tf.image.random_contrast(distorted_image,
                                             lower=0.2, upper=1.8)

  # Subtract off the mean and divide by the variance of the pixels.
  float_image = tf.image.per_image_standardization(distorted_image)

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                           min_fraction_of_examples_in_queue)
  print ('Filling queue with %d CIFAR images before starting to train. '
         'This will take a few minutes.' % min_queue_examples)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size)


def inputs(eval_data, data_dir, batch_size):
  """Construct input for CIFAR evaluation using the Reader ops.

  Args:
    eval_data: bool, indicating if one should use the train or eval data set.
    data_dir: Path to the CIFAR-10 data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  if not eval_data:
    filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
                 for i in xrange(1, 6)]
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
  else:
    filenames = [os.path.join(data_dir, 'test_batch.bin')]
    num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue.
  read_input = read_cifar10(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  height = IMAGE_SIZE
  width = IMAGE_SIZE

  # Image processing for evaluation.
  # Crop the central [height, width] of the image.
  resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
                                                         width, height)

  # Subtract off the mean and divide by the variance of the pixels.
  float_image = tf.image.per_image_whitening(resized_image)

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(num_examples_per_epoch *
                           min_fraction_of_examples_in_queue)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(float_image, read_input.label,
                                         min_queue_examples, batch_size)

0x4: 評估模型

如今能夠在另外一部分數據集上來評估訓練模型的性能。腳本文件cifar10_eval.py對模型進行了評估，利用 inference()函數重構模型，並使用了在評估數據集全部10,000張CIFAR-10圖片進行測試。最終計算出的精度爲1:N，N=預測值中置信度最高的一項與圖片真實label匹配的頻次。(It calculates the precision at 1: how often the top prediction matches the true label of the image)。
爲了監控模型在訓練過程當中的改進狀況，評估用的腳本文件會週期性的在最新的檢查點文件上運行，這些檢查點文件是由cifar10_train.py產生。

cifar10_eval.py

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Evaluation for CIFAR-10.
Accuracy:
cifar10_train.py achieves 83.0% accuracy after 100K steps (256 epochs
of data) as judged by cifar10_eval.py.
Speed:
On a single Tesla K40, cifar10_train.py processes a single batch of 128 images
in 0.25-0.35 sec (i.e. 350 - 600 images /sec). The model reaches ~86%
accuracy after 100K steps in 8 hours of training time.
Usage:
Please see the tutorial and website for how to download the CIFAR-10
data set, compile the program and train the model.
http://tensorflow.org/tutorials/deep_cnn/
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from datetime import datetime
import math
import time
import tensorflow.python.platform
from tensorflow.python.platform import gfile
import numpy as np
import tensorflow as tf
import cifar10
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('eval_dir', './cifar10_eval',
                           """Directory where to write event logs.""")
tf.app.flags.DEFINE_string('eval_data', 'test',
                           """Either 'test' or 'train_eval'.""")
tf.app.flags.DEFINE_string('checkpoint_dir', './cifar10_train',
                           """Directory where to read model checkpoints.""")
tf.app.flags.DEFINE_integer('eval_interval_secs', 60 * 5,
                            """How often to run the eval.""")
tf.app.flags.DEFINE_integer('num_examples', 10000,
                            """Number of examples to run.""")
tf.app.flags.DEFINE_boolean('run_once', False,
                         """Whether to run eval only once.""")
def eval_once(saver, summary_writer, top_k_op, summary_op):
  """Run Eval once.
  Args:
    saver: Saver.
    summary_writer: Summary writer.
    top_k_op: Top K op.
    summary_op: Summary op.
  """
  with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
      # Restores from checkpoint
      saver.restore(sess, ckpt.model_checkpoint_path)
      # Assuming model_checkpoint_path looks something like:
      #   /my-favorite-path/cifar10_train/model.ckpt-0,
      # extract global_step from it.
      global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
    else:
      print('No checkpoint file found')
      return
    # Start the queue runners.
    coord = tf.train.Coordinator()
    try:
      threads = []
      for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
        threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
                                         start=True))
      num_iter = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size))
      true_count = 0  # Counts the number of correct predictions.
      total_sample_count = num_iter * FLAGS.batch_size
      step = 0
      while step < num_iter and not coord.should_stop():
        predictions = sess.run([top_k_op])
        true_count += np.sum(predictions)
        step += 1
      # Compute precision @ 1.
      precision = true_count / total_sample_count
      print('%s: precision @ 1 = %.3f' % (datetime.now(), precision))
      summary = tf.Summary()
      summary.ParseFromString(sess.run(summary_op))
      summary.value.add(tag='Precision @ 1', simple_value=precision)
      summary_writer.add_summary(summary, global_step)
    except Exception as e:  # pylint: disable=broad-except
      coord.request_stop(e)
    coord.request_stop()
    coord.join(threads, stop_grace_period_secs=10)
def evaluate():
  """Eval CIFAR-10 for a number of steps."""
  with tf.Graph().as_default():
    # Get images and labels for CIFAR-10.
    eval_data = FLAGS.eval_data == 'test'
    images, labels = cifar10.inputs(eval_data=eval_data)
    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)
    # Calculate predictions.
    top_k_op = tf.nn.in_top_k(logits, labels, 1)
    # Restore the moving average version of the learned variables for eval.
    variable_averages = tf.train.ExponentialMovingAverage(
        cifar10.MOVING_AVERAGE_DECAY)
    variables_to_restore = variable_averages.variables_to_restore()
    saver = tf.train.Saver(variables_to_restore)
    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.summary.merge_all()
    graph = tf.get_default_graph().as_graph_def()
    summary_writer = tf.summary.FileWriter(FLAGS.eval_dir,
                                            graph=graph)
    while True:
      eval_once(saver, summary_writer, top_k_op, summary_op)
      if FLAGS.run_once:
        break
      time.sleep(FLAGS.eval_interval_secs)
def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if gfile.Exists(FLAGS.eval_dir):
    gfile.DeleteRecursively(FLAGS.eval_dir)
  gfile.MakeDirs(FLAGS.eval_dir)
  evaluate()
if __name__ == '__main__':
  tf.app.run()

google的tensorflow api在1.0正式版本後變化很大，舊的代碼在遷移到1.0後須要修改對應的api名字

0x5: 在GPU上運行

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""A binary to train CIFAR-10 using multiple GPU's with synchronous updates.

Accuracy:
cifar10_multi_gpu_train.py achieves ~86% accuracy after 100K steps (256
epochs of data) as judged by cifar10_eval.py.

Speed: With batch_size 128.

System        | Step Time (sec/batch)  |     Accuracy
--------------------------------------------------------------------
1 Tesla K20m  | 0.35-0.60              | ~86% at 60K steps  (5 hours)
1 Tesla K40m  | 0.25-0.35              | ~86% at 100K steps (4 hours)
2 Tesla K20m  | 0.13-0.20              | ~84% at 30K steps  (2.5 hours)
3 Tesla K20m  | 0.13-0.18              | ~84% at 30K steps
4 Tesla K20m  | ~0.10                  | ~84% at 30K steps

Usage:
Please see the tutorial and website for how to download the CIFAR-10
data set, compile the program and train the model.

http://tensorflow.org/tutorials/deep_cnn/
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import os.path
import re
import time

import numpy as np
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf
import cifar10

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('train_dir', './cifar10_train',
                           """Directory where to write event logs """
                           """and checkpoint.""")
tf.app.flags.DEFINE_integer('max_steps', 1000000,
                            """Number of batches to run.""")
tf.app.flags.DEFINE_integer('num_gpus', 1,
                            """How many GPUs to use.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")


def tower_loss(scope):
  """Calculate the total loss on a single tower running the CIFAR model.

  Args:
    scope: unique prefix string identifying the CIFAR tower, e.g. 'tower_0'

  Returns:
     Tensor of shape [] containing the total loss for a batch of data
  """
  # Get images and labels for CIFAR-10.
  images, labels = cifar10.distorted_inputs()

  # Build inference Graph.
  logits = cifar10.inference(images)

  # Build the portion of the Graph calculating the losses. Note that we will
  # assemble the total_loss using a custom function below.
  _ = cifar10.loss(logits, labels)

  # Assemble all of the losses for the current tower only.
  losses = tf.get_collection('losses', scope)

  # Calculate the total loss for the current tower.
  total_loss = tf.add_n(losses, name='total_loss')

  # Compute the moving average of all individual losses and the total loss.
  loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
  loss_averages_op = loss_averages.apply(losses + [total_loss])

  # Attach a scalar summary to all individual losses and the total loss; do the
  # same for the averaged version of the losses.
  for l in losses + [total_loss]:
    # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
    # session. This helps the clarity of presentation on tensorboard.
    loss_name = re.sub('%s_[0-9]*/' % cifar10.TOWER_NAME, '', l.op.name)
    # Name each loss as '(raw)' and name the moving average version of the loss
    # as the original loss name.
    tf.summary.scalar(loss_name +' (raw)', l)
    tf.summary.scalar(loss_name, loss_averages.average(l))

  with tf.control_dependencies([loss_averages_op]):
    total_loss = tf.identity(total_loss)
  return total_loss


def average_gradients(tower_grads):
  """Calculate the average gradient for each shared variable across all towers.

  Note that this function provides a synchronization point across all towers.

  Args:
    tower_grads: List of lists of (gradient, variable) tuples. The outer list
      is over individual gradients. The inner list is over the gradient
      calculation for each tower.
  Returns:
     List of pairs of (gradient, variable) where the gradient has been averaged
     across all towers.
  """
  average_grads = []
  for grad_and_vars in zip(*tower_grads):
    # Note that each grad_and_vars looks like the following:
    #   ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
    grads = []
    for g, _ in grad_and_vars:
      # Add 0 dimension to the gradients to represent the tower.
      expanded_g = tf.expand_dims(g, 0)

      # Append on a 'tower' dimension which we will average over below.
      grads.append(expanded_g)

    # Average over the 'tower' dimension.
    grad = tf.concat(0, grads)
    grad = tf.reduce_mean(grad, 0)

    # Keep in mind that the Variables are redundant because they are shared
    # across towers. So .. we will just return the first tower's pointer to
    # the Variable.
    v = grad_and_vars[0][1]
    grad_and_var = (grad, v)
    average_grads.append(grad_and_var)
  return average_grads


def train():
  """Train CIFAR-10 for a number of steps."""
  with tf.Graph().as_default(), tf.device('/cpu:0'):
    # Create a variable to count the number of train() calls. This equals the
    # number of batches processed * FLAGS.num_gpus.
    global_step = tf.get_variable(
        'global_step', [],
        initializer=tf.constant_initializer(0), trainable=False)

    # Calculate the learning rate schedule.
    num_batches_per_epoch = (cifar10.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN /
                             FLAGS.batch_size)
    decay_steps = int(num_batches_per_epoch * cifar10.NUM_EPOCHS_PER_DECAY)

    # Decay the learning rate exponentially based on the number of steps.
    lr = tf.train.exponential_decay(cifar10.INITIAL_LEARNING_RATE,
                                    global_step,
                                    decay_steps,
                                    cifar10.LEARNING_RATE_DECAY_FACTOR,
                                    staircase=True)

    # Create an optimizer that performs gradient descent.
    opt = tf.train.GradientDescentOptimizer(lr)

    # Calculate the gradients for each model tower.
    tower_grads = []
    for i in xrange(FLAGS.num_gpus):
      with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
          # Calculate the loss for one tower of the CIFAR model. This function
          # constructs the entire CIFAR model but shares the variables across
          # all towers.
          loss = tower_loss(scope)

          # Reuse variables for the next tower.
          tf.get_variable_scope().reuse_variables()

          # Retain the summaries from the final tower.
          summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

          # Calculate the gradients for the batch of data on this CIFAR tower.
          grads = opt.compute_gradients(loss)

          # Keep track of the gradients across all towers.
          tower_grads.append(grads)

    # We must calculate the mean of each gradient. Note that this is the
    # synchronization point across all towers.
    grads = average_gradients(tower_grads)

    # Add a summary to track the learning rate.
    summaries.append(tf.summary.scalar('learning_rate', lr))

    # Add histograms for gradients.
    for grad, var in grads:
      if grad is not None:
        summaries.append(
            tf.summary.histogram(var.op.name + '/gradients', grad))

    # Apply the gradients to adjust the shared variables.
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

    # Add histograms for trainable variables.
    for var in tf.trainable_variables():
      summaries.append(tf.summary.histogram(var.op.name, var))

    # Track the moving averages of all trainable variables.
    variable_averages = tf.train.ExponentialMovingAverage(
        cifar10.MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())

    # Group all updates to into a single train op.
    train_op = tf.group(apply_gradient_op, variables_averages_op)

    # Create a saver.
    saver = tf.train.Saver(tf.global_variables())

    # Build the summary operation from the last tower summaries.
    summary_op = tf.summary.merge(summaries)  #tf.summary.merge_all(summaries)

    # Build an initialization operation to run below.
    init = tf.global_variables_initializer()

    # Start running operations on the Graph. allow_soft_placement must be set to
    # True to build towers on GPU, as some of the ops do not have GPU
    # implementations.
    sess = tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=FLAGS.log_device_placement))
    sess.run(init)

    # Start the queue runners.
    tf.train.start_queue_runners(sess=sess)

    summary_writer = tf.summary.FileWriter(FLAGS.train_dir,
                                            graph=sess.graph)

    for step in xrange(FLAGS.max_steps):
      start_time = time.time()
      _, loss_value = sess.run([train_op, loss])
      duration = time.time() - start_time

      assert not np.isnan(loss_value), 'Model diverged with loss = NaN'

      if step % 10 == 0:
        num_examples_per_step = FLAGS.batch_size * FLAGS.num_gpus
        examples_per_sec = num_examples_per_step / duration
        sec_per_batch = duration / FLAGS.num_gpus

        format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                      'sec/batch)')
        print (format_str % (datetime.now(), step, loss_value,
                             examples_per_sec, sec_per_batch))

      if step % 100 == 0:
        summary_str = sess.run(summary_op)
        summary_writer.add_summary(summary_str, step)

      # Save the model checkpoint periodically.
      if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
        checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
        saver.save(sess, checkpoint_path, global_step=step)


def main(argv=None):  # pylint: disable=unused-argument
  cifar10.maybe_download_and_extract()
  if tf.gfile.Exists(FLAGS.train_dir):
    tf.gfile.DeleteRecursively(FLAGS.train_dir)
  tf.gfile.MakeDirs(FLAGS.train_dir)
  train()


if __name__ == '__main__':
  tf.app.run()

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda-8.0//bin:$PATH

screen python cifar10_multi_gpu_train.py --num_gpus=1
python cifar10_eval.py

Relevant Link:

https://www.tensorflow.org/api_docs/python/tf/random_crop
https://github.com/tensorflow/models/commit/e5079c839058ff40dcbd15515a9cfb462fabbc2a#diff-5ae64cf077db8f00686ff8b5d7748604
https://github.com/tensorflow/tensorflow/tree/r0.7/tensorflow/models/image/cifar10
https://github.com/tensorflow/models/pull/864/commits/e93ec37201f5f2116933ae96e505f409ddbf344d
http://qiita.com/shu223/items/ef160cbe1e9d9f57c248

5. 單詞的向量表示(Vector Representations of Words)

0x1: Word Embeddings

一般圖像或音頻系統處理的是由圖片中全部單個原始像素點強度值(pix chanel)或者音頻中功率譜密度的強度值，把它們編碼成豐富、高緯度的向量數據集(卷積)。對於物體或語音識別這一類的任務，咱們所需的所有信息已經都存儲在原始數據中(顯然人類自己就是依賴原始數據進行平常的物體或語音識別的)
而後，天然語言處理系統一般將詞彙做爲離散的單一符號，例如 "cat" 一詞或可表示爲 Id537 ，而 "dog" 一詞或可表示爲 Id143。這些符號編碼毫無規律，沒法提供不一樣詞彙之間可能存在的關聯信息。換句話說，在處理關於 "dogs" 一詞的信息時，模型將沒法利用已知的關於 "cats" 的信息(例如，它們都是動物，有四條腿，可做爲寵物等等)。可見，將詞彙表達爲上述的獨立離散符號將進一步致使數據稀疏，使咱們在訓練統計模型時不得不尋求更多的數據。而詞彙的向量表示將克服上述的難題

向量空間模型 (VSMs)將詞彙表達（嵌套）於一個連續的向量空間中，語義近似的詞彙被映射爲相鄰的數據點。向量空間模型在天然語言處理領域中有着漫長且豐富的歷史，不過幾乎全部利用這一模型的方法都依賴於分佈式假設，其核心思想爲出現於上下文情景中的詞彙都有相相似的語義。採用這一假設的研究方法大體分爲如下兩類

基於計數的方法 (e.g. 潛在語義分析): 基於計數的方法計算某詞彙與其鄰近詞彙在一個大型語料庫中共同出現的頻率及其餘統計量，而後將這些統計量映射到一個小型且稠密的向量中
預測方法 (e.g. 神經機率化語言模型): 預測方法則試圖直接從某詞彙的鄰近詞彙對其進行預測，在此過程當中利用已經學習到的小型且稠密的嵌套向量

Word2vec是一種能夠進行高效率詞嵌套學習的預測模型。其兩種變體分別爲

連續詞袋模型(CBOW): 從算法角度看，這兩種方法很是類似，其區別爲CBOW根據源詞上下文詞彙('the cat sits on the')來預測目標詞彙(例如，'mat')
Skip-Gram模型: Skip-Gram模型作法相反，它經過目標詞彙來預測源詞彙
Skip-Gram模型採起CBOW的逆過程的動機在於
    1) CBOW算法對於不少分佈式信息進行了平滑處理(例如將一整段上下文信息視爲一個單一觀察量)。不少狀況下，對於小型的數據集，這一處理是有幫助的
    2) 相形之下，Skip-Gram模型將每一個"上下文-目標詞彙"的組合視爲一個新觀察量，這種作法在大型數據集中會更爲有效

0x2: 處理噪聲對比訓練

神經機率化語言模型一般使用極大似然法 (ML) 進行訓練，其中經過 softmax function 來最大化當提供前一個單詞 h (表明 "history")，後一個單詞的機率 (表明 "target")

當 score(w_t,h) 計算了文字 w_t 和上下文 h 的相容性（一般使用向量積）。咱們使用對數似然函數來訓練訓練集的最大值，好比經過：

這裏提出了一個解決語言機率模型的合適的通用方法。然而這個方法實際執行起來開銷很是大，由於咱們須要去計算並正則化當前上下文環境 h 中全部其餘 V 單詞 w' 的機率得分，在每一步訓練迭代中

即每個單詞咱們都要進行一次預測，在全部語料組合中，最優可能緊跟着地單詞是什麼
從另外一個角度來講，當使用word2vec模型時，咱們並不須要對機率模型中的全部特徵進行學習。而CBOW模型和Skip-Gram模型爲了不這種狀況發生，使用一個二分類器（邏輯迴歸）在同一個上下文環境裏從 k 虛構的 (噪聲) 單詞區分出真正的目標單詞。咱們下面詳細闡述一下CBOW模型，對於Skip-Gram模型只要簡單地作相反的操做便可。

噪聲對比訓練的意義在於，咱們假設隨機產生的目標單詞上下文都是噪聲，它們不可能也不該該和咱們的目標單詞有語境關聯，訓練的目標就在於找到一組參數，使得儘量大的區分目標單詞的目標上下文和噪音上下文

從數學角度來講，咱們的目標是對每一個樣本最大化：

其中表明的是數據集在當前上下文 h ，根據所學習的嵌套向量，目標單詞 w 使用二分類邏輯迴歸計算得出的機率。在實踐中，咱們經過在噪聲分佈中繪製比對文字來得到近似的指望值（經過計算蒙特卡洛平均值）。
當真實地目標單詞被分配到較高的機率，同時噪聲單詞的機率很低時，目標函數也就達到最大值了。從技術層面來講，這種方法叫作"負抽樣"，並且使用這個損失函數在數學層面上也有很好的解釋：這個更新過程也近似於softmax函數的更新。這在計算上將會有很大的優點，由於當計算這個損失函數時，只是有咱們挑選出來的 k 個噪聲單詞，而沒有使用整個語料庫 V。這使得訓練變得很是快。咱們實際上使用了與noise-contrastive estimation (NCE)介紹的很是類似的方法，這在TensorFlow中已經封裝了一個很便捷的函數tf.nn.nce_loss()

0x3: Skip-gram 模型

下面來看一下這個數據集

the quick brown fox jumped over the lazy dog

咱們首先對一些單詞以及它們的上下文環境創建一個數據集。咱們能夠以任何合理的方式定義'上下文'，而一般上這個方式是根據文字的句法語境的(使用語法原理的方式處理當前目標單詞可，好比說把目標單詞左邊的內容當作一個'上下文'，或者以目標單詞右邊的內容，等等。如今咱們把目標單詞的左右單詞視做一個上下文，使用大小爲1的窗口，這樣就獲得這樣一個由(上下文, 目標單詞) 組成的數據集

([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox), ...

文提到Skip-Gram模型是把目標單詞和上下文顛倒過來，因此在這個問題中，舉個例子，就是用'quick'來預測 'the' 和 'brown' ，用 'brown' 預測 'quick' 和 'brown' 。所以這個數據集就變成由(輸入, 輸出)組成的：

(quick, the), (quick, brown), (brown, quick), (brown, fox), ...

目標函數一般是對整個數據集創建的，可是本問題中要對每個樣本（或者是一個batch_size 很小的樣本集，一般設置爲16 <= batch_size <= 512）在同一時間執行特別的操做，稱之爲隨機梯度降低 (SGD)。咱們來看一下訓練過程當中每一步的執行。
假設用 t 表示上面這個例子中quick 來預測 the 的訓練的單個循環。用 num_noise 定義從噪聲分佈中挑選出來的噪聲（相反的）單詞的個數，一般使用一元分佈，P(w)。爲了簡單起見，咱們就定num_noise=1，用 sheep 選做噪聲詞。接下來就能夠計算每一對觀察值和噪聲值的損失函數了，每個執行步驟就可表示爲：

整個計算過程的目標是經過更新嵌套參數來逼近目標函數(這個這個例子中就是使目標函數最大化)(即讓模型向對目標值預測機率最高，而對噪音值預測機率最低)。爲此咱們要計算損失函數中嵌套參數的梯度。對於整個數據集，當梯度降低的過程當中不斷地更新參數，對應產生的效果就是不斷地移動每一個單詞的嵌套向量，直到能夠把真實單詞和噪聲單詞很好得區分開。
咱們能夠把學習向量映射到2維中以便咱們觀察，其中用到的技術能夠參考 t-SNE 降緯技術。當咱們用可視化的方式來觀察這些向量，就能夠很明顯的獲取單詞之間語義信息的關係，這其實是很是有用的。當咱們第一次發現這樣的誘導向量空間中，展現了一些特定的語義關係，這是很是有趣的，好比文字中 male-female，gender 甚至還有 country-capital 的關係

這也解釋了爲何這些向量在傳統的NLP問題中可做爲特性使用，好比用在對一個演講章節打個標籤，或者對一個專有名詞的識別

0x4: 創建圖形

先來定義一個嵌套參數矩陣。咱們用惟一的隨機值來初始化這個大矩陣

embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))

對噪聲-比對的損失計算就使用一個邏輯迴歸模型。對此，咱們須要對語料庫中的每一個單詞定義一個權重值和誤差值。(也可稱之爲輸出權重與之對應的輸入嵌套值)。定義以下

nce_weights = tf.Variable(
  tf.truncated_normal([vocabulary_size, embedding_size],
                      stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

咱們有了這些參數以後，就能夠定義Skip-Gram模型了。簡單起見，假設咱們已經把語料庫中的文字整型化了，這樣每一個整型表明一個單詞。Skip-Gram模型有兩個輸入。一個是一組用整型表示的上下文單詞，另外一個是目標單詞。給這些輸入創建佔位符節點，以後就能夠填入數據了

# 創建輸入佔位符
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

而後咱們須要對批數據中的單詞創建嵌套向量

embed = tf.nn.embedding_lookup(embeddings, train_inputs)

如今咱們有了每一個單詞的嵌套向量，接下來就是使用噪聲-比對的訓練方式來預測目標單詞(找到最有多是和目標單詞對應的上下文)

# 計算 NCE 損失函數, 每次使用負標籤的樣本.
loss = tf.reduce_mean(
  tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,
                 num_sampled, vocabulary_size))

咱們對損失函數創建了圖形節點，而後咱們須要計算相應梯度和更新參數的節點，好比說在這裏咱們會使用隨機梯度降低法，TensorFlow也已經封裝好了該過程

# 使用 SGD 控制器.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=1.0).minimize(loss)

0x5: 訓練模型

訓練的過程很簡單，只要在循環中使用feed_dict不斷給佔位符填充數據，同時調用 session.run便可

for inputs, labels in generate_batch(...):
  feed_dict = {training_inputs: inputs, training_labels: labels}
  _, cur_loss = session.run([optimizer, loss], feed_dict=feed_dict)

0x6: 嵌套學習結果可視化

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import math
import os
import random
import zipfile

import numpy as np
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf

# Step 1: Download the data.
url = 'http://mattmahoney.net/dc/'


def maybe_download(filename, expected_bytes):
  """Download a file if not present, and make sure it's the right size."""
  if not os.path.exists(filename):
    filename, _ = urllib.request.urlretrieve(url + filename, filename)
  statinfo = os.stat(filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified', filename)
  else:
    print(statinfo.st_size)
    raise Exception(
        'Failed to verify ' + filename + '. Can you get to it with a browser?')
  return filename

filename = maybe_download('text8.zip', 31344016)


# Read the data into a list of strings.
def read_data(filename):
  """Extract the first file enclosed in a zip file as a list of words"""
  with zipfile.ZipFile(filename) as f:
    data = tf.compat.as_str(f.read(f.namelist()[0])).split()
  return data

words = read_data(filename)
print('Data size', len(words))

# Step 2: Build the dictionary and replace rare words with UNK token.
vocabulary_size = 50000


def build_dataset(words):
  count = [['UNK', -1]]
  count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
  dictionary = dict()
  for word, _ in count:
    dictionary[word] = len(dictionary)
  data = list()
  unk_count = 0
  for word in words:
    if word in dictionary:
      index = dictionary[word]
    else:
      index = 0  # dictionary['UNK']
      unk_count += 1
    data.append(index)
  count[0][1] = unk_count
  reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
  return data, count, dictionary, reverse_dictionary

data, count, dictionary, reverse_dictionary = build_dataset(words)
del words  # Hint to reduce memory.
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10], [reverse_dictionary[i] for i in data[:10]])

data_index = 0


# Step 3: Function to generate a training batch for the skip-gram model.
def generate_batch(batch_size, num_skips, skip_window):
  global data_index
  assert batch_size % num_skips == 0
  assert num_skips <= 2 * skip_window
  batch = np.ndarray(shape=(batch_size), dtype=np.int32)
  labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
  span = 2 * skip_window + 1  # [ skip_window target skip_window ]
  buffer = collections.deque(maxlen=span)
  for _ in range(span):
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  for i in range(batch_size // num_skips):
    target = skip_window  # target label at the center of the buffer
    targets_to_avoid = [skip_window]
    for j in range(num_skips):
      while target in targets_to_avoid:
        target = random.randint(0, span - 1)
      targets_to_avoid.append(target)
      batch[i * num_skips + j] = buffer[skip_window]
      labels[i * num_skips + j, 0] = buffer[target]
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  # Backtrack a little bit to avoid skipping words in the end of a batch
  data_index = (data_index + len(data) - span) % len(data)
  return batch, labels

batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)
for i in range(8):
  print(batch[i], reverse_dictionary[batch[i]],
        '->', labels[i, 0], reverse_dictionary[labels[i, 0]])

# Step 4: Build and train a skip-gram model.

batch_size = 128
embedding_size = 128  # Dimension of the embedding vector.
skip_window = 1       # How many words to consider left and right.
num_skips = 2         # How many times to reuse an input to generate a label.

# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16     # Random set of words to evaluate similarity on.
valid_window = 100  # Only pick dev samples in the head of the distribution.
valid_examples = np.random.choice(valid_window, valid_size, replace=False)
num_sampled = 64    # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default():

  # Input data.
  train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
  train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
  valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

  # Ops and variables pinned to the CPU because of missing GPU implementation
  with tf.device('/cpu:0'):
    # Look up embeddings for inputs.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)

    # Construct the variables for the NCE loss
    nce_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

  # Compute the average NCE loss for the batch.
  # tf.nce_loss automatically draws a new sample of the negative labels each
  # time we evaluate the loss.
  loss = tf.reduce_mean(
      tf.nn.nce_loss(weights=nce_weights,
                     biases=nce_biases,
                     labels=train_labels,
                     inputs=embed,
                     num_sampled=num_sampled,
                     num_classes=vocabulary_size))

  # Construct the SGD optimizer using a learning rate of 1.0.
  optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

  # Compute the cosine similarity between minibatch examples and all embeddings.
  norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
  normalized_embeddings = embeddings / norm
  valid_embeddings = tf.nn.embedding_lookup(
      normalized_embeddings, valid_dataset)
  similarity = tf.matmul(
      valid_embeddings, normalized_embeddings, transpose_b=True)

  # Add variable initializer.
  init = tf.global_variables_initializer()

# Step 5: Begin training.
num_steps = 100001

with tf.Session(graph=graph) as session:
  # We must initialize all variables before we use them.
  init.run()
  print("Initialized")

  average_loss = 0
  for step in xrange(num_steps):
    batch_inputs, batch_labels = generate_batch(
        batch_size, num_skips, skip_window)
    feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}

    # We perform one update step by evaluating the optimizer op (including it
    # in the list of returned values for session.run()
    _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += loss_val

    if step % 2000 == 0:
      if step > 0:
        average_loss /= 2000
      # The average loss is an estimate of the loss over the last 2000 batches.
      print("Average loss at step ", step, ": ", average_loss)
      average_loss = 0

    # Note that this is expensive (~20% slowdown if computed every 500 steps)
    if step % 10000 == 0:
      sim = similarity.eval()
      for i in xrange(valid_size):
        valid_word = reverse_dictionary[valid_examples[i]]
        top_k = 8  # number of nearest neighbors
        nearest = (-sim[i, :]).argsort()[1:top_k + 1]
        log_str = "Nearest to %s:" % valid_word
        for k in xrange(top_k):
          close_word = reverse_dictionary[nearest[k]]
          log_str = "%s %s," % (log_str, close_word)
        print(log_str)
  final_embeddings = normalized_embeddings.eval()

# Step 6: Visualize the embeddings.


def plot_with_labels(low_dim_embs, labels, filename='tsne.png'):
  assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
  plt.figure(figsize=(18, 18))  # in inches
  for i, label in enumerate(labels):
    x, y = low_dim_embs[i, :]
    plt.scatter(x, y)
    plt.annotate(label,
                 xy=(x, y),
                 xytext=(5, 2),
                 textcoords='offset points',
                 ha='right',
                 va='bottom')

  plt.savefig(filename)

try:
  from sklearn.manifold import TSNE
  import matplotlib.pyplot as plt

  tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
  plot_only = 500
  low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only, :])
  labels = [reverse_dictionary[i] for i in xrange(plot_only)]
  plot_with_labels(low_dim_embs, labels)

except ImportError:
  print("Please install sklearn, matplotlib, and scipy to visualize embeddings.")

0x7: 嵌套學習的評估: 類比推理

詞嵌套在NLP的預測問題中是很是有用且使用普遍地。若是要檢測一個模型是不是能夠成熟地區分詞性或者區分專有名詞的模型，最簡單的辦法就是直接檢驗它的預測詞性、語義關係的能力，好比讓它解決形如king is to queen as father is to ?這樣的問題。這種方法叫作類比推理

# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Multi-threaded word2vec mini-batched skip-gram model.
Trains the model described in:
(Mikolov, et. al.) Efficient Estimation of Word Representations in Vector Space
ICLR 2013.
http://arxiv.org/abs/1301.3781
This model does traditional minibatching.
The key ops used are:
* placeholder for feeding in tensors for each example.
* embedding_lookup for fetching rows from the embedding matrix.
* sigmoid_cross_entropy_with_logits to calculate the loss.
* GradientDescentOptimizer for optimizing the loss.
* skipgram custom op that does input processing.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import threading
import time
import tensorflow.python.platform
from six.moves import xrange  # pylint: disable=redefined-builtin
import numpy as np
import tensorflow as tf
from tensorflow.models.embedding import gen_word2vec as word2vec
flags = tf.app.flags
flags.DEFINE_string("save_path", None, "Directory to write the model and "
                    "training summaries.")
flags.DEFINE_string("train_data", None, "Training text file. "
                    "E.g., unzipped file http://mattmahoney.net/dc/text8.zip.")
flags.DEFINE_string(
    "eval_data", None, "File consisting of analogies of four tokens."
    "embedding 2 - embedding 1 + embedding 3 should be close "
    "to embedding 4."
    "E.g. https://word2vec.googlecode.com/svn/trunk/questions-words.txt.")
flags.DEFINE_integer("embedding_size", 200, "The embedding dimension size.")
flags.DEFINE_integer(
    "epochs_to_train", 15,
    "Number of epochs to train. Each epoch processes the training data once "
    "completely.")
flags.DEFINE_float("learning_rate", 0.2, "Initial learning rate.")
flags.DEFINE_integer("num_neg_samples", 100,
                     "Negative samples per training example.")
flags.DEFINE_integer("batch_size", 16,
                     "Number of training examples processed per step "
                     "(size of a minibatch).")
flags.DEFINE_integer("concurrent_steps", 12,
                     "The number of concurrent training steps.")
flags.DEFINE_integer("window_size", 5,
                     "The number of words to predict to the left and right "
                     "of the target word.")
flags.DEFINE_integer("min_count", 5,
                     "The minimum number of word occurrences for it to be "
                     "included in the vocabulary.")
flags.DEFINE_float("subsample", 1e-3,
                   "Subsample threshold for word occurrence. Words that appear "
                   "with higher frequency will be randomly down-sampled. Set "
                   "to 0 to disable.")
flags.DEFINE_boolean(
    "interactive", False,
    "If true, enters an IPython interactive session to play with the trained "
    "model. E.g., try model.analogy('france', 'paris', 'russia') and "
    "model.nearby(['proton', 'elephant', 'maxwell']")
flags.DEFINE_integer("statistics_interval", 5,
                     "Print statistics every n seconds.")
flags.DEFINE_integer("summary_interval", 5,
                     "Save training summary to file every n seconds (rounded "
                     "up to statistics interval.")
flags.DEFINE_integer("checkpoint_interval", 600,
                     "Checkpoint the model (i.e. save the parameters) every n "
                     "seconds (rounded up to statistics interval.")
FLAGS = flags.FLAGS
class Options(object):
  """Options used by our word2vec model."""
  def __init__(self):
    # Model options.
    # Embedding dimension.
    self.emb_dim = FLAGS.embedding_size
    # Training options.
    # The training text file.
    self.train_data = FLAGS.train_data
    # Number of negative samples per example.
    self.num_samples = FLAGS.num_neg_samples
    # The initial learning rate.
    self.learning_rate = FLAGS.learning_rate
    # Number of epochs to train. After these many epochs, the learning
    # rate decays linearly to zero and the training stops.
    self.epochs_to_train = FLAGS.epochs_to_train
    # Concurrent training steps.
    self.concurrent_steps = FLAGS.concurrent_steps
    # Number of examples for one training step.
    self.batch_size = FLAGS.batch_size
    # The number of words to predict to the left and right of the target word.
    self.window_size = FLAGS.window_size
    # The minimum number of word occurrences for it to be included in the
    # vocabulary.
    self.min_count = FLAGS.min_count
    # Subsampling threshold for word occurrence.
    self.subsample = FLAGS.subsample
    # How often to print statistics.
    self.statistics_interval = FLAGS.statistics_interval
    # How often to write to the summary file (rounds up to the nearest
    # statistics_interval).
    self.summary_interval = FLAGS.summary_interval
    # How often to write checkpoints (rounds up to the nearest statistics
    # interval).
    self.checkpoint_interval = FLAGS.checkpoint_interval
    # Where to write out summaries.
    self.save_path = FLAGS.save_path
    # Eval options.
    # The text file for eval.
    self.eval_data = FLAGS.eval_data
class Word2Vec(object):
  """Word2Vec model (Skipgram)."""
  def __init__(self, options, session):
    self._options = options
    self._session = session
    self._word2id = {}
    self._id2word = []
    self.build_graph()
    self.build_eval_graph()
    self.save_vocab()
    self._read_analogies()
  def _read_analogies(self):
    """Reads through the analogy question file.
    Returns:
      questions: a [n, 4] numpy array containing the analogy question's
                 word ids.
      questions_skipped: questions skipped due to unknown words.
    """
    questions = []
    questions_skipped = 0
    with open(self._options.eval_data, "rb") as analogy_f:
      for line in analogy_f:
        if line.startswith(b":"):  # Skip comments.
          continue
        words = line.strip().lower().split(b" ")
        ids = [self._word2id.get(w.strip()) for w in words]
        if None in ids or len(ids) != 4:
          questions_skipped += 1
        else:
          questions.append(np.array(ids))
    print("Eval analogy file: ", self._options.eval_data)
    print("Questions: ", len(questions))
    print("Skipped: ", questions_skipped)
    self._analogy_questions = np.array(questions, dtype=np.int32)
  def forward(self, examples, labels):
    """Build the graph for the forward pass."""
    opts = self._options
    # Declare all variables we need.
    # Embedding: [vocab_size, emb_dim]
    init_width = 0.5 / opts.emb_dim
    emb = tf.Variable(
        tf.random_uniform(
            [opts.vocab_size, opts.emb_dim], -init_width, init_width),
        name="emb")
    self._emb = emb
    # Softmax weight: [vocab_size, emb_dim]. Transposed.
    sm_w_t = tf.Variable(
        tf.zeros([opts.vocab_size, opts.emb_dim]),
        name="sm_w_t")
    # Softmax bias: [emb_dim].
    sm_b = tf.Variable(tf.zeros([opts.vocab_size]), name="sm_b")
    # Global step: scalar, i.e., shape [].
    self.global_step = tf.Variable(0, name="global_step")
    # Nodes to compute the nce loss w/ candidate sampling.
    labels_matrix = tf.reshape(
        tf.cast(labels,
                dtype=tf.int64),
        [opts.batch_size, 1])
    # Negative sampling.
    sampled_ids, _, _ = (tf.nn.fixed_unigram_candidate_sampler(
        true_classes=labels_matrix,
        num_true=1,
        num_sampled=opts.num_samples,
        unique=True,
        range_max=opts.vocab_size,
        distortion=0.75,
        unigrams=opts.vocab_counts.tolist()))
    # Embeddings for examples: [batch_size, emb_dim]
    example_emb = tf.nn.embedding_lookup(emb, examples)
    # Weights for labels: [batch_size, emb_dim]
    true_w = tf.nn.embedding_lookup(sm_w_t, labels)
    # Biases for labels: [batch_size, 1]
    true_b = tf.nn.embedding_lookup(sm_b, labels)
    # Weights for sampled ids: [num_sampled, emb_dim]
    sampled_w = tf.nn.embedding_lookup(sm_w_t, sampled_ids)
    # Biases for sampled ids: [num_sampled, 1]
    sampled_b = tf.nn.embedding_lookup(sm_b, sampled_ids)
    # True logits: [batch_size, 1]
    true_logits = tf.reduce_sum(tf.mul(example_emb, true_w), 1) + true_b
    # Sampled logits: [batch_size, num_sampled]
    # We replicate sampled noise lables for all examples in the batch
    # using the matmul.
    sampled_b_vec = tf.reshape(sampled_b, [opts.num_samples])
    sampled_logits = tf.matmul(example_emb,
                               sampled_w,
                               transpose_b=True) + sampled_b_vec
    return true_logits, sampled_logits
  def nce_loss(self, true_logits, sampled_logits):
    """Build the graph for the NCE loss."""
    # cross-entropy(logits, labels)
    opts = self._options
    true_xent = tf.nn.sigmoid_cross_entropy_with_logits(
        true_logits, tf.ones_like(true_logits))
    sampled_xent = tf.nn.sigmoid_cross_entropy_with_logits(
        sampled_logits, tf.zeros_like(sampled_logits))
    # NCE-loss is the sum of the true and noise (sampled words)
    # contributions, averaged over the batch.
    nce_loss_tensor = (tf.reduce_sum(true_xent) +
                       tf.reduce_sum(sampled_xent)) / opts.batch_size
    return nce_loss_tensor
  def optimize(self, loss):
    """Build the graph to optimize the loss function."""
    # Optimizer nodes.
    # Linear learning rate decay.
    opts = self._options
    words_to_train = float(opts.words_per_epoch * opts.epochs_to_train)
    lr = opts.learning_rate * tf.maximum(
        0.0001, 1.0 - tf.cast(self._words, tf.float32) / words_to_train)
    self._lr = lr
    optimizer = tf.train.GradientDescentOptimizer(lr)
    train = optimizer.minimize(loss,
                               global_step=self.global_step,
                               gate_gradients=optimizer.GATE_NONE)
    self._train = train
  def build_eval_graph(self):
    """Build the eval graph."""
    # Eval graph
    # Each analogy task is to predict the 4th word (d) given three
    # words: a, b, c.  E.g., a=italy, b=rome, c=france, we should
    # predict d=paris.
    # The eval feeds three vectors of word ids for a, b, c, each of
    # which is of size N, where N is the number of analogies we want to
    # evaluate in one batch.
    analogy_a = tf.placeholder(dtype=tf.int32)  # [N]
    analogy_b = tf.placeholder(dtype=tf.int32)  # [N]
    analogy_c = tf.placeholder(dtype=tf.int32)  # [N]
    # Normalized word embeddings of shape [vocab_size, emb_dim].
    nemb = tf.nn.l2_normalize(self._emb, 1)
    # Each row of a_emb, b_emb, c_emb is a word's embedding vector.
    # They all have the shape [N, emb_dim]
    a_emb = tf.gather(nemb, analogy_a)  # a's embs
    b_emb = tf.gather(nemb, analogy_b)  # b's embs
    c_emb = tf.gather(nemb, analogy_c)  # c's embs
    # We expect that d's embedding vectors on the unit hyper-sphere is
    # near: c_emb + (b_emb - a_emb), which has the shape [N, emb_dim].
    target = c_emb + (b_emb - a_emb)
    # Compute cosine distance between each pair of target and vocab.
    # dist has shape [N, vocab_size].
    dist = tf.matmul(target, nemb, transpose_b=True)
    # For each question (row in dist), find the top 4 words.
    _, pred_idx = tf.nn.top_k(dist, 4)
    # Nodes for computing neighbors for a given word according to
    # their cosine distance.
    nearby_word = tf.placeholder(dtype=tf.int32)  # word id
    nearby_emb = tf.gather(nemb, nearby_word)
    nearby_dist = tf.matmul(nearby_emb, nemb, transpose_b=True)
    nearby_val, nearby_idx = tf.nn.top_k(nearby_dist,
                                         min(1000, self._options.vocab_size))
    # Nodes in the construct graph which are used by training and
    # evaluation to run/feed/fetch.
    self._analogy_a = analogy_a
    self._analogy_b = analogy_b
    self._analogy_c = analogy_c
    self._analogy_pred_idx = pred_idx
    self._nearby_word = nearby_word
    self._nearby_val = nearby_val
    self._nearby_idx = nearby_idx
  def build_graph(self):
    """Build the graph for the full model."""
    opts = self._options
    # The training data. A text file.
    (words, counts, words_per_epoch, self._epoch, self._words, examples,
     labels) = word2vec.skipgram(filename=opts.train_data,
                                 batch_size=opts.batch_size,
                                 window_size=opts.window_size,
                                 min_count=opts.min_count,
                                 subsample=opts.subsample)
    (opts.vocab_words, opts.vocab_counts,
     opts.words_per_epoch) = self._session.run([words, counts, words_per_epoch])
    opts.vocab_size = len(opts.vocab_words)
    print("Data file: ", opts.train_data)
    print("Vocab size: ", opts.vocab_size - 1, " + UNK")
    print("Words per epoch: ", opts.words_per_epoch)
    self._examples = examples
    self._labels = labels
    self._id2word = opts.vocab_words
    for i, w in enumerate(self._id2word):
      self._word2id[w] = i
    true_logits, sampled_logits = self.forward(examples, labels)
    loss = self.nce_loss(true_logits, sampled_logits)
    tf.scalar_summary("NCE loss", loss)
    self._loss = loss
    self.optimize(loss)
    # Properly initialize all variables.
    tf.initialize_all_variables().run()
    self.saver = tf.train.Saver()
  def save_vocab(self):
    """Save the vocabulary to a file so the model can be reloaded."""
    opts = self._options
    with open(os.path.join(opts.save_path, "vocab.txt"), "w") as f:
      for i in xrange(opts.vocab_size):
        f.write("%s %d\n" % (tf.compat.as_text(opts.vocab_words[i]),
                             opts.vocab_counts[i]))
  def _train_thread_body(self):
    initial_epoch, = self._session.run([self._epoch])
    while True:
      _, epoch = self._session.run([self._train, self._epoch])
      if epoch != initial_epoch:
        break
  def train(self):
    """Train the model."""
    opts = self._options
    initial_epoch, initial_words = self._session.run([self._epoch, self._words])
    summary_op = tf.merge_all_summaries()
    summary_writer = tf.train.SummaryWriter(opts.save_path,
                                            graph_def=self._session.graph_def)
    workers = []
    for _ in xrange(opts.concurrent_steps):
      t = threading.Thread(target=self._train_thread_body)
      t.start()
      workers.append(t)
    last_words, last_time, last_summary_time = initial_words, time.time(), 0
    last_checkpoint_time = 0
    while True:
      time.sleep(opts.statistics_interval)  # Reports our progress once a while.
      (epoch, step, loss, words, lr) = self._session.run(
          [self._epoch, self.global_step, self._loss, self._words, self._lr])
      now = time.time()
      last_words, last_time, rate = words, now, (words - last_words) / (
          now - last_time)
      print("Epoch %4d Step %8d: lr = %5.3f loss = %6.2f words/sec = %8.0f\r" %
            (epoch, step, lr, loss, rate), end="")
      sys.stdout.flush()
      if now - last_summary_time > opts.summary_interval:
        summary_str = self._session.run(summary_op)
        summary_writer.add_summary(summary_str, step)
        last_summary_time = now
      if now - last_checkpoint_time > opts.checkpoint_interval:
        self.saver.save(self._session,
                        opts.save_path + "model",
                        global_step=step.astype(int))
        last_checkpoint_time = now
      if epoch != initial_epoch:
        break
    for t in workers:
      t.join()
    return epoch
  def _predict(self, analogy):
    """Predict the top 4 answers for analogy questions."""
    idx, = self._session.run([self._analogy_pred_idx], {
        self._analogy_a: analogy[:, 0],
        self._analogy_b: analogy[:, 1],
        self._analogy_c: analogy[:, 2]
    })
    return idx
  def eval(self):
    """Evaluate analogy questions and reports accuracy."""
    # How many questions we get right at precision@1.
    correct = 0
    total = self._analogy_questions.shape[0]
    start = 0
    while start < total:
      limit = start + 2500
      sub = self._analogy_questions[start:limit, :]
      idx = self._predict(sub)
      start = limit
      for question in xrange(sub.shape[0]):
        for j in xrange(4):
          if idx[question, j] == sub[question, 3]:
            # Bingo! We predicted correctly. E.g., [italy, rome, france, paris].
            correct += 1
            break
          elif idx[question, j] in sub[question, :3]:
            # We need to skip words already in the question.
            continue
          else:
            # The correct label is not the precision@1
            break
    print()
    print("Eval %4d/%d accuracy = %4.1f%%" % (correct, total,
                                              correct * 100.0 / total))
  def analogy(self, w0, w1, w2):
    """Predict word w3 as in w0:w1 vs w2:w3."""
    wid = np.array([[self._word2id.get(w, 0) for w in [w0, w1, w2]]])
    idx = self._predict(wid)
    for c in [self._id2word[i] for i in idx[0, :]]:
      if c not in [w0, w1, w2]:
        return c
    return "unknown"
  def nearby(self, words, num=20):
    """Prints out nearby words given a list of words."""
    ids = np.array([self._word2id.get(x, 0) for x in words])
    vals, idx = self._session.run(
        [self._nearby_val, self._nearby_idx], {self._nearby_word: ids})
    for i in xrange(len(words)):
      print("\n%s\n=====================================" % (words[i]))
      for (neighbor, distance) in zip(idx[i, :num], vals[i, :num]):
        print("%-20s %6.4f" % (self._id2word[neighbor], distance))
def _start_shell(local_ns=None):
  # An interactive shell is useful for debugging/development.
  import IPython
  user_ns = {}
  if local_ns:
    user_ns.update(local_ns)
  user_ns.update(globals())
  IPython.start_ipython(argv=[], user_ns=user_ns)
def main(_):
  """Train a word2vec model."""
  if not FLAGS.train_data or not FLAGS.eval_data or not FLAGS.save_path:
    print("--train_data --eval_data and --save_path must be specified.")
    sys.exit(1)
  opts = Options()
  with tf.Graph().as_default(), tf.Session() as session:
    model = Word2Vec(opts, session)
    for _ in xrange(opts.epochs_to_train):
      model.train()  # Process one epoch
      model.eval()  # Eval analogies.
    # Perform a final save.
    model.saver.save(session,
                     os.path.join(opts.save_path, "model.ckpt"),
                     global_step=model.global_step)
    if FLAGS.interactive:
      # E.g.,
      # [0]: model.analogy('france', 'paris', 'russia')
      # [1]: model.nearby(['proton', 'elephant', 'maxwell'])
      _start_shell(locals())
if __name__ == "__main__":
  tf.app.run()

curl http://mattmahoney.net/dc/text8.zip > text8.zip
unzip text8.zip
curl https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/word2vec/source-archive.zip > source-archive.zip
unzip -p source-archive.zip  word2vec/trunk/questions-words.txt > questions-words.txt
rm text8.zip source-archive.zip

TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
g++ -std=c++11 -shared word2vec_ops.cc word2vec_kernels.cc -o word2vec_ops.so -fPIC -I $TF_INC -O2 -D_GLIBCXX_USE_CXX11_ABI=0 

python word2vec_optimized.py \
  --train_data=text8 \
  --eval_data=questions-words.txt \
  --save_path=./

Relevant Link:

http://www.cnblogs.com/rocketfan/p/4976806.html
https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/examples/tutorials/word2vec/word2vec_basic.py
http://www.aclweb.org/anthology/N1
http://msr-waypoint.com/en-us/um/people/gzweig/Pubs/NAACL2013Regularities.pdf3-1090
http://www.tensorfly.cn/tfdoc/tutorials/word2vec.html
https://github.com/tensorflow/models/tree/master/tutorials/embedding
http://www.tensorfly.cn/tfdoc/tutorials/word2vec.html

6. 循環神經網絡(RNN)、LSTM(Long-Short Term Memory, LSTM)

0x1: 語言模型

此教程將展現如何在高難度的語言模型中訓練循環神經網絡。該問題的目標是得到一個能肯定語句機率的機率模型。爲了作到這一點，經過以前已經給出的詞語來預測後面的詞語。咱們將使用 PTB(Penn Tree Bank) 數據集，這是一種經常使用來衡量模型的基準，同時它比較小並且訓練起來相對快速。

0x2: LSTM

模型的核心由一個 LSTM 單元組成，其能夠在某時刻處理一個詞語，以及計算語句可能的延續性的機率。網絡的存儲狀態由一個零矢量初始化並在讀取每個詞語後更新。並且，因爲計算上的緣由，咱們將以 batch_size 爲最小批量來處理數據。
基礎的僞代碼就像下面這樣：

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# 初始化 LSTM 存儲狀態.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
    # 每次處理一批詞語後更新狀態值.
    output, state = lstm(current_batch_of_words, state)

    # LSTM 輸出可用於產生下一個詞語的預測
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)

0x3: 截斷反向傳播

爲使學習過程易於處理，一般的作法是將反向傳播的梯度在（按時間）展開的步驟上照一個固定長度(num_steps)截斷。經過在一次迭代中的每一個時刻上提供長度爲 num_steps 的輸入和每次迭代完成以後反向傳導，這會很容易實現。
一個簡化版的用於計算圖建立的截斷反向傳播代碼：

# 一次給定的迭代中的輸入佔位符.
words = tf.placeholder(tf.int32, [batch_size, num_steps])

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# 初始化 LSTM 存儲狀態.
initial_state = state = tf.zeros([batch_size, lstm.state_size])

for i in range(len(num_steps)):
    # 每處理一批詞語後更新狀態值.
    output, state = lstm(words[:, i], state)

    # 其他的代碼.
    # ...

final_state = state

迭代整個數據集

# 一個 numpy 數組，保存每一批詞語以後的 LSTM 狀態.
numpy_state = initial_state.eval()
total_loss = 0.0
for current_batch_of_words in words_in_dataset:
    numpy_state, current_loss = session.run([final_state, loss],
        # 經過上一次迭代結果初始化 LSTM 狀態.
        feed_dict={initial_state: numpy_state, words: current_batch_of_words})
    total_loss += current_loss

0x4: 輸入

在輸入 LSTM 前，詞語 ID 被嵌入到了一個密集的表示中(單詞矢量表示能夠在不一樣的單詞之間創建關聯性的依據)。這種方式容許模型高效地表示詞語，也便於寫代碼

# embedding_matrix 張量的形狀是： [vocabulary_size, embedding_size]
word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids)

嵌入的矩陣會被隨機地初始化，模型會學會經過數據分辨不一樣詞語的意思

0x5: 損失函數

咱們想使目標詞語的平均負對數機率最小

論文中的典型衡量標準是每一個詞語的平均困惑度（perplexity），計算式爲

同時咱們會觀察訓練過程當中的困惑度值（perplexity）

0x6: 多個 LSTM 層堆疊

要想給模型更強的表達能力，能夠添加多層 LSTM 來處理數據。第一層的輸出做爲第二層的輸入，以此類推。
類 MultiRNNCell 能夠無縫的將其實現

lstm = rnn_cell.BasicLSTMCell(lstm_size)
stacked_lstm = rnn_cell.MultiRNNCell([lstm] * number_of_layers)

initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32)
for i in range(len(num_steps)):
    # 每次處理一批詞語後更新狀態值.
    output, state = stacked_lstm(words[:, i], state)

    # 其他的代碼.
    # ...

final_state = state

0x7: 在GPU上編譯並運行

wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
python ptb_word_lm.py --data_path=./simple-examples/data/ --alsologtostderr --model large

Relevant Link:

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
http://lib.csdn.net/article/deeplearning/59839
http://www.tensorfly.cn/tfdoc/tutorials/recurrent.html

7. 用深度學習網絡搭建一個聊天機器人

python udc_train.py --num_gpus=1
python udc_test.py --model_dir=./data 
python udc_predict.py --model_dir=./data

import os
import time
import itertools
import sys
import numpy as np
import tensorflow as tf
import udc_model
import udc_hparams
import udc_metrics
import udc_inputs
from models.dual_encoder import dual_encoder_model
from models.helpers import load_vocab

tf.flags.DEFINE_string("model_dir", None, "Directory to load model checkpoints from")
tf.flags.DEFINE_string("vocab_processor_file", "./data/vocab_processor.bin", "Saved vocabulary processor file")
FLAGS = tf.flags.FLAGS

if not FLAGS.model_dir:
  print("You must specify a model directory")
  sys.exit(1)

def tokenizer_fn(iterator):
  return (x.split(" ") for x in iterator)

# Load vocabulary
vp = tf.contrib.learn.preprocessing.VocabularyProcessor.restore(
  FLAGS.vocab_processor_file)

# Load your own data here
INPUT_CONTEXT = "how old are you!"
POTENTIAL_RESPONSES = ["fine, thanks", "twenty six yesrs old"]

def get_features(context, utterance):
  context_matrix = np.array(list(vp.transform([context])))
  utterance_matrix = np.array(list(vp.transform([utterance])))
  context_len = len(context.split(" "))
  utterance_len = len(utterance.split(" "))
  features = {
    "context": tf.convert_to_tensor(context_matrix, dtype=tf.int64),
    "context_len": tf.constant(context_len, shape=[1,1], dtype=tf.int64),
    "utterance": tf.convert_to_tensor(utterance_matrix, dtype=tf.int64),
    "utterance_len": tf.constant(utterance_len, shape=[1,1], dtype=tf.int64),
  }
  return features, None

if __name__ == "__main__":
  hparams = udc_hparams.create_hparams()
  model_fn = udc_model.create_model_fn(hparams, model_impl=dual_encoder_model)
  estimator = tf.contrib.learn.Estimator(model_fn=model_fn, model_dir=FLAGS.model_dir)

  # Ugly hack, seems to be a bug in Tensorflow
  # estimator.predict doesn't work without this line
  estimator._targets_info = tf.contrib.learn.estimators.tensor_signature.TensorSignature(tf.constant(0, shape=[1,1]))

  print("Context: {}".format(INPUT_CONTEXT))
  for r in POTENTIAL_RESPONSES:
    prob = estimator.predict(input_fn=lambda: get_features(INPUT_CONTEXT, r))
    print("{}: {:g}".format(r, prob[0,0]))

咱們能夠利用模型訓練學習獲得的模型和實際的場景進行對比，例如咱們認爲

INPUT_CONTEXT = "how old are you!"
POTENTIAL_RESPONSES = ["fine, thanks", "twenty six yesrs old"]

而後看模型是否獲得和咱們假定的同樣的結果

Relevant Link:

http://naturali.io/deeplearning/chatbot/introduction/2016/04/28/chatbot-part1.html 
http://naturali.io/deeplearning/chatbot/introduction/2016/05/16/chatbot-part2.html
https://arxiv.org/abs/1506.08909
https://github.com/dennybritz/chatbot-retrieval

相關標籤/搜索

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。