Machine Learning: An Introduction to TensorFlow

The idea behind TensorFlow is simple: you define a computation graph in Python, and TensorFlow then runs that graph using highly optimized C++ code.

The function \(f(x,y)=x^2y+y+2\) can be represented as a computation graph in which you define operations together with their input and output variables. Thanks to this representation, TensorFlow can distribute the computation across devices and machines, and can therefore train models with huge numbers of features and instances.

A graph can also be split across multiple GPUs and executed in parallel. TensorFlow has the following advantages:

  • Runs on multiple platforms: Windows, Linux, macOS, iOS, and Android
  • Provides a simple Python API
  • Has many higher-level libraries built on top of it
  • Is highly scalable
  • Has a high-performance C++ implementation
  • Provides many nodes for building cost functions, with automatic differentiation
  • Comes with TensorBoard, a powerful visualization tool
  • Offers cloud computing support
  • Has an active developer community

TensorFlow is not the only option; there are several other popular open-source deep learning libraries.

Creating Your First Graph and Running It in a Session

Let's write TensorFlow code to compute \(f(x,y)=x^2y+y+2\):

import numpy as np
import tensorflow as tf


# reset_graph() is a small helper used throughout this post; a typical
# definition (matching the book's accompanying notebooks) is:
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)


reset_graph()

x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x * x * y + y + 2

Note that at this point TensorFlow has not actually created the variables or computed anything; it has only built the computation graph. To evaluate it, you need to run the code below, and TensorFlow will automatically place the computation on the CPU or GPU.

sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()

You can also use Python's with statement to simplify the code:

with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

"""
42
"""

Inside the with block, the session is set as the default session, so calling x.initializer.run() is equivalent to tf.get_default_session().run(x.initializer), and f.eval() is equivalent to tf.get_default_session().run(f). This makes the code easier to read.

A graph often has many variables, and TensorFlow provides a shortcut to initialize them all at once:

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result1 = f.eval()

A TensorFlow program is typically split into two phases: first build the computation graph (the construction phase), then run it (the execution phase).

Managing Graphs

Any node you create is automatically added to the default graph. Let's verify this:

reset_graph()

x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

'''
True
'''

In most cases this is fine, but if you need to manage multiple graphs you will want to add nodes to different graphs. To do that, create a new Graph and temporarily make it the default graph inside a with block:

graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
x2.graph is graph

'''
True
'''

x2.graph is tf.get_default_graph()

'''
False
'''

Lifecycle of a Node Value

When you evaluate a node, TensorFlow automatically determines the nodes it depends on and evaluates those first. For example:

w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())  # 10
    print(z.eval())  # 15

When y.eval() runs, TensorFlow detects that y depends on x, which in turn depends on w, so it first evaluates w, then x, then y. The same happens for z, but by default the values of w and x are computed twice (once for each run).

All node values are dropped between graph runs, except variable values, which are maintained by the session across graph runs (queues and readers also maintain some state, as we will see in Chapter 12). A variable starts its life when its initializer is run, and it ends when the session is closed.

If you want w and x to be evaluated only once in the code above, evaluate y and z in a single graph run:

with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)  # 10
    print(z_val)  # 15

In single-process TensorFlow, multiple sessions do not share any state, even if they reuse the same graph (each session would have its own copy of every variable). In distributed TensorFlow (see Chapter 12), variable state is stored on the servers, not in the sessions, so multiple sessions can share the same variables.

Linear Regression with TensorFlow

TensorFlow operations (ops for short) can take any number of inputs and produce any number of outputs. For example, the addition and multiplication ops each take two inputs and produce one output. Constants and variables take no input; they simply output a value. The inputs and outputs are multidimensional arrays, called tensors.
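
The code in this section solves linear regression with the closed-form Normal Equation:

\(\hat{\boldsymbol{\theta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}\)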

The following code uses TensorFlow to compute this solution on the California housing dataset:

import numpy as np
from sklearn.datasets import fetch_california_housing


reset_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')

XT = tf.transpose(X)

theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()
    
"""
array([[-3.68962631e+01],
       [ 4.36777472e-01],
       [ 9.44449380e-03],
       [-1.07348785e-01],
       [ 6.44962370e-01],
       [-3.94082872e-06],
       [-3.78797273e-03],
       [-4.20847952e-01],
       [-4.34020907e-01]], dtype=float32)
"""

For comparison, here is the same computation written directly with NumPy:

X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(theta_numpy)

"""
[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]
"""

And using Scikit-Learn:

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

"""
[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]
"""

As you can see, all three approaches give essentially the same result (the tiny differences come from the TensorFlow version using float32 precision). The main advantage of the TensorFlow version is that it will automatically run the computation on your GPU card if you have one.

Implementing Gradient Descent

The linear regression from the previous section can also be solved with Gradient Descent; here we use Batch Gradient Descent. Before using this algorithm, be sure to normalize the input features first, otherwise training will be much slower.

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

Manually Computing the Gradients

First we compute the gradients by hand. The principle is simple, so here are just the formula and the code:
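
The gradient vector of the MSE cost and the Batch Gradient Descent update step that the code implements are:

\(\nabla_{\boldsymbol{\theta}}\,\text{MSE}(\boldsymbol{\theta}) = \dfrac{2}{m}\mathbf{X}^T(\mathbf{X}\boldsymbol{\theta} - \mathbf{y})\) and \(\boldsymbol{\theta}^{(\text{next})} = \boldsymbol{\theta} - \eta\,\nabla_{\boldsymbol{\theta}}\,\text{MSE}(\boldsymbol{\theta})\), where \(\eta\) is the learning rate.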

reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

"""

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.5555718
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.5396291
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
"""

best_theta

"""
array([[ 2.0685523 ],
       [ 0.8874027 ],
       [ 0.14401656],
       [-0.34770882],
       [ 0.36178368],
       [ 0.00393811],
       [-0.04269556],
       [-0.6614529 ],
       [-0.6375279 ]], dtype=float32)
"""

You can clearly see that the MSE steadily converges as the iterations progress.

Using autodiff

TensorFlow's tf.gradients() can compute the gradients automatically, which makes the code more concise:

reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

gradients = tf.gradients(mse, [theta])[0]

training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

"""
Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.71450037
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.5555718
Epoch 400 MSE = 0.54881126
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962916
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
Best theta:
[[ 2.0685523 ]
 [ 0.8874027 ]
 [ 0.14401656]
 [-0.3477088 ]
 [ 0.36178365]
 [ 0.00393811]
 [-0.04269556]
 [-0.66145283]
 [-0.6375278 ]]
"""

TensorFlow uses reverse-mode autodiff, which is well suited to the common case of many inputs and few outputs. Other automatic differentiation approaches exist (numerical differentiation, symbolic differentiation, and forward-mode autodiff), each with its own trade-offs.
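
As a small illustration (a toy example with made-up variable names, not from the original post), a single tf.gradients() call performs one reverse pass and returns the gradients with respect to several variables at once:

reset_graph()

a = tf.Variable(2.0)
b = tf.Variable(3.0)
cost = a * a * b + b                  # d(cost)/da = 2ab, d(cost)/db = a^2 + 1

grads = tf.gradients(cost, [a, b])    # one reverse pass, two gradients

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))            # [12.0, 5.0]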

Using an Optimizer

TensorFlow also provides a number of ready-made optimizers; a Gradient Descent optimizer is just one of them, and switching to it requires changing only a couple of lines.

Simply replace these lines from the code above

gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)

with the following:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

If you want to use a different optimizer, you only need to change one line. For example, here is a Momentum optimizer:

optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                       momentum=0.9)

Feeding Data to the Training Algorithm

To implement Mini-batch Gradient Descent, we need to replace X and y with a new batch at every iteration. TensorFlow provides placeholder() nodes for this; they simply act as placeholders in the graph, and the actual values are supplied at run time through the feed_dict argument.

Here is a simple example:

reset_graph()

A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

print(B_val_1)
"""
[[6. 7. 8.]]
"""

print(B_val_2)
"""
[[ 9. 10. 11.]
 [12. 13. 14.]]
"""

Implementing Mini-batch Gradient Descent is straightforward: repeatedly fetch a batch of data and run the training op on it:

n_epochs = 1000
learning_rate = 0.01

reset_graph()

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

n_epochs = 10  # overrides the value set above; 10 epochs are enough for this demo

batch_size = 100
n_batches = int(np.ceil(m / batch_size))

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # not shown in the book
    indices = np.random.randint(m, size=batch_size)  # not shown
    X_batch = scaled_housing_data_plus_bias[indices] # not shown
    y_batch = housing.target.reshape(-1, 1)[indices] # not shown
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()
    
"""
array([[ 2.0703337 ],
       [ 0.8637145 ],
       [ 0.12255152],
       [-0.31211877],
       [ 0.38510376],
       [ 0.00434168],
       [-0.0123295 ],
       [-0.83376896],
       [-0.8030471 ]], dtype=float32)
"""

Saving and Restoring Models

Sometimes you need to save a trained model to disk, or resume training after an interruption. Both require the ability to save and restore models, which TensorFlow handles with a Saver:

  • Create a Saver
  • Call saver.save() to save the model
  • Call saver.restore() to restore it

The saving code:

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())                                # not shown
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)
    
    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

The restoring code:

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")
    best_theta_restored = theta.eval() # not shown in the book

By default the saver also saves the graph structure itself in a second file with the extension .meta. You can use the function tf.train.import_meta_graph() to restore the graph structure. This function loads the graph into the default graph and returns a Saver that can then be used to restore the graph state (i.e., the variable values):

reset_graph()
# notice that we start with an empty graph.

saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")  # this loads the graph structure
theta = tf.get_default_graph().get_tensor_by_name("theta:0") # not shown in the book

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")  # this restores the graph's state
    best_theta_restored = theta.eval() # not shown in the book

Visualizing the Graph and Training Curves Using TensorBoard

TensorBoard is a powerful web-based tool. It works by reading log data written to a local directory and plotting it, so you can visualize the graph structure and the training curves.

Using TensorBoard takes four steps:

  1. Define the directory where the log files will be written

    from datetime import datetime
    
    now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    root_logdir = "tf_logs"
    logdir = "{}/run-{}/".format(root_logdir, now)
  2. At the very end of the construction phase, add the following code

    mse_summary = tf.summary.scalar('MSE', mse)
    file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
  3. During training, write summaries at the points where you need them

    with tf.Session() as sess:                                                        # not shown in the book
        sess.run(init)                                                                # not shown
    
        for epoch in range(n_epochs):                                                 # not shown
            for batch_index in range(n_batches):
                X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
                if batch_index % 10 == 0:
                    summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                    step = epoch * n_batches + batch_index
                    file_writer.add_summary(summary_str, step)
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    
        best_theta = theta.eval()
  4. Start TensorBoard by running the following command

    python3 -m tensorboard.main --logdir=tf_logs

Here is what the TensorBoard UI looks like after it starts up.

Name Scopes

When you work with complex models such as neural networks, the graph contains a large number of nodes and quickly becomes cluttered. To avoid this, you can group related nodes with TensorFlow's name scopes:

with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")
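
The ops created inside the block get the scope name as a prefix. For this snippet the names should be as follows (assuming no other "loss" scope exists in the graph):

print(error.op.name)   # loss/sub
print(mse.op.name)     # loss/mse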

Modularity

Suppose you want to create a graph that adds the output of two rectified linear units (ReLUs). A ReLU computes a linear function of the inputs and outputs the result if it is positive, and 0 otherwise.

With just two ReLUs, we could write the following code:

reset_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")

relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z1, 0., name="relu2")  # Oops, cut&paste error! Did you spot it?

output = tf.add(relu1, relu2, name="output")

Such repetitive code is hard to maintain, and it only gets worse if you want to add more ReLUs. A better approach is to write a function that builds one ReLU; TensorFlow's add_n() can then sum any number of them:

reset_graph()

def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, 0., name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

When TensorFlow creates a node it gives it a unique name, but it is still better to use a name scope inside the function so the graph structure stays clear:

def relu(X):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)                          # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")    # not shown
        b = tf.Variable(0.0, name="bias")                             # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                      # not shown
        return tf.maximum(z, 0., name="max")                          # not shown

Sharing Variables

Sticking with the ReLU example, what if you want to share a variable across several components? There are several options.

The simplest option is to add a threshold parameter to the relu() function, so the same variable is passed to every component when the function is called:

reset_graph()

def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)                        # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, threshold, name="max")

threshold = tf.Variable(0.0, name="threshold")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name="output")

The drawback of this approach is that when there are many shared parameters, the relu() function ends up with many arguments. You can work around this by passing a single dictionary (or an object) that holds all the shared variables, as sketched below.
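
Here is a minimal sketch of that idea; the shared dictionary and its "threshold" entry are illustrative names, not part of the original code:

reset_graph()

def relu(X, shared):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        z = tf.add(tf.matmul(X, w), b, name="z")
        # every shared variable is looked up in the single `shared` dictionary
        return tf.maximum(z, shared["threshold"], name="max")

shared = {"threshold": tf.Variable(0.0, name="threshold")}
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X, shared) for i in range(5)]
output = tf.add_n(relus, name="output")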

Another option is to store the shared variable as an attribute of the function itself on the first call:

reset_graph()

def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = int(X.get_shape()[1]), 1                          # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, relu.threshold, name="max")

TensorFlow also offers the get_variable() function to create or retrieve variables; it works together with variable_scope() (variable scopes):

reset_graph()

with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))

This code creates a variable scope named relu, so the full name of the threshold variable is relu/threshold. Note that if a variable with this name had already been created by an earlier call to get_variable(), this code would raise an exception.
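
As a quick check (the exact error message below is an assumption), calling get_variable() again for the same name without reuse raises a ValueError:

try:
    with tf.variable_scope("relu"):
        threshold = tf.get_variable("threshold", shape=(),
                                    initializer=tf.constant_initializer(0.0))
except ValueError as ex:
    print(ex)  # Variable relu/threshold already exists, disallowed. ...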

To reuse the existing variable instead, set reuse=True:

with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")

or:

with tf.variable_scope("relu") as scope:
    scope.reuse_variables()
    threshold = tf.get_variable("threshold")

Once reuse is set to True, it cannot be set back to False within the block. Moreover, if you define other variable scopes inside this one, they will automatically inherit reuse=True. Lastly, only variables created by get_variable() can be reused this way.

reset_graph()

def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold")
        w_shape = int(X.get_shape()[1]), 1                          # not shown
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name="output")

Variables created using get_variable() are always named using the name of their variable_scope as a prefix (e.g., "relu/threshold"), but for all other nodes (including variables created with tf.Variable()) the variable scope acts like a new name scope. In particular, if a name scope with an identical name was already created, then a suffix is added to make the name unique. For example, all nodes created in the preceding code (except the threshold variable) have a name prefixed with "relu_1/" to "relu_5/".
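
A quick way to check this for the graph above; the printed names are what I would expect, so treat the comments as an assumption:

for r in relus:
    print(r.op.name)

# relu_1/max
# relu_2/max
# relu_3/max
# relu_4/max
# relu_5/max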

Extra material

reset_graph()

with tf.variable_scope("my_scope"):
    x0 = tf.get_variable("x", shape=(), initializer=tf.constant_initializer(0.))
    x1 = tf.Variable(0., name="x")
    x2 = tf.Variable(0., name="x")

with tf.variable_scope("my_scope", reuse=True):
    x3 = tf.get_variable("x")
    x4 = tf.Variable(0., name="x")

with tf.variable_scope("", default_name="", reuse=True):
    x5 = tf.get_variable("my_scope/x")

print("x0:", x0.op.name)
print("x1:", x1.op.name)
print("x2:", x2.op.name)
print("x3:", x3.op.name)
print("x4:", x4.op.name)
print("x5:", x5.op.name)
print(x0 is x3 and x3 is x5)

"""

x0: my_scope/x
x1: my_scope/x_1
x2: my_scope/x_2
x3: my_scope/x
x4: my_scope_1/x
x5: my_scope/x
True
"""

Exercises

  1. What are the main benefits of creating a computation graph rather than directly executing the computations? What are the main drawbacks?

    Main benefits and drawbacks of creating a computation graph rather than directly executing the computations:

    • Main benefits:
      • TensorFlow can automatically compute the gradients for you (using reverse-mode autodiff).
      • TensorFlow can take care of running the operations in parallel in different threads.
      • It makes it easier to run the same model across different devices.
      • It simplifies introspection—for example, to view the model in TensorBoard.
    • Main drawbacks:
      • It makes the learning curve steeper.
      • It makes step-by-step debugging harder.
  2. Is the statement a_val = a.eval(session=sess) equivalent to a_val = sess.run(a)?

    Yes, the statement a_val = a.eval(session=sess) is indeed equivalent to a_val = sess.run(a).

  3. Is the statement a_val, b_val = a.eval(session=sess), b.eval(session=sess) equivalent to a_val, b_val = sess.run([a, b])?

  4. Can you run two graphs in the same session?

  5. If you create a graph g containing a variable w, then start two threads and open a session in each thread, both using the same graph g, will each session have its own copy of the variable w or will it be shared?

  6. When is a variable initialized? When is it destroyed?

  7. What is the difference between a placeholder and a variable?

  8. What happens when you run the graph to evaluate an operation that depends on a placeholder but you don’t feed its value? What happens if the operation does not depend on the placeholder?

    If you run the graph to evaluate an operation that depends on a placeholder but you don’t feed its value, you get an exception. If the operation does not depend on the placeholder, then no exception is raised.

  9. When you run a graph, can you feed the output value of any operation, or just the value of placeholders?

    When you run a graph, you can feed the output value of any operation, not just the value of placeholders. In practice, however, this is rather rare (it can be useful, for example, when you are caching the output of frozen layers).
  10. How can you set a variable to any value you want (during the execution phase)?

    You can specify a variable’s initial value when constructing the graph, and it will be initialized later when you run the variable’s initializer during the execution phase. If you want to change that variable’s value to anything you want during the execution phase, then the simplest option is to create an assignment node (during the graph construction phase) using the tf.assign() function, passing the variable and a placeholder as parameters. During the execution phase, you can run the assignment operation and feed the variable’s new value using the placeholder.
    import tensorflow as tf
    x = tf.Variable(tf.random_uniform(shape=(), minval=0.0, maxval=1.0))
    x_new_val = tf.placeholder(shape=(), dtype=tf.float32)
    x_assign = tf.assign(x, x_new_val)
    with tf.Session():
      x.initializer.run() # random number is sampled *now* 
      print(x.eval()) # 0.646157 (some random number) 
      x_assign.eval(feed_dict={x_new_val: 5.0}) 
      print(x.eval()) # 5.0
  11. How many times does reverse-mode autodiff need to traverse the graph in order to compute the gradients of the cost function with regards to 10 variables? What about forward-mode autodiff? And symbolic differentiation?

    Reverse-mode autodiff (implemented by TensorFlow) needs to traverse the graph only twice in order to compute the gradients of the cost function with regards to any number of variables. On the other hand, forward-mode autodiff would need to run once for each variable (so 10 times if we want the gradients with regards to 10 different variables). As for symbolic differentiation, it would build a different graph to compute the gradients, so it would not traverse the original graph at all (except when building the new gradients graph). A highly optimized symbolic differentiation system could potentially run the new gradients graph only once to compute the gradients with regards to all variables, but that new graph may be horribly complex and inefficient compared to the original graph.
  12. Implement Logistic Regression with Mini-batch Gradient Descent using TensorFlow. Train it and evaluate it on the moons dataset (introduced in Chapter 5). Try adding all the bells and whistles:

    • Define the graph within a logistic_regression() function that can be reused easily.
    • Save checkpoints using a Saver at regular intervals during training, and save the final model at the end of training.
    • Restore the last checkpoint upon startup if training was interrupted.
    • Define the graph using nice scopes so the graph looks good in TensorBoard.
    • Add summaries to visualize the learning curves in TensorBoard.
    • Try tweaking some hyperparameters such as the learning rate or the mini-batch size and look at the shape of the learning curve.

Let's walk through this exercise in detail.

First, let's generate the dataset:

from sklearn.datasets import make_moons

m = 1000
X_moons, y_moons = make_moons(m, noise=0.1, random_state=42)

Let's plot the data:

import matplotlib.pyplot as plt

plt.plot(X_moons[y_moons == 1, 0], X_moons[y_moons == 1, 1], 'go', label="Positive")
plt.plot(X_moons[y_moons == 0, 0], X_moons[y_moons == 0, 1], 'r^', label="Negative")
plt.legend()
plt.show()

Remember to add a bias term to the inputs:

X_moons_with_bias = np.c_[np.ones((m, 1)), X_moons]

Reshape y into a column vector:

y_moons_column_vector = y_moons.reshape(-1, 1)

Split the data into a training set and a test set:

test_ratio = 0.2
test_size = int(m * test_ratio)
X_train = X_moons_with_bias[:-test_size]
X_test = X_moons_with_bias[-test_size:]
y_train = y_moons_column_vector[:-test_size]
y_test = y_moons_column_vector[-test_size:]

Write a function that generates batches. It samples instances at random each time, so some instances may be picked more than once:

def random_batch(X_train, y_train, batch_size):
    rnd_indices = np.random.randint(0, len(X_train), batch_size)
    X_batch = X_train[rnd_indices]
    y_batch = y_train[rnd_indices]
    return X_batch, y_batch

Now build the model:

reset_graph()
n_inputs = 2
X = tf.placeholder(tf.float32, shape=(None, n_inputs + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n_inputs + 1, 1], -1.0, 1.0, seed=42), name="theta")
logits = tf.matmul(X, theta, name="logits")
y_proba = 1 / (1 + tf.exp(-logits))

In fact, TensorFlow provides a tf.sigmoid() function that does exactly this:

y_proba = tf.sigmoid(logits)

The cost function is the log loss: \(J(\boldsymbol{\theta}) = -\dfrac{1}{m} \sum\limits_{i=1}^{m}{\left[ y^{(i)} \log\left(\hat{p}^{(i)}\right) + (1 - y^{(i)}) \log\left(1 - \hat{p}^{(i)}\right)\right]}\)

epsilon = 1e-7  # to avoid an overflow when computing the log
loss = -tf.reduce_mean(y * tf.log(y_proba + epsilon) + (1 - y) * tf.log(1 - y_proba + epsilon))

You can also use tf.losses.log_loss() instead:

loss = tf.losses.log_loss(y, y_proba)  # uses epsilon = 1e-7 by default
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

n_epochs = 1000
batch_size = 50
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = random_batch(X_train, y_train, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        loss_val = loss.eval({X: X_test, y: y_test})
        if epoch % 100 == 0:
            print("Epoch:", epoch, "\tLoss:", loss_val)

    y_proba_val = y_proba.eval(feed_dict={X: X_test, y: y_test})

Output:

Epoch: 0    Loss: 0.792602
Epoch: 100  Loss: 0.343463
Epoch: 200  Loss: 0.30754
Epoch: 300  Loss: 0.292889
Epoch: 400  Loss: 0.285336
Epoch: 500  Loss: 0.280478
Epoch: 600  Loss: 0.278083
Epoch: 700  Loss: 0.276154
Epoch: 800  Loss: 0.27552
Epoch: 900  Loss: 0.274912

How well does the model perform?

y_pred = (y_proba_val >= 0.5)
from sklearn.metrics import precision_score, recall_score

precision_score(y_test, y_pred)

"""
0.86274509803921573
"""

recall_score(y_test, y_pred)
"""
0.88888888888888884
"""

Plotting the predictions on top of the data gives a sense of the decision boundary.

As it turns out, the fit is not great. To improve it, a natural first step is to add extra features, mapping the data into a higher-dimensional space (here the squares and cubes of each input feature):

X_train_enhanced = np.c_[X_train,
                         np.square(X_train[:, 1]),
                         np.square(X_train[:, 2]),
                         X_train[:, 1] ** 3,
                         X_train[:, 2] ** 3]
X_test_enhanced = np.c_[X_test,
                        np.square(X_test[:, 1]),
                        np.square(X_test[:, 2]),
                        X_test[:, 1] ** 3,
                        X_test[:, 2] ** 3]

For convenience, let's wrap the model construction in a function:

def logistic_regression(X, y, initializer=None, seed=42, learning_rate=0.01):
    n_inputs_including_bias = int(X.get_shape()[1])
    with tf.name_scope("logistic_regression"):
        with tf.name_scope("model"):
            if initializer is None:
                initializer = tf.random_uniform([n_inputs_including_bias, 1], -1.0, 1.0, seed=seed)
            theta = tf.Variable(initializer, name="theta")
            logits = tf.matmul(X, theta, name="logits")
            y_proba = tf.sigmoid(logits)
        with tf.name_scope("train"):
            loss = tf.losses.log_loss(y, y_proba, scope="loss")
            optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
            training_op = optimizer.minimize(loss)
            loss_summary = tf.summary.scalar('log_loss', loss)
        with tf.name_scope("init"):
            init = tf.global_variables_initializer()
        with tf.name_scope("save"):
            saver = tf.train.Saver()
    return y_proba, loss, training_op, loss_summary, init, saver

from datetime import datetime

def log_dir(prefix=""):
    now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    root_logdir = "tf_logs"
    if prefix:
        prefix += "-"
    name = prefix + "run-" + now
    return "{}/{}/".format(root_logdir, name)

Now build the graph:

n_inputs = 2 + 4
logdir = log_dir("logreg")

X = tf.placeholder(tf.float32, shape=(None, n_inputs + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

y_proba, loss, training_op, loss_summary, init, saver = logistic_regression(X, y)

file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

And train the model, saving checkpoints along the way:

import os

n_epochs = 10001
batch_size = 50
n_batches = int(np.ceil(m / batch_size))

checkpoint_path = "/tmp/my_logreg_model.ckpt"
checkpoint_epoch_path = checkpoint_path + ".epoch"
final_model_path = "./my_logreg_model"

with tf.Session() as sess:
    if os.path.isfile(checkpoint_epoch_path):
        # if the checkpoint file exists, restore the model and load the epoch number
        with open(checkpoint_epoch_path, "rb") as f:
            start_epoch = int(f.read())
        print("Training was interrupted. Continuing at epoch", start_epoch)
        saver.restore(sess, checkpoint_path)
    else:
        start_epoch = 0
        sess.run(init)

    for epoch in range(start_epoch, n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = random_batch(X_train_enhanced, y_train, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        loss_val, summary_str = sess.run([loss, loss_summary], feed_dict={X: X_test_enhanced, y: y_test})
        file_writer.add_summary(summary_str, epoch)
        if epoch % 500 == 0:
            print("Epoch:", epoch, "\tLoss:", loss_val)
            saver.save(sess, checkpoint_path)
            with open(checkpoint_epoch_path, "wb") as f:
                f.write(b"%d" % (epoch + 1))

    saver.save(sess, final_model_path)
    y_proba_val = y_proba.eval(feed_dict={X: X_test_enhanced, y: y_test})
    os.remove(checkpoint_epoch_path)

Let's check the results now:

y_pred = (y_proba_val >= 0.5)
precision_score(y_test, y_pred)
"""
0.97979797979797978
"""

recall_score(y_test, y_pred)
"""
0.97979797979797978
"""

The decision boundary looks much better now.

We can also inspect the run in TensorBoard.

The hyperparameters above can still be tuned, for example with grid search or randomized search. Below we use randomized search over batch_size and learning_rate:

from scipy.stats import reciprocal

n_search_iterations = 10

for search_iteration in range(n_search_iterations):
    batch_size = np.random.randint(1, 100)
    learning_rate = reciprocal(0.0001, 0.1).rvs(random_state=search_iteration)

    n_inputs = 2 + 4
    logdir = log_dir("logreg")
    
    print("Iteration", search_iteration)
    print("  logdir:", logdir)
    print("  batch size:", batch_size)
    print("  learning_rate:", learning_rate)
    print("  training: ", end="")

    reset_graph()

    X = tf.placeholder(tf.float32, shape=(None, n_inputs + 1), name="X")
    y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

    y_proba, loss, training_op, loss_summary, init, saver = logistic_regression(
        X, y, learning_rate=learning_rate)

    file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

    n_epochs = 10001
    n_batches = int(np.ceil(m / batch_size))

    final_model_path = "./my_logreg_model_%d" % search_iteration

    with tf.Session() as sess:
        sess.run(init)

        for epoch in range(n_epochs):
            for batch_index in range(n_batches):
                X_batch, y_batch = random_batch(X_train_enhanced, y_train, batch_size)
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            loss_val, summary_str = sess.run([loss, loss_summary], feed_dict={X: X_test_enhanced, y: y_test})
            file_writer.add_summary(summary_str, epoch)
            if epoch % 500 == 0:
                print(".", end="")

        saver.save(sess, final_model_path)

        print()
        y_proba_val = y_proba.eval(feed_dict={X: X_test_enhanced, y: y_test})
        y_pred = (y_proba_val >= 0.5)
        
        print("  precision:", precision_score(y_test, y_pred))
        print("  recall:", recall_score(y_test, y_pred))

The output:

Iteration 0
  logdir: tf_logs/logreg-run-20170606195328/
  batch size: 19
  learning_rate: 0.00443037524522
  training: .....................
  precision: 0.979797979798
  recall: 0.979797979798
Iteration 1
  logdir: tf_logs/logreg-run-20170606195605/
  batch size: 80
  learning_rate: 0.00178264971514
  training: .....................
  precision: 0.969696969697
  recall: 0.969696969697
Iteration 2
  logdir: tf_logs/logreg-run-20170606195646/
  batch size: 73
  learning_rate: 0.00203228544324
  training: .....................
  precision: 0.969696969697
  recall: 0.969696969697
Iteration 3
  logdir: tf_logs/logreg-run-20170606195730/
  batch size: 6
  learning_rate: 0.00449152382514
  training: .....................
  precision: 0.980198019802
  recall: 1.0
Iteration 4
  logdir: tf_logs/logreg-run-20170606200523/
  batch size: 24
  learning_rate: 0.0796323472178
  training: .....................
  precision: 0.980198019802
  recall: 1.0
Iteration 5
  logdir: tf_logs/logreg-run-20170606200726/
  batch size: 75
  learning_rate: 0.000463425058329
  training: .....................
  precision: 0.912621359223
  recall: 0.949494949495
Iteration 6
  logdir: tf_logs/logreg-run-20170606200810/
  batch size: 86
  learning_rate: 0.0477068184194
  training: .....................
  precision: 0.98
  recall: 0.989898989899
Iteration 7
  logdir: tf_logs/logreg-run-20170606200851/
  batch size: 87
  learning_rate: 0.000169404470952
  training: .....................
  precision: 0.888888888889
  recall: 0.808080808081
Iteration 8
  logdir: tf_logs/logreg-run-20170606200932/
  batch size: 61
  learning_rate: 0.0417146119941
  training: .....................
  precision: 0.980198019802
  recall: 1.0
Iteration 9
  logdir: tf_logs/logreg-run-20170606201026/
  batch size: 92
  learning_rate: 0.000107429229684
  training: .....................
  precision: 0.882352941176
  recall: 0.757575757576

The best hyperparameters are easy to spot from this output. Let's also look at TensorBoard.

You can see that different hyperparameters produce quite different learning curves.
