This series consists mainly of reading notes on 《Tensorflow 實戰Google深度學習框架》; after warming up with the Cookbook, this book forms the basis of my personal knowledge system.
Ref: [Tensorflow] Cookbook - The Tensorflow Way
Chapter 1: Introduction (skipped)
Chapter 2: Installation (only a few key points noted)
Protocol Buffers
Bazel, similar to Makefile, for compiling.
Install steps:
(1) Docker
(2) TensorFlow
Source code --> build the pip package --> pip install.
Chapter 3: Getting Started
Computation graphs
1. Define the computation
2. Execute the computation
In [1]: import tensorflow as tf
In [2]: a = tf.constant([1.0, 2.0], name = "a")
In [3]: b = tf.constant([2.0, 3.0], name = "b")
In [4]: result = a+b
# A session is required for execution; this line is only a definition
In [5]: result
Out[5]: <tf.Tensor 'add:0' shape=(2,) dtype=float32>
The system provides a default computation graph:
In [6]: print(a.graph is tf.get_default_graph())
True
In [7]: print(b.graph is tf.get_default_graph())
True
Two graphs, each with a variable named 'v'; they do not conflict here.
import tensorflow as tf

g1 = tf.Graph()                # define a custom graph
with g1.as_default():          # set it as the graph to operate on
    v = tf.get_variable("v", [1])

g2 = tf.Graph()
with g2.as_default():
    v = tf.get_variable("v", [1])

# The structure is defined above; now execute each graph:
with tf.Session(graph=g1) as sess:     # run graph g1
    tf.global_variables_initializer().run()
    with tf.variable_scope("", reuse=True):
        print(sess.run(tf.get_variable("v")))

with tf.Session(graph=g2) as sess:     # run graph g2
    tf.global_variables_initializer().run()
    with tf.variable_scope("", reuse=True):
        print(sess.run(tf.get_variable("v")))
Through the graph, specify the device on which the graph runs:
g = tf.Graph()
with g.device('/gpu:0'):
    result = a + b
Collections
-- add resources to a collection
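As a quick taste, a minimal sketch of the collection mechanism (the collection name 'my_losses' is invented here for illustration): tensors are stashed under a string key and fetched back later as a list.

import tensorflow as tf

a = tf.constant(1.0)
b = tf.constant(2.0)

# Register both tensors under an arbitrary collection key
tf.add_to_collection('my_losses', a)
tf.add_to_collection('my_losses', b)

# Fetch the whole list back and sum it
total = tf.add_n(tf.get_collection('my_losses'))

with tf.Session() as sess:
    print(sess.run(total))   # 3.0

This is exactly the pattern used for the regularization losses in Chapter 4 below.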
Tensors
-- a tensor stores only the computation that produces these numbers, not the numbers themselves
import tensorflow as tf

a = tf.constant([1.0, 2.0], name="a")
b = tf.constant([2.0, 3.0], name="b")
result = a + b
print(result)

sess = tf.InteractiveSession()
print(result.eval())
sess.close()
What you get back is a reference to the result. [The structure of a tensor]
['add:0' means the tensor result is the first output of the computation node "add"]
[shape=(2,) means a one-dimensional array of length 2]
Tensor("add:0", shape=(2,), dtype=float32) [ 3. 5.]
Basic concepts:
rank-0 tensor: scalar
rank-1 tensor: vector
rank-2 tensor: matrix
rank-3 tensor: super matrix :-p
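A tiny sketch tying ranks to shapes (values are arbitrary):

import tensorflow as tf

scalar = tf.constant(3.0)                      # rank 0, shape ()
vector = tf.constant([1.0, 2.0])               # rank 1, shape (2,)
matrix = tf.constant([[1.0, 2.0],
                      [3.0, 4.0]])             # rank 2, shape (2, 2)

print(scalar.shape, vector.shape, matrix.shape)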
Sessions
Put all the computation inside the "with" block:
with tf.Session() as sess:
    print(sess.run(result))
NB: there is a default Graph, generated automatically; but there is no default session! The session you create is attached to this default Graph automatically.
Setting the default session: [one "with" during the session's lifetime is enough]
sess = tf.Session()
with sess.as_default():      # the registration step
    print(result.eval())

Output:
[ 3. 5.]
What is the point of designating a default session? Retrieving tensor values becomes more convenient.
InteractiveSession automatically registers the session it creates as the default session.

sess = tf.InteractiveSession()   # creating the session registers it at the same time
print(result.eval())
sess.close()
# But isn't that one extra line of code? Where is the convenience? (It pays off in an
# interactive shell: every .eval() afterwards works without wrapping each one in a with-block.)
Modifying the session configuration
config = tf.ConfigProto(allow_soft_placement=True,   # fall back to CPU when an op has no GPU kernel
                        log_device_placement=True)   # log which device each op is placed on

sess1 = tf.InteractiveSession(config=config)
sess2 = tf.Session(config=config)
Matrix computation
a = tf.matmul(x, w1)  # transposition is already taken care of (via matmul's flags), which is convenient
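A minimal sketch of what that convenience means (shapes invented for illustration): matmul's transpose_a / transpose_b flags replace an explicit tf.transpose.

import tensorflow as tf

x = tf.constant([[1.0, 2.0]])    # shape (1, 2)
w = tf.constant([[3.0, 4.0]])    # shape (1, 2)

# (1,2) x (2,1) -> (1,1); w is transposed inside matmul itself
a = tf.matmul(x, w, transpose_b=True)

with tf.Session() as sess:
    print(sess.run(a))   # [[ 11.]]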
Variables
[the Cookbook has detailed examples]
w1= tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
NB: the point of seed is to guarantee that every run produces the same result.
Getting the shape:
w1.get_shape()
Out[51]: TensorShape([Dimension(2), Dimension(3)])
w1.get_shape()[0]
Out[52]: Dimension(2)
w1.get_shape()[1]
Out[53]: Dimension(3)
w2 = tf.Variable(w1.initialized_value())        # copy another variable's initial value directly
w3 = tf.Variable(w1.initialized_value() * 2.0)
Executing variable initialization
Only tf.global_variables_initializer() actually carries out the initialization specified for the variables.
w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

# No initial value, but it is best to declare the size of the "container",
# which feed will fill in later
x = tf.placeholder(tf.float32, shape=(1, 2), name="input")

a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

sess = tf.Session()
init_op = tf.global_variables_initializer()
sess.run(init_op)

print(sess.run(y, feed_dict={x: [[0.7, 0.9]]}))
For example, how w1 is expanded in the Graph (the original figure showed w1's subgraph, including its Assign initialization op). [figure omitted]
Changing a variable's dimensions: possible, but rarely used; don't make trouble for yourself.
tf.assign(w1, w2, validate_shape=False)
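A small runnable sketch of what validate_shape=False permits (shapes made up for illustration):

import tensorflow as tf

w1 = tf.Variable(tf.zeros([2, 3]), name="w1")
w2 = tf.Variable(tf.ones([3, 2]), name="w2")

# tf.assign(w1, w2) would fail here: shapes (2, 3) and (3, 2) differ
assign_op = tf.assign(w1, w2, validate_shape=False)   # shape check skipped

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(assign_op).shape)   # (3, 2): w1 now holds a value of w2's shape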
Chapter 4: Deep Neural Networks
Activation functions make the neural network non-linear.
The implementation code is remarkably concise:
a = tf.nn.relu(tf.matmul(x, w1) + biases1)
y = tf.nn.relu(tf.matmul(a, w2) + biases2)
Avoiding log of values that are too small: clip_by_value
cross_entropy = -tf.reduce_mean(t * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
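A toy sketch of clip_by_value itself: values below the floor are raised to it, values above the ceiling are lowered, so tf.log never sees 0 or anything above 1.

import tensorflow as tf

v = tf.constant([0.0, 0.5, 2.0])

with tf.Session() as sess:
    # 0.0 -> 1e-10 (floor), 0.5 unchanged, 2.0 -> 1.0 (ceiling)
    print(sess.run(tf.clip_by_value(v, 1e-10, 1.0)))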
Before cross-entropy, we always use softmax: X * W --> softmax --> cross-entropy
softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    dim=-1,
    name=None
)

sparse_softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    name=None
)
If you only care about the forward-pass prediction, then only the logits part matters; you then take the label with the highest probability.
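As a one-step sketch (assuming logits of shape (batch_size, num_classes)): softmax is monotonic, so taking argmax over the raw logits already yields the predicted label.

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, 1.0],
                      [0.1, 3.0, 0.2]])

predictions = tf.argmax(logits, axis=1)

with tf.Session() as sess:
    print(sess.run(predictions))   # [0 1]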
NB: classification uses cross-entropy; for regression, we use MSE as follows:
mse = tf.reduce_mean(tf.square(y_ - y))
Where the loss finally ends up:
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
TensorFlow API for losses (the tf.losses module):
absolute_difference(...)
: Adds an Absolute Difference loss to the training procedure.
add_loss(...)
: Adds an externally defined loss to the collection of losses.
compute_weighted_loss(...)
: Computes the weighted loss.
cosine_distance(...)
: Adds a cosine-distance loss to the training procedure.
get_losses(...)
: Gets the list of losses from the loss_collection.
get_regularization_loss(...)
: Gets the total regularization loss.
get_regularization_losses(...)
: Gets the list of regularization losses.
get_total_loss(...)
: Returns a tensor whose value represents the total loss.
hinge_loss(...)
: Adds a hinge loss to the training procedure.
huber_loss(...)
: Adds a Huber Loss term to the training procedure.
log_loss(...)
: Adds a Log Loss term to the training procedure.
mean_pairwise_squared_error(...)
: Adds a pairwise-errors-squared loss to the training procedure.
mean_squared_error(...)
: Adds a Sum-of-Squares loss to the training procedure.
sigmoid_cross_entropy(...)
: Creates a cross-entropy loss using tf.nn.sigmoid_cross_entropy_with_logits.
softmax_cross_entropy(...)
: Creates a cross-entropy loss using tf.nn.softmax_cross_entropy_with_logits.
sparse_softmax_cross_entropy(...)
: Cross-entropy loss using tf.nn.sparse_softmax_cross_entropy_with_logits.
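A minimal sketch of how this tf.losses family is typically wired together (toy values, TF 1.x): each call returns the loss tensor and also registers it in the losses collection, so get_total_loss() can gather everything, including regularization terms.

import tensorflow as tf

onehot_labels = tf.constant([[0.0, 1.0]])
logits        = tf.constant([[2.0, 0.5]])

xent  = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
total = tf.losses.get_total_loss()   # sum of all registered losses

with tf.Session() as sess:
    print(sess.run([xent, total]))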
CNN -- numerical issues when computing the two loss layers (overflow...)
From: https://zhuanlan.zhihu.com/p/22260935
Some small problems that can crop up when computing the loss, and the current ways of solving them.
In essence this is a careful read of how the Caffe code actually computes its softmax loss and sigmoid cross-entropy loss.
The exponential function makes values explode very easily, so at roughly what input does it overflow? I went ahead and ran an experiment:
>>> import numpy as np
>>> np.exp(709)
8.2184074615549724e+307
The following problem appears:

def naive_softmax(x):
    y = np.exp(x)
    return y / np.sum(y)

# b takes large values; some exceed 709
b = np.random.rand(10) * 1000

print(b)
print(naive_softmax(b))
[ 497.46732916 227.75385779 537.82669096 787.54950048 663.13861524
224.69389572 958.39441314 139.09633232 381.35034548 604.08586655]
[ 0. 0. 0. nan 0. 0. nan 0. 0. 0.]
So how do we solve it? If we just divide every number by some large value so that nothing overflows, isn't the problem solved?
The veteran's recipe: find the largest number in the input and divide everything by e raised to that maximum, which amounts to the following code:
def high_level_softmax(x):
    max_val = np.max(x)
    x -= max_val
    return naive_softmax(x)
However, when the spread is too large, some values become far too small!
b = np.random.rand(10) * 1000
print(b)
print(high_level_softmax(b))

[ 903.27437996  260.68316085   22.31677464  544.80611744  506.26848644
  698.38019158  833.72024087  200.55675076  924.07740602  909.39841128]
[ 9.23337324e-010  7.79004225e-289  0.00000000e+000  1.92562645e-165
  3.53094986e-182  9.57072864e-099  5.73299537e-040  6.01134555e-315
  9.99999577e-001  4.21690097e-007]
A little smoothing trick is still quite necessary, so the code becomes:
def practical_softmax(x):
    max_val = np.max(x)
    x -= max_val
    y = np.exp(x)
    y[y < 1e-20] = 1e-20
    return y / np.sum(y)
Result: effectively a lower bound has been added
[ 9.23337325e-10 9.99999577e-21 9.99999577e-21 9.99999577e-21
9.99999577e-21 9.99999577e-21 9.99999577e-21 9.99999577e-21
9.99999577e-01 4.21690096e-07]
[But it seems that the simple pre-packaged preds = tf.nn.softmax(z) already solves this problem]
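A quick check of that claim (a sketch reusing some of the large inputs from the experiment above; tf.nn.softmax shifts by the max internally, so no NaN shows up):

import tensorflow as tf

z = tf.constant([903.3, 260.7, 22.3, 544.8, 924.1])

with tf.Session() as sess:
    print(sess.run(tf.nn.softmax(z)))   # finite values, no NaN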
The sigmoid cross-entropy loss suffers the same problem, because it too contains an exp:
def naive_sigmoid_loss(x, t):
    y = 1 / (1 + np.exp(-x))
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y)) / y.shape[0]

a = np.random.rand(10) * 1000
b = a > 500

print(a)
print(b)
print(naive_sigmoid_loss(a, b))
[ 63.20798359 958.94378279 250.75385942 895.49371345 965.62635077
81.1217712 423.36466749 532.20604694 333.45425951 185.72621262]
[False True False True True False False True False False]
nan
The improved approach: rewrite the expression so the exponent is never positive (i.e. replace exp(x) with exp(-|x|)), which keeps exp from overflowing. The corresponding code:
def high_level_sigmoid_loss(x, t):
    first  = (t - (x > 0)) * x
    second = np.log(1 + np.exp(x - 2 * x * (x > 0)))
    return -np.sum(first - second) / x.shape[0]
a = np.random.rand(10) * 1000 - 500
b = a > 0

print(a)
print(b)
print(high_level_sigmoid_loss(a, b))

[-173.48716596  462.06216262 -417.78666769    6.10480948  340.13986055
   23.64615392  256.33358957 -332.46689674  416.88593348 -246.51402684]
[False  True False  True  True  True  True False  True False]
0.000222961919658
Further optimization issues for NNs
There is no label; the computed value y = x² is itself the loss function.
Understanding learning_rate = 1:
The derivative is 2x; at x = 5 the gradient is 10, so each step moves x by 10 (from 5 to -5 and back again), which is exactly why it oscillates.
import tensorflow as tf

TRAINING_STEPS = 10
LEARNING_RATE = 1      # try different learning rates and watch the convergence

# x here denotes w
x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")
y = tf.square(x)       # y = x^2

train_op = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(y)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(TRAINING_STEPS):
        sess.run(train_op)
        x_value = sess.run(x)
        print("After %s iteration(s): x%s is %f." % (i+1, i+1, x_value))
Exponentially decaying learning rate

TRAINING_STEPS = 100
global_step = tf.Variable(0)

# initial learning rate 0.1; after every 1 training step, decay to 0.96 of the current rate
LEARNING_RATE = tf.train.exponential_decay(0.1, global_step, 1, 0.96, staircase=True)

x = tf.Variable(tf.constant(5, dtype=tf.float32), name="x")
y = tf.square(x)
train_op = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(y, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(TRAINING_STEPS):
        sess.run(train_op)
        if i % 10 == 0:
            LEARNING_RATE_value = sess.run(LEARNING_RATE)
            x_value = sess.run(x)
            print("After %s iteration(s): x%s is %f, learning rate is %f." %
                  (i+1, i+1, x_value, LEARNING_RATE_value))
Plotting these figures looks like fun; how are they drawn?
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

data = []
label = []
np.random.seed(0)

# A circle of radius 1 centred at the origin splits the scatter points into a red
# part and a blue part, with random noise mixed in. The rest is just the process
# of appending (data, label) pairs one at a time.
for i in range(150):
    x1 = np.random.uniform(-1, 1)
    x2 = np.random.uniform(0, 2)
    if x1**2 + x2**2 <= 1:
        data.append([np.random.normal(x1, 0.1), np.random.normal(x2, 0.1)])
        label.append(0)
    else:
        data.append([np.random.normal(x1, 0.1), np.random.normal(x2, 0.1)])
        label.append(1)

data  = np.hstack(data).reshape(-1, 2)    # the 2 matches the x, y coordinates of 2-D space
label = np.hstack(label).reshape(-1, 1)

plt.scatter(data[:,0], data[:,1], c=label, cmap="RdBu", vmin=-.2, vmax=1.2, edgecolor="white")
plt.show()
np.hstack usage
>>> a = np.array((1,2,3))
>>> b = np.array((2,3,4))
>>> np.hstack((a,b))
array([1, 2, 3, 2, 3, 4])

>>> a = np.array([[1],[2],[3]])
>>> b = np.array([[2],[3],[4]])
>>> np.hstack((a,b))
array([[1, 2],
       [2, 3],
       [3, 4]])
np.reshape usage
a = np.array([[1,2,3],[4,5,6]])
np.reshape(a, 6)
Out[202]: array([1, 2, 3, 4, 5, 6])
NB: the '-1' here
np.reshape(a, (3, -1))  # the unspecified value is inferred to be 2
Out[204]:
array([[1, 2],
[3, 4],
[5, 6]])
Generating the network structure in a loop, a clever trick!
x  = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
sample_size = len(data)

# Number of nodes in each layer: an interesting way to build a network
layer_dimension = [2, 10, 5, 3, 1]
n_layers = len(layer_dimension)
cur_layer = x

# Generate the network structure in a loop
for i in range(1, n_layers):   # NB: starts from the 2nd layer, i.e. the first out layer
    in_dimension  = layer_dimension[i-1]
    out_dimension = layer_dimension[i]
    weight = get_weight([in_dimension, out_dimension], 0.003)   # regularization lambda ---->
    bias   = tf.Variable(tf.constant(0.1, shape=[out_dimension]))
    cur_layer = tf.nn.elu(tf.matmul(cur_layer, weight) + bias)

y = cur_layer

# Defining the loss. Here we only compute the loss that "measures the model's
# performance on the training data"
mse_loss = tf.reduce_sum(tf.pow(y_ - y, 2)) / sample_size
tf.add_to_collection('losses', mse_loss)            # the loss without regularization yet

# The final loss, combining the add_to_collection calls made inside get_weight
loss = tf.add_n(tf.get_collection('losses'))
The contents of tf.get_collection('losses') are as follows:
[<tf.Tensor 'l2_regularizer:0' shape=() dtype=float32>,
 <tf.Tensor 'l2_regularizer_1:0' shape=() dtype=float32>,
 <tf.Tensor 'l2_regularizer_2:0' shape=() dtype=float32>,
 <tf.Tensor 'l2_regularizer_3:0' shape=() dtype=float32>,
 <tf.Tensor 'truediv:0' shape=() dtype=float32>,
 <tf.Tensor 'l2_regularizer_4:0' shape=() dtype=float32>,
 <tf.Tensor 'l2_regularizer_5:0' shape=() dtype=float32>,
 <tf.Tensor 'l2_regularizer_6:0' shape=() dtype=float32>,
 <tf.Tensor 'l2_regularizer_7:0' shape=() dtype=float32>,
 <tf.Tensor 'truediv_1:0' shape=() dtype=float32>]
Adding the "L2-regularized weight variable var" to the collection: tf.add_to_collection.
def get_weight(shape, lambda1):
    var = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(lambda1)(var))
    return var
Training with the unregularized loss mse_loss
# Define the training objective mse_loss, the number of training steps, and train
train_op = tf.train.AdamOptimizer(0.001).minimize(mse_loss)
TRAINING_STEPS = 40000

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(TRAINING_STEPS):
        sess.run(train_op, feed_dict={x: data, y_: label})
        if i % 2000 == 0:
            print("After %d steps, mse_loss: %f" %
                  (i, sess.run(mse_loss, feed_dict={x: data, y_: label})))
# Plot the decision boundary after training - quite interesting!
# 1. Build the grid
xx, yy = np.mgrid[-1.2:1.2:.01, -0.2:2.2:.01]
grid = np.c_[xx.ravel(), yy.ravel()]

# 2. Evaluate the network over the grid
probs = sess.run(y, feed_dict={x: grid})   # y here is the last layer
probs = probs.reshape(xx.shape)

plt.scatter(data[:,0], data[:,1], c=label, cmap="RdBu", vmin=-.2, vmax=1.2, edgecolor="white")
plt.contour(xx, yy, probs, levels=[.5], cmap="Greys", vmin=0, vmax=.1)
plt.show()
Ref: http://blog.csdn.net/u013534498/article/details/51399035
I like that blog post; data visualization also deserves a dedicated topic of study.
np.mgrid usage
np.mgrid[-1.2:1.2:.01, -0.2:2.2:.01]
Argument format: one start:stop:step slice per axis (first the rows, then the columns)
Out[217]:
array([[[-1.2 , -1.2 , -1.2 , ..., -1.2 , -1.2 , -1.2 ],
[-1.19, -1.19, -1.19, ..., -1.19, -1.19, -1.19],
[-1.18, -1.18, -1.18, ..., -1.18, -1.18, -1.18],
...,
[ 1.17, 1.17, 1.17, ..., 1.17, 1.17, 1.17],
[ 1.18, 1.18, 1.18, ..., 1.18, 1.18, 1.18],
[ 1.19, 1.19, 1.19, ..., 1.19, 1.19, 1.19]],
[[-0.2 , -0.19, -0.18, ..., 2.18, 2.19, 2.2 ],
[-0.2 , -0.19, -0.18, ..., 2.18, 2.19, 2.2 ],
[-0.2 , -0.19, -0.18, ..., 2.18, 2.19, 2.2 ],
...,
[-0.2 , -0.19, -0.18, ..., 2.18, 2.19, 2.2 ],
[-0.2 , -0.19, -0.18, ..., 2.18, 2.19, 2.2 ],
[-0.2 , -0.19, -0.18, ..., 2.18, 2.19, 2.2 ]]])
Decay rate: controls how fast the model (the moving average) updates
variable --> shadow variable (the shadow is initialized with the variable's value)
shadow_variable = decay * shadow_variable + (1 - decay) * variable
The larger the decay rate, the more slowly the shadow variable moves, i.e. the more stable it is; new values enter only with weight (1 - decay).
Overall we do not want the average to update too fast, yet early in training we do want it to catch up quickly; passing num_updates (the step variable below) achieves this: the decay actually used is min(decay, (1 + num_updates) / (10 + num_updates)).
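Plugging the numbers from the run below into that schedule, as a hand check (plain arithmetic, nothing more):

# decay actually used at each step: min(0.99, (1 + num_updates) / (10 + num_updates))
print(min(0.99, (1 + 0) / (10 + 0.0)))           # 0.1   (step = 0)
print(0.1 * 0 + 0.9 * 5.0)                       # 4.5
print(min(0.99, (1 + 10000) / (10 + 10000.0)))   # 0.99  (step = 10000)
print(0.99 * 4.5 + 0.01 * 10)                    # 4.555
print(0.99 * 4.555 + 0.01 * 10)                  # 4.60945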
Watching how the values change across iterations
import tensorflow as tf

v1   = tf.Variable(0, dtype=tf.float32)
step = tf.Variable(0, trainable=False)

ema = tf.train.ExponentialMovingAverage(0.99, step)   # step: the variable that controls the decay rate
maintain_averages_op = ema.apply([v1])                # update the variables in this list

with tf.Session() as sess:
    # initialization
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    print(sess.run([v1, ema.average(v1)]))
[0.0, 0.0]
    # update the value of v1
    sess.run(tf.assign(v1, 5))
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))
[5.0, 4.5]
    # update step and v1
    sess.run(tf.assign(step, 10000))
    sess.run(tf.assign(v1, 10))
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))
[10.0, 4.5549998]
    # update v1's moving average once more
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))
[10.0, 4.6094499]
I still don't quite get the purpose: is it really just for the ema.average(v1) return value? (The usual answer: at evaluation time you substitute the shadow values for the raw weights, since the averaged weights tend to be more robust.)
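A common pattern along those lines, as a sketch continuing the snippet above (the checkpoint path is hypothetical): at evaluation time, restore the shadow values into the weights via the name map from variables_to_restore().

# Evaluation-time sketch: the Saver loads each variable's moving average
# in place of its raw value.
saver = tf.train.Saver(ema.variables_to_restore())

with tf.Session() as sess:
    saver.restore(sess, "/path/to/model.ckpt")   # hypothetical checkpoint path
    # sess.run(v1) now returns the averaged value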
Miscellaneous problems
Checking the version:
python -c 'import tensorflow as tf; print(tf.__version__)' # for Python 2
python3 -c 'import tensorflow as tf; print(tf.__version__)' # for Python 3
Installing / upgrading:
unsw@unsw-UX303UB$ pip3 install --upgrade tensorflow
Requirement already up-to-date: tensorflow in /usr/local/anaconda3/lib/python3.5/site-packages
Requirement already up-to-date: six>=1.10.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)
Requirement already up-to-date: tensorflow-tensorboard<0.2.0,>=0.1.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)
Requirement already up-to-date: wheel>=0.26 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)
Requirement already up-to-date: protobuf>=3.3.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)
Requirement already up-to-date: numpy>=1.11.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow)
Requirement already up-to-date: werkzeug>=0.11.10 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow)
Requirement already up-to-date: markdown>=2.6.8 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow)
Requirement already up-to-date: bleach==1.5.0 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow)
Requirement already up-to-date: html5lib==0.9999999 in /usr/local/anaconda3/lib/python3.5/site-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow)
Requirement already up-to-date: setuptools in /usr/local/anaconda3/lib/python3.5/site-packages (from protobuf>=3.3.0->tensorflow)

unsw@unsw-UX303UB$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
1.3.0
Silencing warnings: https://github.com/tensorflow/tensorflow/issues/7778
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'