Welcome to the TensorFlow tutorial! In this notebook you will learn all the basics of TensorFlow. You will implement useful functions and draw the parallel with what you did using Numpy. You will understand what Tensors and operations are, as well as how to execute them in a computation graph.
After completing this assignment you will also be able to implement your own deep learning models using TensorFlow. In fact, using our brand new SIGNS dataset, you will build a deep neural network model to recognize numbers from 0 to 5 in sign language with a pretty impressive accuracy.
Welcome to this week's programming assignment. Until now, you've always used numpy to build neural networks. Now we will step you through a deep learning framework that will allow you to build neural networks more easily. Machine learning frameworks like TensorFlow, PaddlePaddle, Torch, Caffe, Keras, and many others can speed up your machine learning development significantly. All of these frameworks also have a lot of documentation, which you should feel free to read. In this assignment, you will learn to do the following in TensorFlow: initialize variables, start your own session, train algorithms, and implement a neural network.
Programming frameworks can not only shorten your coding time, but sometimes also perform optimizations that speed up your code.
To start, you will import the library:
import math
import numpy as np
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

%matplotlib inline
np.random.seed(1)
Now that you have imported the library, we will walk you through its different applications. You will start with an example, where we compute for you the loss of one training example.
【code】
y_hat = tf.constant(36, name='y_hat')            # Define y_hat constant. Set to 36.
y = tf.constant(39, name='y')                    # Define y. Set to 39.

loss = tf.Variable((y - y_hat)**2, name='loss')  # Create a variable for the loss

init = tf.global_variables_initializer()         # When init is run later (session.run(init)),
                                                 # the loss variable will be initialized and ready to be computed
with tf.Session() as session:                    # Create a session and print the output
    session.run(init)                            # Initializes the variables
    print(session.run(loss))                     # Prints the loss
【result】
9
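Writing and running programs in TensorFlow has the following steps:
1. Create tensors (variables) that are not yet executed or evaluated.
2. Write operations between those tensors.
3. Initialize your tensors.
4. Create a session.
5. Run the session, which executes the operations you wrote above.
Therefore, when we created a variable for the loss, we simply defined the loss as a function of other quantities, but did not evaluate its value. To evaluate it, we first had to create init = tf.global_variables_initializer() and run it with session.run(init). That initialized the loss variable, and in the last line we were finally able to evaluate the value of loss and print it.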
Now let us look at an easy example. Run the cell below:
【code】
a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a,b)
print(c)
【result】
Tensor("Mul:0", shape=(), dtype=int32)
As expected, you do not see 20! You got a tensor whose shape is empty (shape=(), that is, a scalar) and whose type is "int32". All you did was add the multiplication to the 'computation graph'; you have not run this computation yet. In order to actually multiply the two numbers, you have to create a session and run it.
【code】
sess = tf.Session()
print(sess.run(c))
【result】
20
Great! To summarize, remember to initialize your variables, create a session and run the operations inside the session.
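To make the "build first, run later" idea concrete, here is a minimal toy sketch in plain Python. It is not TensorFlow and makes no claim about how TensorFlow is implemented; the names Node, constant, multiply and run are hypothetical helpers invented for this illustration. Building a node only records what to compute, and nothing is evaluated until run is called.

```python
# Toy illustration of deferred execution (hypothetical helpers, not TensorFlow).
class Node:
    """A graph node: stores an operation and its inputs, computes nothing yet."""
    def __init__(self, name, fn, inputs=()):
        self.name, self.fn, self.inputs = name, fn, inputs

    def __repr__(self):
        return f"Node({self.name!r})"   # printing shows the node, not its value


def constant(value, name="Const"):
    return Node(name, lambda: value)


def multiply(a, b, name="Mul"):
    return Node(name, lambda x, y: x * y, (a, b))


def run(node):
    """Evaluate a node by first evaluating its inputs (the role a session plays)."""
    return node.fn(*(run(i) for i in node.inputs))


a = constant(2)
b = constant(10)
c = multiply(a, b)
print(c)        # Node('Mul')
print(run(c))   # 20
```

As in the cells above, printing c shows a description of the node, and only the explicit run step produces 20.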
Next, you'll also have to know about placeholders. A placeholder is an object whose value you can specify only later. To specify values for a placeholder, you pass them in using a "feed dictionary" (the feed_dict argument). Below, we created a placeholder for x. This allows us to pass in a number later when we run the session.
# Change the value of x in the feed_dict
x = tf.placeholder(tf.int64, name = 'x')
print(sess.run(2 * x, feed_dict = {x: 3}))
sess.close()
6
When you first defined x you did not have to specify a value for it. A placeholder is simply a variable that you will assign data to only later, when running the session. We say that you feed data to these placeholders when running the session.
Here's what's happening: When you specify the operations needed for a computation, you are telling TensorFlow how to construct a computation graph. The computation graph can have some placeholders whose values you will specify only later. Finally, when you run the session, you are telling TensorFlow to execute the computation graph.
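The same idea can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not TensorFlow: Placeholder, Scale and run are hypothetical names made up for the illustration. The graph is built once, and the value for x is supplied only at run time through a dictionary.

```python
# Toy sketch of placeholders and a feed dictionary (hypothetical helpers, not TensorFlow).
class Placeholder:
    """A named slot whose value is supplied only when the graph is run."""
    def __init__(self, name):
        self.name = name


class Scale:
    """An operation node: multiply its input by a fixed factor."""
    def __init__(self, factor, source):
        self.factor, self.source = factor, source


def run(node, feed_dict):
    """Evaluate `node`, looking placeholder values up in `feed_dict`."""
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise KeyError(f"no value fed for placeholder {node.name!r}")
        return feed_dict[node]
    return node.factor * run(node.source, feed_dict)


x = Placeholder("x")
doubled = Scale(2, x)                  # graph is built; nothing is computed yet
print(run(doubled, feed_dict={x: 3}))  # 6
print(run(doubled, feed_dict={x: 5}))  # 10: same graph, different data
```

Running without feeding x raises an error in this sketch, which mirrors the point that a placeholder has no value of its own.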
Let's start this programming exercise by computing the following equation: Y = WX + b, where W and X are random matrices and b is a random vector.
Exercise: Compute WX + b where W, X and b are drawn from a random normal distribution. W is of shape (4, 3), X is (3, 1) and b is (4, 1). As an example, here is how you would define a constant X that has shape (3, 1):
X = tf.constant(np.random.randn(3,1), name = "X")
You might find the following functions helpful:
- tf.matmul(..., ...) to do a matrix multiplication
- tf.add(..., ...) to do an addition
- np.random.randn(...) to initialize randomly
【code】
# GRADED FUNCTION: linear_function

def linear_function():
    """
    Implements a linear function:
            Initializes W to be a random tensor of shape (4,3)
            Initializes X to be a random tensor of shape (3,1)
            Initializes b to be a random tensor of shape (4,1)
    Returns:
    result -- runs the session for Y = WX + b
    """

    np.random.seed(1)

    ### START CODE HERE ### (4 lines of code)
    X = tf.constant(np.random.randn(3,1), name = "X")
    W = tf.constant(np.random.randn(4,3), name = "W")
    b = tf.constant(np.random.randn(4,1), name = "b")
    Y = tf.add(tf.matmul(W, X), b)    # Y = WX + b
    ### END CODE HERE ###

    # Create the session using tf.Session() and run it with sess.run(...) on the variable you want to calculate

    ### START CODE HERE ###
    sess = tf.Session()
    result = sess.run(Y)
    ### END CODE HERE ###

    # close the session
    sess.close()

    return result

print( "result = " + str(linear_function()))
【result】
result = [[-2.15657382]
 [ 2.95891446]
 [-1.08926781]
 [-0.84538042]]
Expected Output :
result | [[-2.15657382] [ 2.95891446] [-1.08926781] [-0.84538042]] |
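For comparison, the same computation Y = WX + b can be sketched without any framework. The block below uses only the Python standard library; randn_matrix, matmul and add are hypothetical helpers written for this illustration, and the seed is arbitrary, so the numbers will not match the np.random.seed(1) output above. What it does show is how the shapes (4, 3), (3, 1) and (4, 1) combine into a (4, 1) result.

```python
import random


def randn_matrix(rows, cols, rng):
    """Matrix (list of rows) of samples from a standard normal distribution."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(A, B):
    """Matrix product of A (n x k) and B (k x m)."""
    assert len(A[0]) == len(B), "inner dimensions must agree"
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B):
    """Element-wise sum of two matrices with the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


rng = random.Random(1)        # arbitrary seed; values differ from the NumPy run above
X = randn_matrix(3, 1, rng)
W = randn_matrix(4, 3, rng)
b = randn_matrix(4, 1, rng)
Y = add(matmul(W, X), b)
print(len(Y), len(Y[0]))      # 4 1
```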
Great! You just implemented a linear function. TensorFlow offers a variety of commonly used neural network functions like tf.sigmoid and tf.softmax. For this exercise, let's compute the sigmoid function of an input.
You will do this exercise using a placeholder variable x. When running the session, you should use the feed dictionary to pass in the input z. In this exercise, you will have to (i) create a placeholder x, (ii) define the operations needed to compute the sigmoid using tf.sigmoid, and then (iii) run the session.
Exercise : Implement the sigmoid function below. You should use the following:
tf.placeholder(tf.float32, name = "...")  # with further arguments such as shape: tf.placeholder(dtype=tf.float32, shape=(n_x, n_y), name = "...")
tf.sigmoid(...)
sess.run(..., feed_dict = {x: z})  # with several placeholders: sess.run(..., feed_dict = {x: z, y: w, ...})
Note that there are two typical ways to create and use sessions in tensorflow:
Method 1:
sess = tf.Session()
# Run the variables initialization (if needed), run the operations
result = sess.run(..., feed_dict = {...})
sess.close() # Close the session
Method 2:
with tf.Session() as sess:
    # run the variables initialization (if needed), run the operations
    result = sess.run(..., feed_dict = {...})
    # This takes care of closing the session for you :)
【code】
# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Computes the sigmoid of z

    Arguments:
    z -- input value, scalar or vector

    Returns:
    results -- the sigmoid of z
    """

    ### START CODE HERE ### ( approx. 4 lines of code)
    # Create a placeholder for x. Name it 'x'.
    x = tf.placeholder(tf.float32, name = "x")

    # compute sigmoid(x)
    sigmoid = tf.sigmoid(x)    # 1/(1 + math.e**(- x))

    # Create a session, and run it. Please use the method 2 explained above.
    # You should use a feed_dict to pass z's value to x.
    with tf.Session() as sess:
        # Run session and call the output "result"
        result = sess.run(sigmoid, feed_dict = {x: z})

    ### END CODE HERE ###

    return result

print ("sigmoid(0) = " + str(sigmoid(0)))
print ("sigmoid(12) = " + str(sigmoid(12)))
【result】
sigmoid(0) = 0.5
sigmoid(12) = 0.999994
Expected Output :
sigmoid(0) | 0.5 |
sigmoid(12) | 0.999994 |
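As a cross-check on these two values, the sigmoid can be computed directly from its definition 1 / (1 + e^(-z)). The sketch below uses only Python's math module; the branch on the sign of z is an added precaution against overflow for very negative inputs and is not something the text above prescribes.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


print(sigmoid(0))              # 0.5
print(round(sigmoid(12), 6))   # 0.999994
```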
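To summarize, you now know how to:
1. Create placeholders
2. Specify the computation graph corresponding to the operations you want to compute
3. Create the session
4. Run the session, using a feed dictionary if necessary to specify placeholder values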
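You can also use a built-in function to compute the cost of your neural network. So instead of needing to write code to compute this as a function of $a^{[2](i)}$ and $y^{(i)}$ for $i = 1 \ldots m$:
$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + (1 - y^{(i)}) \log\left(1 - a^{[2](i)}\right) \right) \tag{1}$$
you can do it in one line of code in TensorFlow!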
Exercise: Implement the cross entropy loss. The function you will use is:
tf.nn.sigmoid_cross_entropy_with_logits(logits = ..., labels = ...)
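Your code should input z, compute the sigmoid (to get a) and then compute the cross entropy cost J. All this can be done using one call to tf.nn.sigmoid_cross_entropy_with_logits, which computes
$$-\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log \sigma(z^{[2](i)}) + (1 - y^{(i)}) \log\left(1 - \sigma(z^{[2](i)})\right) \right) \tag{2}$$
Note that the function itself returns the individual terms of this sum, one per example, without the sum or the 1/m factor; this is why the result printed for this exercise is a vector.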
# GRADED FUNCTION: cost

def cost(logits, labels):
    """
    Computes the cost using the sigmoid cross entropy

    Arguments:
    logits -- vector containing z, output of the last linear unit (before the final sigmoid activation)
    labels -- vector of labels y (1 or 0)

    Note: What we've been calling "z" and "y" in this class are respectively called "logits" and "labels"
    in the TensorFlow documentation. So logits will feed into z, and labels into y.

    Returns:
    cost -- runs the session of the cost (formula (2))
    """

    ### START CODE HERE ###

    # Create the placeholders for "logits" (z) and "labels" (y) (approx. 2 lines)
    z = tf.placeholder(tf.float32, name = "z")
    y = tf.placeholder(tf.float32, name = "y")

    # Use the loss function (approx. 1 line)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = z, labels = y)

    # Create a session (approx. 1 line). See method 1 above.
    sess = tf.Session()

    # Run the session (approx. 1 line).
    cost = sess.run(cost, feed_dict={z: logits, y: labels})

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###

    return cost
【code】
logits = sigmoid(np.array([0.2,0.4,0.7,0.9]))
cost_values = cost(logits, np.array([0,0,1,1]))    # separate name, so the cost function is not overwritten
print ("cost = " + str(cost_values))
【result】
cost = [ 1.00538719 1.03664088 0.41385433 0.39956614]
Expected Output :
cost | [ 1.00538719 1.03664088 0.41385433 0.39956614] |
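The four numbers above can be reproduced by hand, which helps confirm what the loss measures. The sketch below is plain Python, not the TensorFlow implementation itself; it evaluates the per-example loss -[y log σ(z) + (1 - y) log(1 - σ(z))] in the algebraically equivalent form max(z, 0) - z·y + log(1 + e^(-|z|)), which avoids overflow for large |z|. The function names are chosen for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_cross_entropy(z, y):
    """Per-example loss -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))],
    written in the stable form max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


# Same inputs as the cell above: the "logits" are themselves sigmoid outputs.
logits = [sigmoid(v) for v in (0.2, 0.4, 0.7, 0.9)]
labels = [0, 0, 1, 1]
losses = [sigmoid_cross_entropy(z, y) for z, y in zip(logits, labels)]
print([round(loss, 6) for loss in losses])
```

The printed values should agree with the expected output above up to float32 rounding.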
Many times in deep learning you will have a y vector with numbers ranging from 0 to C-1, where C is the number of classes. If C is for example 4, then you might have the following y vector which you will need to convert as follows:
This is called a "one hot" encoding, because in the converted representation exactly one element of each column is "hot" (meaning set to 1). To do this conversion in numpy, you might have to write a few lines of code. In tensorflow, you can use one line of code:
Exercise: Implement the function below to take one vector of labels and the total number of classes C, and return the one hot encoding. Use tf.one_hot() to do this.
【code】
# GRADED FUNCTION: one_hot_matrix

def one_hot_matrix(labels, C):
    """
    Creates a matrix where the i-th row corresponds to the ith class number and the jth column
    corresponds to the jth training example. So if example j had a label i, then entry (i,j)
    will be 1.

    Arguments:
    labels -- vector containing the labels
    C -- number of classes, the depth of the one hot dimension

    Returns:
    one_hot -- one hot matrix
    """

    ### START CODE HERE ###

    # Create a tf.constant equal to C (depth), name it 'C'. (approx. 1 line)
    C = tf.constant(C, name="C")

    # Use tf.one_hot, be careful with the axis (approx. 1 line)
    one_hot_matrix = tf.one_hot(labels, C, axis=0)

    # Create the session (approx. 1 line)
    sess = tf.Session()

    # Run the session (approx. 1 line)
    one_hot = sess.run(one_hot_matrix)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###

    return one_hot
【code】
labels = np.array([1,2,3,0,2,1])
one_hot = one_hot_matrix(labels, C = 4)
print ("one_hot = " + str(one_hot))
【result】
one_hot = [[ 0. 0. 0. 1. 0. 0.]
 [ 1. 0. 0. 0. 0. 1.]
 [ 0. 1. 0. 0. 1. 0.]
 [ 0. 0. 1. 0. 0. 0.]]
Expected Output:
one_hot | [[ 0. 0. 0. 1. 0. 0.] [ 1. 0. 0. 0. 0. 1.] [ 0. 1. 0. 0. 1. 0.] [ 0. 0. 1. 0. 0. 0.]] |
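To see what "a few lines of code" would look like without the framework, here is a minimal plain-Python sketch of the same conversion. It produces the C x m layout used above (one column per example). The range check is an assumption added for this illustration, a choice made here rather than a description of how tf.one_hot treats out-of-range labels.

```python
def one_hot_matrix(labels, C):
    """Return a C x m matrix (list of rows) whose column j is the one-hot
    encoding of labels[j]; this mirrors the axis=0 layout used above."""
    for label in labels:
        if not 0 <= label < C:
            raise ValueError(f"label {label} outside range 0..{C - 1}")
    return [[1.0 if label == i else 0.0 for label in labels] for i in range(C)]


for row in one_hot_matrix([1, 2, 3, 0, 2, 1], C=4):
    print(row)
```

Each column contains exactly one 1.0, in the row given by that example's label.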
Now you will learn how to initialize a vector of zeros and ones. The function you will be calling is tf.ones(). To initialize with zeros you could use tf.zeros() instead. These functions take in a shape and return an array of that shape full of ones and zeros respectively.
Exercise: Implement the function below to take in a shape and return an array of ones with that shape.
【code】
# GRADED FUNCTION: ones

def ones(shape):
    """
    Creates an array of ones of dimension shape

    Arguments:
    shape -- shape of the array you want to create

    Returns:
    ones -- array containing only ones
    """

    ### START CODE HERE ###

    # Create "ones" tensor using tf.ones(...). (approx. 1 line)
    ones = tf.ones(shape)

    # Create the session (approx. 1 line)
    sess = tf.Session()

    # Run the session to compute 'ones' (approx. 1 line)
    ones = sess.run(ones)

    # Close the session (approx. 1 line). See method 1 above.
    sess.close()

    ### END CODE HERE ###
    return ones

print ("ones = " + str(ones([3])))
【result】
ones = [ 1. 1. 1.]
Expected Output:
ones | [ 1. 1. 1.] |
In this part of the assignment you will build a neural network using TensorFlow. Remember that there are two parts to implementing a TensorFlow model: creating the computation graph, and running the graph.
Let's delve into the problem you'd like to solve!
One afternoon, with some friends we decided to teach our computers to decipher sign language. We spent a few hours taking pictures in front of a white wall and came up with the following dataset. It's now your job to build an algorithm that would facilitate communications from a speech-impaired person to someone who doesn't understand sign language.
Note that this is a subset of the SIGNS dataset. The complete dataset contains many more signs.
Here are examples for each number, and an explanation of how we represent the labels. These are the original pictures, before we lowered the image resolution to 64 by 64 pixels.
Run the following code to load the dataset.
【code】
# Loading the dataset
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
Change the index below and run the cell to visualize some examples in the dataset.
【code】
# Example of a picture
index = 0
plt.imshow(X_train_orig[index])
print ("y = " + str(np.squeeze(Y_train_orig[:, index])))
【result】
y = 5
As usual you flatten the image dataset, then normalize it by dividing by 255. On top of that, you will convert each label to a one-hot vector as shown in Figure 1. Run the cell below to do so.
【code】
# Flatten the training and test images
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T    # X_train_orig.shape = (1080, 64, 64, 3)
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
# Normalize image vectors
X_train = X_train_flatten/255.
X_test = X_test_flatten/255.
# Convert training and test labels to one hot matrices
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)

print ("number of training examples = " + str(X_train.shape[1]))
print ("number of test examples = " + str(X_test.shape[1]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))
【result】
number of training examples = 1080
number of test examples = 120
X_train shape: (12288, 1080)
Y_train shape: (6, 1080)
X_test shape: (12288, 120)
Y_test shape: (6, 120)
Note that 12288 comes from 64×64×3. Each image is square, 64 by 64 pixels, and 3 is for the RGB colors. Please make sure all these shapes make sense to you before continuing.
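If the shape arithmetic is not obvious, the following small sketch may help. It flattens and normalizes a made-up 2 x 2 RGB image stored as nested Python lists; flatten_and_normalize is a hypothetical helper for illustration, whereas the real notebook does the same thing with NumPy's reshape on the whole dataset.

```python
def flatten_and_normalize(image):
    """Flatten an image given as nested lists [height][width][channel]
    into one list of floats scaled to the range [0, 1]."""
    return [value / 255.0 for row in image for pixel in row for value in pixel]


# A tiny 2 x 2 RGB image stands in for a 64 x 64 one.
tiny = [[[255, 0, 0], [0, 255, 0]],
        [[0, 0, 255], [51, 102, 153]]]
vector = flatten_and_normalize(tiny)
print(len(vector))     # 12 = 2 * 2 * 3
print(64 * 64 * 3)     # 12288
```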
Your goal is to build an algorithm capable of recognizing a sign with high accuracy. To do so, you are going to build a tensorflow model that is almost the same as one you have previously built in numpy for cat recognition (but now using a softmax output). It is a great occasion to compare your numpy implementation to the tensorflow one.
The model is LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX. The SIGMOID output layer has been converted to a SOFTMAX. A SOFTMAX layer generalizes SIGMOID to when there are more than two classes.
Your first task is to create placeholders for X and Y. This will allow you to later pass your training data in when you run your session.
Exercise: Implement the function below to create the placeholders in tensorflow.
# GRADED FUNCTION: create_placeholders

def create_placeholders(n_x, n_y):
    """
    Creates the placeholders for the tensorflow session.

    Arguments:
    n_x -- scalar, size of an image vector (num_px * num_px * 3 = 64 * 64 * 3 = 12288)
    n_y -- scalar, number of classes (from 0 to 5, so -> 6)

    Returns:
    X -- placeholder for the data input, of shape [n_x, None] and dtype "float"
    Y -- placeholder for the input labels, of shape [n_y, None] and dtype "float"

    Tips:
    - You will use None because it lets us be flexible on the number of examples fed to the placeholders.
      In fact, the number of examples during test/train is different.
    """

    ### START CODE HERE ### (approx. 2 lines)
    X = tf.placeholder(dtype=tf.float32, shape=(n_x, None), name = "Placeholder_1")
    Y = tf.placeholder(dtype=tf.float32, shape=(n_y, None), name = "Placeholder_2")
    ### END CODE HERE ###

    return X, Y

X, Y = create_placeholders(12288, 6)
print ("X = " + str(X))
print ("Y = " + str(Y))
X = Tensor("Placeholder_1_1:0", shape=(12288, ?), dtype=float32)
Y = Tensor("Placeholder_2_1:0", shape=(6, ?), dtype=float32)
Expected Output:
X | Tensor("Placeholder_1:0", shape=(12288, ?), dtype=float32) (not necessarily Placeholder_1) |
Y | Tensor("Placeholder_2:0", shape=(6, ?), dtype=float32) (not necessarily Placeholder_2) |
Your second task is to initialize the parameters in tensorflow.
Exercise: Implement the function below to initialize the parameters in tensorflow. You are going to use Xavier Initialization for weights and Zero Initialization for biases. The shapes are given below. As an example, to help you, for W1 and b1 you could use:
W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
Please use seed = 1 to make sure your results match ours.
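For intuition about what Xavier (Glorot) initialization does, here is a rough sketch of its uniform variant in plain Python: weights are drawn uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). This is an assumption-laden illustration, not the tf.contrib implementation; xavier_uniform and zeros are hypothetical helpers, and the values will not match TensorFlow's even with the same seed.

```python
import math
import random


def xavier_uniform(rows, cols, seed=1):
    """Sample a rows x cols matrix uniformly from [-limit, limit],
    with limit = sqrt(6 / (rows + cols)) (Glorot/Xavier uniform)."""
    limit = math.sqrt(6.0 / (rows + cols))
    rng = random.Random(seed)
    return [[rng.uniform(-limit, limit) for _ in range(cols)] for _ in range(rows)]


def zeros(rows, cols):
    """Zero initialization, as used for the biases."""
    return [[0.0] * cols for _ in range(rows)]


W2 = xavier_uniform(12, 25)   # same shape as W2 in this assignment
b2 = zeros(12, 1)
limit = math.sqrt(6.0 / (12 + 25))
print(round(limit, 4))        # 0.4027
```

The bound shrinks as layers get wider, which keeps the scale of activations roughly comparable from layer to layer.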
# GRADED FUNCTION: initialize_parameters

def initialize_parameters():
    """
    Initializes parameters to build a neural network with tensorflow. The shapes are:
                        W1 : [25, 12288]
                        b1 : [25, 1]
                        W2 : [12, 25]
                        b2 : [12, 1]
                        W3 : [6, 12]
                        b3 : [6, 1]

    Returns:
    parameters -- a dictionary of tensors containing W1, b1, W2, b2, W3, b3
    """

    tf.set_random_seed(1)    # so that your "random" numbers match ours

    ### START CODE HERE ### (approx. 6 lines of code)
    W1 = tf.get_variable("W1", [25,12288], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b1 = tf.get_variable("b1", [25,1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12,25], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b2 = tf.get_variable("b2", [12,1], initializer = tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6,12], initializer = tf.contrib.layers.xavier_initializer(seed = 1))
    b3 = tf.get_variable("b3", [6,1], initializer = tf.zeros_initializer())
    ### END CODE HERE ###

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}

    return parameters

tf.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))
    print("b1 = " + str(parameters["b1"]))
    print("W2 = " + str(parameters["W2"]))
    print("b2 = " + str(parameters["b2"]))
W1 = <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>
b1 = <tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref>
W2 = <tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref>
b2 = <tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref>
Expected Output:
W1 | < tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref > |
b1 | < tf.Variable 'b1:0' shape=(25, 1) dtype=float32_ref > |
W2 | < tf.Variable 'W2:0' shape=(12, 25) dtype=float32_ref > |
b2 | < tf.Variable 'b2:0' shape=(12, 1) dtype=float32_ref > |
You will now implement the forward propagation module in tensorflow. The function will take in a dictionary of parameters and it will complete the forward pass. The functions you will be using are:
- tf.add(...,...) to do an addition
- tf.matmul(...,...) to do a matrix multiplication
- tf.nn.relu(...) to apply the ReLU activation
Question: Implement the forward pass of the neural network. We commented for you the numpy equivalents so that you can compare the tensorflow implementation to numpy. It is important to note that the forward propagation stops at z3. The reason is that in tensorflow the last linear layer output is given as input to the function computing the loss. Therefore, you don't need a3!
# GRADED FUNCTION: forward_propagation

def forward_propagation(X, parameters):
    """
    Implements the forward propagation for the model: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SOFTMAX

    Arguments:
    X -- input dataset placeholder, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3"
                  the shapes are given in initialize_parameters

    Returns:
    Z3 -- the output of the last LINEAR unit
    """

    # Retrieve the parameters from the dictionary "parameters"
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    ### START CODE HERE ### (approx. 5 lines)      # Numpy Equivalents:
    Z1 = tf.add(tf.matmul(W1, X), b1)              # Z1 = np.dot(W1, X) + b1
    A1 = tf.nn.relu(Z1)                            # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)             # Z2 = np.dot(W2, A1) + b2
    A2 = tf.nn.relu(Z2)                            # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)             # Z3 = np.dot(W3, A2) + b3
    ### END CODE HERE ###

    return Z3

tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    print("Z3 = " + str(Z3))
Z3 = Tensor("Add_2:0", shape=(6, ?), dtype=float32)
Expected Output:
Z3 | Tensor("Add_2:0", shape=(6, ?), dtype=float32) |
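Because the TensorFlow version only prints a symbolic tensor, it can help to trace the same LINEAR -> RELU -> LINEAR -> RELU -> LINEAR computation on actual numbers. The sketch below is plain Python on a made-up tiny network (2 inputs, two hidden layers of 2 units, 1 output); the weights are invented for illustration and the helper names are hypothetical.

```python
def matmul(A, B):
    """Matrix product of A (n x k) and B (k x m), both lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add_bias(Z, b):
    """Add a column vector b (n x 1) to every column of Z (n x m)."""
    return [[z + bias[0] for z in row] for row, bias in zip(Z, b)]


def relu(Z):
    return [[max(0.0, z) for z in row] for row in Z]


def forward_propagation(X, parameters):
    """LINEAR -> RELU -> LINEAR -> RELU -> LINEAR; returns Z3 (no softmax applied)."""
    A = X
    for layer in (1, 2):
        Z = add_bias(matmul(parameters[f"W{layer}"], A), parameters[f"b{layer}"])
        A = relu(Z)
    return add_bias(matmul(parameters["W3"], A), parameters["b3"])


# Hypothetical tiny network and a single example (one column).
parameters = {
    "W1": [[1.0, -1.0], [0.5, 0.5]], "b1": [[0.0], [0.0]],
    "W2": [[1.0, 1.0], [-1.0, 1.0]], "b2": [[0.0], [1.0]],
    "W3": [[2.0, -1.0]],             "b3": [[0.5]],
}
X = [[1.0], [3.0]]
print(forward_propagation(X, parameters))   # [[1.5]]
```

Tracing by hand: Z1 = [-2, 2] so A1 = [0, 2]; Z2 = [2, 3] so A2 = [2, 3]; and Z3 = 2·2 - 3 + 0.5 = 1.5.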
As seen before, it is very easy to compute the cost using:
tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = ..., labels = ...))
Question: Implement the cost function below.
- It is important to know that the "logits" and "labels" inputs of tf.nn.softmax_cross_entropy_with_logits are expected to be of shape (number of examples, num_classes). We have thus transposed Z3 and Y for you.
- Besides, tf.reduce_mean averages the per-example losses over the examples.
# GRADED FUNCTION: compute_cost

def compute_cost(Z3, Y):
    """
    Computes the cost

    Arguments:
    Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
    Y -- "true" labels vector placeholder, same shape as Z3

    Returns:
    cost - Tensor of the cost function
    """

    # to fit the tensorflow requirement for tf.nn.softmax_cross_entropy_with_logits(...,...)
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)

    ### START CODE HERE ### (1 line of code)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
    ### END CODE HERE ###

    return cost

tf.reset_default_graph()

with tf.Session() as sess:
    X, Y = create_placeholders(12288, 6)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    print("cost = " + str(cost))
【result】
cost = Tensor("Mean:0", shape=(), dtype=float32)
Expected Output:
cost | Tensor("Mean:0", shape=(), dtype=float32) |
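To show what this cost evaluates to numerically, here is a minimal plain-Python sketch of softmax cross-entropy averaged over examples. It is an illustration under stated assumptions rather than TensorFlow's implementation: inputs are (classes x examples) lists of rows, as in this assignment, and the log-sum-exp trick is used for numerical stability.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Loss for one example: -sum_k labels[k] * log(softmax(logits)[k]),
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp) for z, y in zip(logits, labels))


def compute_cost(Z3, Y):
    """Z3 and Y are (classes x examples) lists of rows; the loss is taken
    per column (example) and then averaged."""
    examples = list(zip(zip(*Z3), zip(*Y)))
    return sum(softmax_cross_entropy(z, y) for z, y in examples) / len(examples)


# Two examples, six classes, all logits zero: the model is maximally unsure.
Z3 = [[0.0, 0.0] for _ in range(6)]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(round(compute_cost(Z3, Y), 6))   # 1.791759, i.e. log(6)
```

A model that spreads its probability evenly over six classes has a cost of log(6), roughly 1.79, which is a useful reference point when reading the cost at the start of training.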
This is where you become grateful to programming frameworks. All the backpropagation and the parameter updates are taken care of in one line of code. It is very easy to incorporate this line in the model.
After you compute the cost function, you will create an "optimizer" object. You have to call this object along with the cost when running the tf.Session. When called, it will perform an optimization on the given cost with the chosen method and learning rate.
For instance, for gradient descent the optimizer would be:
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(cost)
To make the optimization you would do:
_ , c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
This computes the backpropagation by passing through the tensorflow graph in the reverse order, from cost to inputs.
Note: When coding, we often use _ as a "throwaway" variable to store values that we won't need to use later. Here, _ takes on the evaluated value of optimizer, which we don't need (and c takes the value of the cost variable).
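What the optimizer does on each call can be illustrated with the plain gradient descent update θ ← θ - α ∇J(θ). The sketch below applies it to a made-up one-dimensional cost J(θ) = (θ - 3)², whose gradient is known in closed form. It is a toy example: a framework derives the gradient from the graph automatically, and the function and parameter names here are hypothetical.

```python
def gradient_descent(grad, theta, learning_rate, num_steps):
    """Repeat the update theta <- theta - learning_rate * grad(theta)."""
    for _ in range(num_steps):
        theta = theta - learning_rate * grad(theta)
    return theta


# Hypothetical cost J(theta) = (theta - 3)^2, with gradient 2 * (theta - 3).
theta = gradient_descent(lambda t: 2.0 * (t - 3.0), theta=0.0,
                         learning_rate=0.1, num_steps=100)
print(round(theta, 4))   # 3.0
```

Each step shrinks the distance to the minimum at θ = 3 by a factor of 0.8, so after 100 steps the result is 3.0 to four decimal places.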
Now, you will bring it all together!
Exercise: Implement the model. You will be calling the functions you had previously implemented.
【code】
def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001,
          num_epochs = 1500, minibatch_size = 32, print_cost = True):
    """
    Implements a three-layer tensorflow neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SOFTMAX.

    Arguments:
    X_train -- training set, of shape (input size = 12288, number of training examples = 1080)
    Y_train -- training labels, of shape (output size = 6, number of training examples = 1080)
    X_test -- test set, of shape (input size = 12288, number of test examples = 120)
    Y_test -- test labels, of shape (output size = 6, number of test examples = 120)
    learning_rate -- learning rate of the optimization
    num_epochs -- number of epochs of the optimization loop
    minibatch_size -- size of a minibatch
    print_cost -- True to print the cost every 100 epochs

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    ops.reset_default_graph()          # to be able to rerun the model without overwriting tf variables
    tf.set_random_seed(1)              # to keep consistent results
    seed = 3                           # to keep consistent results
    (n_x, m) = X_train.shape           # (n_x: input size, m : number of examples in the train set)
    n_y = Y_train.shape[0]             # n_y : output size
    costs = []                         # To keep track of the cost

    # Create Placeholders of shape (n_x, n_y)
    ### START CODE HERE ### (1 line)
    X, Y = create_placeholders(n_x, n_y)
    ### END CODE HERE ###

    # Initialize parameters
    ### START CODE HERE ### (1 line)
    parameters = initialize_parameters()
    ### END CODE HERE ###

    # Forward propagation: Build the forward propagation in the tensorflow graph
    ### START CODE HERE ### (1 line)
    Z3 = forward_propagation(X, parameters)
    ### END CODE HERE ###

    # Cost function: Add cost function to tensorflow graph
    ### START CODE HERE ### (1 line)
    cost = compute_cost(Z3, Y)
    ### END CODE HERE ###

    # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer.
    ### START CODE HERE ### (1 line)
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    ### END CODE HERE ###

    # Initialize all the variables
    init = tf.global_variables_initializer()

    # Start the session to compute the tensorflow graph
    with tf.Session() as sess:

        # Run the initialization
        sess.run(init)

        # Do the training loop
        for epoch in range(num_epochs):

            epoch_cost = 0.                             # Defines a cost related to an epoch
            num_minibatches = int(m / minibatch_size)   # number of minibatches of size minibatch_size in the train set
            seed = seed + 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:

                # Select a minibatch
                (minibatch_X, minibatch_Y) = minibatch

                # IMPORTANT: The line that runs the graph on a minibatch.
                # Run the session to execute the "optimizer" and the "cost", the feed_dict should contain a minibatch for (X,Y).
                ### START CODE HERE ### (1 line)
                _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                ### END CODE HERE ###

                epoch_cost += minibatch_cost / num_minibatches

            # Print the cost every 100 epochs and record it every 5 epochs
            if print_cost == True and epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost == True and epoch % 5 == 0:
                costs.append(epoch_cost)

        # plot the cost
        plt.plot(np.squeeze(costs))
        plt.ylabel('cost')
        plt.xlabel('iterations (per fives)')
        plt.title("Learning rate =" + str(learning_rate))
        plt.show()

        # lets save the parameters in a variable
        parameters = sess.run(parameters)
        print ("Parameters have been trained!")

        # Calculate the correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))

        # Calculate accuracy on the test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print ("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print ("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters
Run the following cell to train your model! On our machine it takes about 5 minutes. Your "Cost after epoch 100" should be 1.016458. If it's not, don't waste time; interrupt the training by clicking on the square (⬛) in the upper bar of the notebook, and try to correct your code. If it is the correct cost, take a break and come back in 5 minutes!
parameters = model(X_train, Y_train, X_test, Y_test)
【result】
Cost after epoch 0: 1.855702
Cost after epoch 100: 1.016458
Cost after epoch 200: 0.733102
Cost after epoch 300: 0.572940
Cost after epoch 400: 0.468774
Cost after epoch 500: 0.381021
Cost after epoch 600: 0.313822
Cost after epoch 700: 0.254158
Cost after epoch 800: 0.203829
Cost after epoch 900: 0.166421
Cost after epoch 1000: 0.141486
Cost after epoch 1100: 0.107580
Cost after epoch 1200: 0.086270
Cost after epoch 1300: 0.059371
Cost after epoch 1400: 0.052228
Parameters have been trained!
Train Accuracy: 0.999074
Test Accuracy: 0.716667
Expected Output:
Train Accuracy | 0.999074 |
Test Accuracy | 0.716667 |
Amazing, your algorithm can recognize a sign representing a figure between 0 and 5 with 71.7% accuracy.
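The accuracy figures come from comparing, for each example, the index of the largest output with the index of the 1 in the one-hot label. The sketch below reproduces that logic in plain Python on made-up data with hypothetical helper names. The last two lines are an inference from the rounded figures rather than something stated in the assignment: 0.716667 and 0.999074 are what you get for 86 of 120 test examples and 1079 of 1080 training examples.

```python
def argmax_per_column(M):
    """Index of the largest entry in each column of M (list of rows)."""
    return [max(range(len(col)), key=col.__getitem__) for col in zip(*M)]


def accuracy(Z3, Y):
    """Fraction of examples whose predicted class equals the labelled class."""
    predicted, actual = argmax_per_column(Z3), argmax_per_column(Y)
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Three classes, four examples; the last example is misclassified.
Z3 = [[2.0, 0.1, 0.3, 0.9],
      [0.5, 1.7, 0.2, 0.4],
      [0.1, 0.2, 3.0, 0.8]]
Y = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 1]]
print(accuracy(Z3, Y))          # 0.75
print(round(86 / 120, 6))       # 0.716667
print(round(1079 / 1080, 6))    # 0.999074
```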
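Insights:
- Your model seems big enough to fit the training set well. However, given the gap between train accuracy (0.999) and test accuracy (0.717), you could try adding L2 or dropout regularization to reduce overfitting.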
Congratulations on finishing this assignment. You can now take a picture of your hand and see the output of your model. To do that:
1. Click on "File" in the upper bar of this notebook, then click "Open" to go on your Coursera Hub.
2. Add your image to this Jupyter Notebook's directory, in the "images" folder
3. Write your image's name in the following code
4. Run the code and check if the algorithm is right!
import scipy
from PIL import Image
from scipy import ndimage

## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "thumbs_up.jpg"
## END CODE HERE ##

# We preprocess your image to fit your algorithm.
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64,64)).reshape((1, 64*64*3)).T
my_image_prediction = predict(my_image, parameters)

plt.imshow(image)
print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))
【result】
Your algorithm predicts: y = 3
You indeed deserved a "thumbs-up", although as you can see the algorithm seems to classify it incorrectly. The reason is that the training set doesn't contain any "thumbs-up", so the model doesn't know how to deal with it! We call that a "mismatched data distribution" and it is one of the topics of the next course on "Structuring Machine Learning Projects".
What you should remember: