The program consists of premade_estimator.py and iris_data.py.
iris_data.py reads the training data and test data and defines the data format used by the estimator.
------------------------------------------------------------------------------------------------------------
1. Modifying iris_data.py
iris_data downloads the training set and the test set from a remote server:
http://download.tensorflow.org/data/iris_training.csv http://download.tensorflow.org/data/iris_test.csv
In practice, however, the download did not work when I tested it.
The two files are available here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_training.csv https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_test.csv
After downloading them, save each one as xlsx and change the download-and-read part of iris_data.py to:
def load_data(y_name='Species'):
    # x: features, y: labels
    train = pandas.read_excel('iris_training.xlsx', names=CSV_COLUMN_NAMES, header=0)
    train_features, train_labels = train, train.pop(y_name)

    test = pandas.read_excel('iris_test.xlsx', names=CSV_COLUMN_NAMES, header=0)
    test_features, test_labels = test, test.pop(y_name)

    return (train_features, train_labels), (test_features, test_labels)
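In other words, the original maybe_download part can be deleted, and load_data is rewritten to use read_excel. I noticed that different versions of the file name the returned
variables differently: some use train_features and labels, others use train_x and y. Renaming them all to features and labels makes the code easier to read.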
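As a rough illustration of what load_data does with each file, the sketch below repeats the same steps using only the standard library: skip the file's own first row, apply the column names, and pop the label column off every record. The sample rows and the split_features_labels helper are made up for illustration, and the column names are assumed to match CSV_COLUMN_NAMES in iris_data.py.

```python
import csv
import io

# Assumed to match CSV_COLUMN_NAMES in iris_data.py.
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# Made-up sample in the same layout: one header row, then four measurements and a label.
SAMPLE = """header,row,is,skipped,here
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,3.1,1.5,0.1,0
"""


def split_features_labels(text, y_name='Species'):
    """Hypothetical stand-in for load_data: returns (features, labels)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # like header=0: discard the file's own first row
    features = {name: [] for name in CSV_COLUMN_NAMES if name != y_name}
    labels = []
    for row in reader:
        record = dict(zip(CSV_COLUMN_NAMES, row))
        labels.append(int(record.pop(y_name)))  # pop removes the label column
        for name, value in record.items():
            features[name].append(float(value))
    return features, labels


features, labels = split_features_labels(SAMPLE)
print(sorted(features))
print(labels)
```

After the call, features holds only the four measurement columns and labels holds the Species values, which is the same split that train.pop(y_name) produces on the DataFrame.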
------------------------------------------------------------------------------------------------------------
2. premade_estimator.py
Import the tensorflow and iris_data modules
import tensorflow as tf
import iris_data
Read the training and test data from iris_data
# Fetch the data
(train_features, train_labels), (test_features, test_labels) = iris_data.load_data()
-------------------------------------------------------------------------------------------------------------
Build a tf.feature_column for each feature in the training data
my_feature_columns = []
for key in train_features.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
where
tf.feature_column                      # tools for ingesting and representing features
tf.feature_column.numeric_column(...)  # represents real-valued or numerical features
Each key of train_features becomes one numeric feature column in the list.
-------------------------------------------------------------------------------------------------------------
Instantiate an estimator
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3)
where
tf.estimator.DNNClassifier  # A classifier for TensorFlow DNN models.
feature_columns             # the feature columns the model takes as input
hidden_units = [m, n]       # the length of hidden_units defines the number of hidden layers;
                            # m and n define the number of nodes in each layer
n_classes                   # the number of classes to classify into
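To make the meaning of hidden_units concrete, here is a small sketch that derives the shape of each layer from the arguments above. It assumes plain fully connected layers with one bias per node; the helper names are hypothetical, and the count is not checked against what DNNClassifier actually creates.

```python
def dense_layer_shapes(n_features, hidden_units, n_classes):
    """Return (inputs, outputs) for every layer, hidden layers first."""
    sizes = [n_features] + list(hidden_units) + [n_classes]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    """Weights plus one bias per output node, summed over all layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


# Four iris measurements, two hidden layers of 10 nodes, three species.
shapes = dense_layer_shapes(n_features=4, hidden_units=[10, 10], n_classes=3)
print(shapes)                    # [(4, 10), (10, 10), (10, 3)]
print(count_parameters(shapes))  # 193
```

Adding a third number to hidden_units adds a third hidden layer; changing a number changes the width of that layer.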
-------------------------------------------------------------------------------------------------------------
Train the Model
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_features, train_labels, args.batch_size),
    steps=args.train_steps)
train_input_fn is a function defined in iris_data
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset
Analysis
tf.data.Dataset
# A Dataset can be used to represent an input pipeline as a collection of elements
# (nested structures of tensors) and a "logical plan" of transformations that act on those elements.
# It is a high-level TensorFlow API for reading data and converting it into the form the train method expects.
tf.data.Dataset.from_tensor_slices
# Creates a Dataset whose elements are slices of the given tensors.
dataset.shuffle
# Randomly shuffles the elements of this dataset.
# Training works better when the examples arrive in random order, so tf.data.Dataset.shuffle is used to randomize them.
dataset.repeat
# Repeats this dataset count times. Called with no argument, as here, it repeats indefinitely.
dataset.batch
# Combines consecutive elements of this dataset into batches.
(dict(features), labels)
# features (a dict) and labels (a Series) combined into a tuple
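The effect of chaining shuffle, repeat and batch can be imitated in plain Python. The generator below is only a sketch of the idea, not how tf.data is implemented: it assumes the shuffle buffer is at least as large as the data set (so every pass is a full permutation), and the function name and seed parameter are hypothetical.

```python
import itertools
import random


def shuffled_repeated_batches(features, labels, batch_size, seed=0):
    """Yield (feature_batch, label_batch) forever, reshuffling on every pass."""
    rng = random.Random(seed)
    n = len(labels)

    def examples():
        while True:                  # repeat(): the stream never runs out
            order = list(range(n))
            rng.shuffle(order)       # shuffle(): a new random order each pass
            for i in order:
                yield {k: v[i] for k, v in features.items()}, labels[i]

    stream = examples()
    while True:                      # batch(): group consecutive examples
        chunk = list(itertools.islice(stream, batch_size))
        xs = {k: [ex[0][k] for ex in chunk] for k in features}
        ys = [ex[1] for ex in chunk]
        yield xs, ys


features = {'SepalLength': [6.4, 5.0, 4.9, 4.9, 5.7]}
labels = [0, 1, 2, 3, 4]  # distinct values so the shuffling is visible
batches = list(itertools.islice(
    shuffled_repeated_batches(features, labels, batch_size=4), 3))
for xs, ys in batches:
    print(ys)
```

Because the stream repeats, a batch can straddle two passes over the data, and training stops only because train is given a steps limit.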
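The first argument of DNNClassifier.train, input_fn, must be a function:
A function that provides input data for training as minibatches.
The function must also return either a tf.data.Dataset object or a (features, labels) tuple.
Note the lambda expression used when passing input_fn.
A lambda expression is a one-line function; other languages also call these anonymous functions. If you do not intend to use a function a second time in your program, a lambda expression is a convenient choice, and it behaves just like an ordinary function.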
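The lambda is needed because train calls input_fn with no arguments, while train_input_fn takes three. The sketch below shows three equivalent ways to wrap such a call; the train_input_fn here is a hypothetical stand-in that only echoes its arguments, and the sample values are made up.

```python
import functools


def train_input_fn(features, labels, batch_size):
    """Hypothetical stand-in that just returns what it was given."""
    return features, labels, batch_size


features = {'SepalLength': [6.4, 5.0]}
labels = [2, 1]

# 1. A lambda: an unnamed zero-argument function.
as_lambda = lambda: train_input_fn(features, labels, 100)

# 2. functools.partial: binds the arguments ahead of time.
as_partial = functools.partial(train_input_fn, features, labels, 100)


# 3. An ordinary named function.
def as_def():
    return train_input_fn(features, labels, 100)


print(as_lambda() == as_partial() == as_def())  # True
```

One difference worth knowing: the lambda and the named function look up features and labels when they are called, whereas partial captures the objects at the moment it is created.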
-------------------------------------------------------------------------------------------------------------
Evaluate the model
To measure how well the model performs, every estimator provides an evaluate method
eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_features, test_labels, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
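The print call relies on evaluate returning a dict of metrics: format(**eval_result) unpacks the dict into keyword arguments, and {accuracy:0.3f} picks out the accuracy entry and shows three decimals. A minimal sketch with a made-up dict (the numbers and the extra keys are hypothetical, not real results):

```python
# Hypothetical metrics; real values depend on the trained model.
eval_result = {'accuracy': 0.96667, 'loss': 0.0591, 'global_step': 1000}

# Keys that the format string does not mention are simply ignored.
message = 'Test set accuracy: {accuracy:0.3f}'.format(**eval_result)
print(message)  # Test set accuracy: 0.967
```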
Note that evaluating a model must be done on the test data set. classifier.evaluate is called in much the same way as train; its input function, eval_input_fn, is:
def eval_input_fn(features, labels, batch_size):
    features = dict(features)
    if labels is None:
        # No labels, use only features.
        inputs = features
    else:
        inputs = (features, labels)

    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    # Batch the examples
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)

    # Return the dataset.
    return dataset
# The assert statement exists in almost every programming language. When you do...
#     assert condition
# you're telling the program to test that condition, and trigger an error if the condition is false.
# In Python, it's roughly equivalent to this:
#     if not condition:
#         raise AssertionError()
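Unlike the training input, the evaluation input is neither shuffled nor repeated: it is one ordered pass, cut into batches. The plain-Python sketch below imitates that behaviour and keeps the same assert so its effect can be seen; eval_batches is a hypothetical helper and the sample values are made up.

```python
def eval_batches(features, labels, batch_size):
    """One ordered pass over the data, split into batches."""
    assert batch_size is not None, "batch_size must not be None"
    batches = []
    for start in range(0, len(labels), batch_size):
        stop = start + batch_size
        xs = {k: v[start:stop] for k, v in features.items()}
        batches.append((xs, labels[start:stop]))
    return batches


features = {'SepalLength': [6.4, 5.0, 4.9, 4.9, 5.7]}
labels = [2, 1, 2, 0, 0]

# Five examples in batches of two: the last batch is smaller.
print([len(ys) for _, ys in eval_batches(features, labels, 2)])  # [2, 2, 1]

try:
    eval_batches(features, labels, None)
except AssertionError as err:
    print(err)  # batch_size must not be None
```

Keep in mind that Python skips assert statements when run with the -O flag, so an assert is a development check rather than a substitute for input validation.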
------------------------------------------------------------------------------------------------------------
3. Summary
How to build an estimator
How to test an estimator
How to prepare the data an estimator uses