TensorFlow First Steps: Premade Estimators

Date: 2019-11-21

The program consists of two files, premade_estimator.py and iris_data.py.

iris_data.py reads the training data and test data and defines the input data format used by the estimator.

------------------------------------------------------------------------------------------------------------

1. Modifying iris_data.py

iris_data.py downloads the training set and the test set from these URLs:

http://download.tensorflow.org/data/iris_training.csv
http://download.tensorflow.org/data/iris_test.csv

In practice, however, these downloads did not work when tested.

The same two files are available here:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_training.csv
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_test.csv

After downloading, save them as xlsx files and change the download-and-read part of iris_data.py to:

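import pandas  # needed for read_excel below

def load_data(y_name='Species'):
    """Return the Iris data as (train_features, train_labels), (test_features, test_labels)."""
    # CSV_COLUMN_NAMES is the list of column names defined in iris_data.py.
    # header=0 treats the first row of each file as a header; names= replaces it.
    train = pandas.read_excel('iris_training.xlsx', names=CSV_COLUMN_NAMES, header=0)
    # pop() removes the label column from the frame, leaving only the features.
    train_features, train_labels = train, train.pop(y_name)

    test = pandas.read_excel('iris_test.xlsx', names=CSV_COLUMN_NAMES, header=0)
    test_features, test_labels = test, test.pop(y_name)

    return (train_features, train_labels), (test_features, test_labels)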

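In other words, the original maybe_download part can be deleted, and load_data is rewritten to use read_excel. Different versions of the file name the returned variables differently: some use train_features and labels, others use train_x and y. Renaming them consistently to features and labels makes the code easier to read.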
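The effect of names, header=0 and pop can be checked on a small in-memory sample. The sketch below is an illustration only: it uses pandas.read_csv on a string instead of read_excel on a file, the sample rows are illustrative, and the column names are assumed to match the CSV_COLUMN_NAMES list in iris_data.py.

```python
import io
import pandas

# Assumed to match CSV_COLUMN_NAMES in iris_data.py.
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# Illustrative sample: one header row followed by three rows of measurements.
SAMPLE = """3,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,3.1,1.5,0.1,0
"""

# header=0 treats the first row as a header, and names= replaces it.
train = pandas.read_csv(io.StringIO(SAMPLE), names=CSV_COLUMN_NAMES, header=0)

# The right-hand side is evaluated first, so pop() has already removed
# 'Species' from the frame by the time it is bound to train_features.
train_features, train_labels = train, train.pop('Species')

print(list(train_features.columns))
print(train_labels.tolist())
```

After the pop, train_features holds only the four measurement columns and train_labels holds the class index of each row.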

------------------------------------------------------------------------------------------------------------

2. premade_estimator.py

Import the tensorflow and iris_data modules

import tensorflow as tf
import iris_data

Read the training and test data from iris_data

# Fetch the data
(train_features, train_labels), (test_features, test_labels) = iris_data.load_data()

-------------------------------------------------------------------------------------------------------------

Build a tf.feature_column entry for each column of the training features

my_feature_columns = []
for key in train_features.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

where

tf.feature_column 
#tools for ingesting and representing features

tf.feature_column.numeric_column(...)
#Represents real valued or numerical features

Each key in train_features is turned into a numeric feature column and appended to my_feature_columns.
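The loop relies on the fact that keys() on a pandas DataFrame returns its column labels, so one numeric column is created per feature. A small check of that behaviour, using a hypothetical two-row table and no TensorFlow (the column names are assumed to be those used in iris_data.py):

```python
import pandas

# Hypothetical two-row feature table; the label column has already been popped.
train_features = pandas.DataFrame({
    'SepalLength': [6.4, 5.0],
    'SepalWidth': [2.8, 2.3],
    'PetalLength': [5.6, 3.3],
    'PetalWidth': [2.2, 1.0],
})

# keys() yields the column labels, one per feature.
column_names = [key for key in train_features.keys()]
print(column_names)
```

Each of these names would become the key argument of one numeric_column call.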

-------------------------------------------------------------------------------------------------------------

Instantiate an estimator

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3)

where

tf.estimator.DNNClassifier
# A classifier for TensorFlow DNN models.

feature_columns
# the feature columns the model takes as input

hidden_units = [m, n]
# the length of hidden_units defines the number of hidden layers
# m and n define the number of nodes in each layer

n_classes
# the number of classes to be classified
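To make the meaning of hidden_units concrete, the sketch below works out the layer sizes implied by hidden_units=[10, 10] and n_classes=3. It assumes a plain fully connected network with one bias per node and the four Iris features as input; dense_layer_shapes is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def dense_layer_shapes(n_features, hidden_units, n_classes):
    """Return the (inputs, outputs) size of each fully connected layer."""
    sizes = [n_features] + list(hidden_units) + [n_classes]
    return list(zip(sizes[:-1], sizes[1:]))

shapes = dense_layer_shapes(n_features=4, hidden_units=[10, 10], n_classes=3)

# One weight per input-output pair plus one bias per output node.
n_parameters = sum(n_in * n_out + n_out for n_in, n_out in shapes)

print(shapes)        # [(4, 10), (10, 10), (10, 3)]
print(n_parameters)  # 193
```

Two entries in hidden_units give two hidden layers, and the final layer has one output per class.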

-------------------------------------------------------------------------------------------------------------

Train the model

classifier.train(
    input_fn=lambda:iris_data.train_input_fn(train_features, train_labels,args.batch_size),
    steps=args.train_steps)

train_input_fn is a function defined in iris_data:

def train_input_fn(features, labels, batch_size):

    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)

    return dataset

Analysis

tf.data.Dataset

# A Dataset can be used to represent an input pipeline as a collection of elements (nested
# structures of tensors) and a "logical plan" of transformations that act on those elements.

# A high-level TensorFlow API for reading data and converting it into the format the train method needs

tf.data.Dataset.from_tensor_slices

# Creates a Dataset whose elements are slices of the given tensors.

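dataset.shuffle

# Randomly shuffles the elements of this dataset. Training generally works better when
# the examples arrive in random order, so tf.data.Dataset.shuffle is used to randomize them.

dataset.repeat

# Repeats this dataset count times; with no count, as here, it repeats indefinitely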

dataset.batch

# Combines consecutive elements of this dataset into batches

(dict(features), labels)  # features (a dict) and labels (a Series) combined into a tuple
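What the slice, shuffle, repeat and batch steps do to the data can be imitated in plain Python. This is only an analogy under simplifying assumptions, not how tf.data is implemented: from_slices and shuffle_repeat_batch are hypothetical helpers, the sample values are illustrative, and the real shuffle draws from a fixed-size buffer rather than reshuffling a whole list.

```python
import itertools
import random

def from_slices(features, labels):
    """Cut column-oriented data into one (feature_dict, label) element per row."""
    return [({name: column[i] for name, column in features.items()}, labels[i])
            for i in range(len(labels))]

def shuffle_repeat_batch(elements, batch_size, seed=0):
    """Yield batches forever, reshuffling the elements on every pass."""
    rng = random.Random(seed)

    def endless():
        while True:                 # repeat(): start again when a pass ends
            one_pass = list(elements)
            rng.shuffle(one_pass)   # shuffle(): new random order on each pass
            yield from one_pass

    stream = endless()
    while True:                     # batch(): group consecutive elements
        yield list(itertools.islice(stream, batch_size))

features = {'SepalLength': [6.4, 5.0, 4.9], 'SepalWidth': [2.8, 2.3, 3.1]}
labels = [2, 1, 0]

elements = from_slices(features, labels)
first_batches = list(itertools.islice(shuffle_repeat_batch(elements, 2), 3))

print(elements[0])
print(len(first_batches), [len(batch) for batch in first_batches])
```

Because the stream never ends, the number of batches consumed is decided by the caller, which is the role the steps argument plays in classifier.train.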

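The first argument of DNNClassifier.train, input_fn, must be a function:

A function that provides input data for training as minibatches.

This function must return either a tf.data.Dataset object or a (features, labels) tuple.

Note the lambda expression used when passing input_fn. A lambda expression is a one-line function; in other languages these are also called anonymous functions. They behave like ordinary functions and are handy when a function is only needed in one place. Here the lambda wraps the call to train_input_fn together with its arguments, so that train receives a function it can call itself rather than the result of a call.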
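The difference between passing a function and passing the result of a call can be seen without TensorFlow. In this sketch both train_input_fn and train are simplified stand-ins written for illustration; they are not the real iris_data or Estimator code.

```python
def train_input_fn(features, labels, batch_size):
    """Stand-in for iris_data.train_input_fn; it only reports what it received."""
    return 'dataset of {} examples in batches of {}'.format(len(labels), batch_size)

def train(input_fn):
    """Stand-in for classifier.train: it calls input_fn itself, with no arguments."""
    return input_fn()

features = {'SepalLength': [6.4, 5.0, 4.9]}
labels = [2, 1, 0]

# The lambda captures the arguments now; train() decides when to make the call.
result = train(input_fn=lambda: train_input_fn(features, labels, 2))
print(result)  # dataset of 3 examples in batches of 2
```

Writing train(input_fn=train_input_fn(features, labels, 2)) instead would pass the returned value, and the stand-in train would fail when it tried to call it.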

-------------------------------------------------------------------------------------------------------------

Evaluate the model

To measure how well the model works, every estimator provides an evaluate method

eval_result = classifier.evaluate(
    input_fn=lambda: iris_data.eval_input_fn(test_features, test_labels, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
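The print statement uses format(**eval_result) to pull a single metric out of the result by name. A minimal sketch of that formatting step, with hypothetical metric values standing in for a real evaluation result:

```python
# Hypothetical metrics; a real eval_result comes from classifier.evaluate.
eval_result = {'accuracy': 0.9667, 'loss': 0.0612}

# ** unpacks the dict into keyword arguments, so {accuracy:0.3f} picks the
# 'accuracy' entry and formats it with three decimals; other keys are ignored.
message = 'Test set accuracy: {accuracy:0.3f}'.format(**eval_result)
print(message)  # Test set accuracy: 0.967
```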

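Note that evaluating a model must use the test data set. classifier.evaluate is called in much the same way as train. The eval_input_fn it uses is defined in iris_data: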

def eval_input_fn(features, labels, batch_size):

    features=dict(features)
    if labels is None:
        # No labels, use only features.
        inputs = features
    else:
        inputs = (features, labels)

    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    # Batch the examples
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)

    # Return the dataset.
    return dataset

#The assert statement exists in almost every programming language. When you do...

assert condition
#you're telling the program to test that condition, and trigger an error if the condition is false.

#In Python, it's roughly equivalent to this:

if not condition:
    raise AssertionError()
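A small runnable illustration of the same check, using a hypothetical batch helper that splits a plain list (it is not the tf.data batch method):

```python
def batch(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    assert batch_size is not None, "batch_size must not be None"
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

print(batch([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]

try:
    batch([1, 2, 3], None)
except AssertionError as error:
    message = str(error)
    print(message)  # batch_size must not be None
```

Note that Python skips assert statements when run with the -O option, so they suit sanity checks like this one rather than validation that must always run.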

------------------------------------------------------------------------------------------------------------

3. Summary

How to build an estimator

How to test an estimator

How to prepare the data an estimator uses