Machine Learning: TensorFlow Programming Environment_TensorFlow_Estimator

Date: 2019-12-02

Tags: machine learning, tensorflow, programming environment, estimator


title: Machine-learning
subtitle: 1. Machine Learning: TensorFlow Programming Environment_TensorFlow_Estimator
date: 2018-12-13 10:17:28
---

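1. Premade Estimators

This document introduces the TensorFlow programming environment and shows how to solve the iris classification problem in TensorFlow.

Installing TensorFlow

TensorFlow is tested and supported on the following 64-bit systems:

Ubuntu 16.04 or later

Windows 7 or later

macOS 10.12.6 (Sierra) or later (no GPU support)

Raspbian 9.0 or later

Official installation guide

My base environment (Python 2 + Python 3)

Installation walkthrough:

> Base environment: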
[root@CentOS1511 ~]# cat /etc/*release*
CentOS Linux release 7.2.1511 (Core) 
Derived from Red Hat Enterprise Linux 7.2 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.2.1511 (Core) 
CentOS Linux release 7.2.1511 (Core) 
> Installing python3 + pip3
1. Check whether Python is already installed
CentOS 7 ships with Python 2.7.5 by default because some commands, such as yum, depend on it. Use python -V to check whether Python is installed:
# python -V
Python 2.7.5

Then find the location of the Python executable:
# which python
/usr/bin/python

Go to the bin directory:
# cd /usr/bin/
# ls -al python*
lrwxrwxrwx. 1 root root   34 Aug 25 08:18 python -> python2
lrwxrwxrwx. 1 root root    9 Aug 19 21:32 python2 -> python2.7
-rwxr-xr-x. 1 root root 7136 Nov  6  2016 python2.7

As shown, python points to python2, which in turn points to python2.7; in other words, the python command runs the Python 2.7 preinstalled on the system. Before installing another Python version, back up the python link with mv python python.bak:
# mv python python.bak
# ls -al /usr/bin/python*
lrwxrwxrwx 1 root root    7 Oct 15 14:35 /usr/bin/python.bak -> python2
lrwxrwxrwx 1 root root    9 Oct 15 14:35 /usr/bin/python2 -> python2.7
-rwxr-xr-x 1 root root 7216 Jul 13 21:07 /usr/bin/python2.7
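The python -> python2 -> python2.7 chain above can also be followed programmatically. The following is a minimal sketch (symlink_chain is a hypothetical helper, not a system tool) that follows a symlink one hop at a time with os.readlink, resolving relative targets such as "python2" against the directory that holds the link, which is how the ls -al output reads:

```python
import os


def symlink_chain(path):
    """Follow a symlink one hop at a time and return every path visited.

    Hypothetical helper for illustration, e.g.
    /usr/bin/python -> /usr/bin/python2 -> /usr/bin/python2.7
    """
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative targets (such as "python2") are relative to the link's directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:  # stop on a symlink loop instead of spinning forever
            raise RuntimeError("symlink loop at " + path)
        seen.add(path)
        chain.append(path)
    return chain


if __name__ == "__main__":
    print(" -> ".join(symlink_chain("/usr/bin/python")))
```

Running it before and after the mv and ln steps makes it easy to confirm which interpreter the bare python command will start.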

2. Fix the yum scripts
yum runs on the Python 2.7.5 that ships with CentOS 7, so it stops working once /usr/bin/python points to Python 3 (Python 3.6.4 here):
# yum repolist
File "/usr/bin/yum", line 30
    except KeyboardInterrupt, e:
                            ^
SyntaxError: invalid syntax

So edit the yum scripts so that they keep using Python 2.7.5.
Change the Python interpreter the yum scripts depend on:
# cd /usr/bin
# ls -al yum*
-rwxr-xr-x. 1 root root   801 Nov 15  2016 yum
-rwxr-xr-x. 1 root root  9429 Nov  6  2016 yum-builddep
-rwxr-xr-x. 1 root root  8582 Nov  6  2016 yum-config-manager
-rwxr-xr-x. 1 root root  7609 Nov  6  2016 yum-debug-dump
-rwxr-xr-x. 1 root root  7903 Nov  6  2016 yum-debug-restore
-rwxr-xr-x. 1 root root 10999 Nov  6  2016 yumdownloader
-rwxr-xr-x. 1 root root 11031 Nov  6  2016 yum-groups-manager
Use vim to change the first line of each of the files above from #!/usr/bin/python to #!/usr/bin/python2

Edit the gnome-tweak-tool script:
# vim /usr/bin/gnome-tweak-tool
Change its first line from #!/usr/bin/python to #!/usr/bin/python2

Edit the urlgrabber script:
# vim /usr/libexec/urlgrabber-ext-down
Change its first line from #!/usr/bin/python to #!/usr/bin/python2
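Editing each first line by hand with vim works. As an alternative, here is a minimal Python 3 sketch (pin_shebang is a hypothetical helper, not part of yum) that rewrites the interpreter line only when it is #!/usr/bin/python, optionally followed by flags such as " -tt", so lines that already say python2 or python3 are left alone and running it twice is harmless. Back up the files first and run it as root:

```python
def pin_shebang(path, old=b"#!/usr/bin/python", new=b"#!/usr/bin/python2"):
    """Point a script's #! line at python2; return True if the file changed."""
    with open(path, "rb") as f:
        data = f.read()
    first, sep, rest = data.partition(b"\n")
    if not first.startswith(old):
        return False
    tail = first[len(old):]  # e.g. b"", b" -tt", or b"2" for an already-fixed file
    if tail.rstrip(b"\r") and not tail.startswith((b" ", b"\t")):
        return False  # "#!/usr/bin/python2", "#!/usr/bin/python3": leave it alone
    # Opening an existing file with "wb" truncates it in place, keeping its permissions.
    with open(path, "wb") as f:
        f.write(new + tail + sep + rest)
    return True


if __name__ == "__main__":
    import sys
    for script in sys.argv[1:]:
        print(script, "updated" if pin_shebang(script) else "unchanged")
```

For example, saved under a hypothetical name such as pin_shebang.py, it could be run as python2 pin_shebang.py /usr/bin/yum /usr/libexec/urlgrabber-ext-down, or with the new Python 3 once it is installed.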

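3. Prepare the build environment
Install the development tools needed to compile the Python 3.6.4 source:
# yum groupinstall 'Development Tools'

Install the libraries that the Python 3.6.4 build may need: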
# yum install -y ncurses-libs zlib-devel mysql-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel


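4. Install Python 3.6.4
Download the source tarball into the current directory with wget:
# wget https://www.python.org/ftp/python/3.6.4/Python-3.6.4.tgz

Create the install directory and extract the source into /usr/src:
# mkdir /usr/local/python3/
# tar -zxvf Python-3.6.4.tgz  -C  /usr/src/

Go into the Python-3.6.4 directory and configure the build:
[root@CentOS1511 ~]# cd /usr/src/Python-3.6.4/
[root@CentOS1511 Python-3.6.4]# ./configure --prefix=/usr/local/python3/ --enable-optimizations
The --prefix option sets the installation path. Without it, executables go to /usr/local/bin, libraries to /usr/local/lib, configuration files to /usr/local/etc, and other resources to /usr/local/share, which gets messy.

With --prefix, for example ./configure --prefix=/usr/local/python3, all files are kept under /usr/local/python3 instead of being scattered. Another benefit of --prefix is that the software is easy to uninstall or move.

When the software is no longer needed, deleting that directory removes it completely; to move it, copy the whole directory to another machine running the same operating system. You can also run make uninstall in the original build directory, but only if the Makefile defines an uninstall target.

--enable-optimizations turns on optimizations (LTO, PGO, and so on). With this flag the resulting interpreter runs roughly 10% faster, but compilation takes considerably longer.

./configure generates a Makefile, which the make command below uses; make install then installs the program into the directory we specified.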

[root@CentOS1511 Python-3.6.4]# make
[root@CentOS1511 Python-3.6.4]# make install
After installation, /usr/local/python3 contains the bin directory with the Python executables, plus the related directories:
# cd /usr/local/python3
# ll
total 4
drwxr-xr-x. 2 root root 4096 Aug 25 08:07 bin
drwxr-xr-x. 3 root root   24 Aug 25 08:07 include
drwxr-xr-x. 4 root root   63 Aug 25 08:07 lib
drwxr-xr-x. 3 root root   17 Aug 25 08:07 share

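5. Make Python 3.6 and pip 3.6 the system defaults via symlinks
# ln -s /usr/local/python3/bin/python3.6 /usr/bin/python
# ln -s /usr/local/python3/bin/pip3.6 /usr/bin/pip
ln: failed to create symbolic link '/usr/bin/pip': File exists
[root@CentOS1511 bin]# cd /usr/bin/
[root@CentOS1511 bin]# ls -al pip*
[root@CentOS1511 bin]# mv pip pip.bak
[root@CentOS1511 bin]# ln -s /usr/local/python3/bin/pip3.6 /usr/bin/pip
Check the default Python and pip versions:
> Runtime environment (python3 + pip3):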
[root@CentOS1511 ~]# python2 -V
Python 2.7.5
[root@CentOS1511 ~]# python -V
Python 3.6.4
[root@CentOS1511 ~]# pip -V
pip 18.1 from /usr/local/python3/lib/python3.6/site-packages/pip (python 3.6)

[root@CentOS1511 ~]# ls -al /usr/bin/python*
lrwxrwxrwx 1 root root   32 Oct 21 15:42 /usr/bin/python -> /usr/local/python3/bin/python3.6
lrwxrwxrwx 1 root root    9 Oct 15 14:35 /usr/bin/python2 -> python2.7
-rwxr-xr-x 1 root root 7216 Jul 13 21:07 /usr/bin/python2.7
lrwxrwxrwx 1 root root    7 Oct 15 14:35 /usr/bin/python.bak -> python2

[root@CentOS1511 ~]# ls -al /usr/bin/pip*
lrwxrwxrwx 1 root root  29 Oct 21 15:42 /usr/bin/pip -> /usr/local/python3/bin/pip3.6
-rwxr-xr-x 1 root root 215 Oct 15 19:36 /usr/bin/pip2
-rwxr-xr-x 1 root root 215 Oct 15 19:36 /usr/bin/pip2.7
-rwxr-xr-x 1 root root 215 Oct 15 19:36 /usr/bin/pip.bak
[root@CentOS1511 ~]# 

> Optional (can be skipped):
[root@CentOS1511 Tensorflow]# vim ~/.bash_profile 
[root@CentOS1511 ~]# cat  ~/.bash_profile           
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin:/usr/local/python3/bin

export PATH
[root@CentOS1511 Tensorflow]# source ~/.bash_profile

Prerequisites

If you installed TensorFlow with virtualenv or Anaconda, activate your TensorFlow environment.
Installing TensorFlow

[root@CentOS1511 ~]# mkdir /Tensorflow
[root@CentOS1511 ~]# cd /Tensorflow/
> Install TensorFlow:
[root@CentOS1511 Tensorflow]# pip install tensorflow
> Install Pandas (add --upgrade to upgrade an existing installation):
[root@CentOS1511 ~]# pip install pandas

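Getting the sample code

Follow these steps to get the sample code we will use:

Clone the TensorFlow models repository from GitHub with the following command:

[root@CentOS1511 Tensorflow]# git clone https://github.com/tensorflow/models

Change into the directory in this repository that contains the examples used in this document: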

[root@CentOS1511 Tensorflow]# cd /Tensorflow/models/samples/core/get_started/

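Running the program

[root@CentOS1511 get_started]# python premade_estimator.py

The program should print training logs followed by some predictions against the test set. For example, the first line of the following output shows that the model thinks there is a 99.9% chance that the first example in the test set is a Setosa. Since that example really is a Setosa, this is a good prediction.

...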
Prediction is "Setosa" (99.9%), expected "Setosa"

Prediction is "Versicolor" (98.6%), expected "Versicolor"

Prediction is "Virginica" (98.9%), expected "Virginica"

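If the program generates errors instead of answers, ask yourself the following questions:

Did you install TensorFlow properly?
Are you using the correct version of TensorFlow?
Did you activate the environment you installed TensorFlow in? (This is only relevant for certain installation mechanisms.)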
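To answer the first questions quickly, a small check like the following can help. It is a sketch using only the standard library: it reports which interpreter is running (useful for spotting a virtualenv or Anaconda environment that is not active) and whether a package such as tensorflow can be found and imported, and at which version. The helper name package_version is hypothetical.

```python
import importlib
import importlib.util
import sys


def package_version(name):
    """Return __version__ (or "unknown"), an import error message, or None if not installed."""
    if importlib.util.find_spec(name) is None:
        return None
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # installed but broken, e.g. a missing shared library
        return "import failed: %s" % exc
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("python:", sys.version.split()[0])
    for pkg in ("tensorflow", "pandas"):
        print(pkg, "->", package_version(pkg) or "NOT INSTALLED")
```

If the interpreter path is not the one you installed TensorFlow into, the environment is probably not active.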

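2. The programming stack

Before getting into the details of the program itself, let's look at the programming environment. TensorFlow provides a programming stack consisting of multiple API layers.

We strongly recommend writing TensorFlow programs with the following APIs:

Estimators: represent a complete model. The Estimator API provides methods to train the model, evaluate the model's accuracy, and generate predictions.
Datasets for Estimators: build a data input pipeline. The Dataset API provides methods to load and manipulate data and feed it into the model. The Dataset API works hand in hand with the Estimator API.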

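3. Classifying irises: an overview

The sample program builds and tests a model that classifies iris flowers into three different species based on the size of their sepals and petals.

Figure: petal geometry compared for three iris species: Iris setosa, Iris virginica, and Iris versicolor

From left to right: Iris setosa (by Radomil, CC BY-SA 3.0), Iris versicolor (by Dlanglois, CC BY-SA 3.0), and Iris virginica (by Frank Mayfield, CC BY-SA 2.0).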

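The dataset

The Iris dataset contains four features and one label. The four features identify the following botanical characteristics of individual iris flowers:

sepal length
sepal width
petal length
petal width
Our model will represent these features as float32 numerical data.

The label identifies the iris species, which must be one of the following:

Iris setosa (0)
Iris versicolor (1)
Iris virginica (2)
Our model will represent the label as int32 categorical data.

The following table shows three examples in the dataset:

Sepal length (cm)	Sepal width (cm)	Petal length (cm)	Petal width (cm)	Species (label)
5.1	3.3	1.7	0.5	0 (Setosa)
5.0	2.3	3.3	1.0	1 (Versicolor)
6.4	2.8	5.6	2.2	2 (Virginica)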
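Before it reaches the model, a table like the one above is usually stored column by column: one list per feature name, plus a separate list of integer labels. The following sketch shows that conversion in plain Python; the feature names SepalLength, SepalWidth, PetalLength and PetalWidth follow the sample code later in this document, and the rows are the three from the table.

```python
FEATURE_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']  # list index = integer label

# The three example rows from the table: four measurements (cm), then the label.
rows = [
    (5.1, 3.3, 1.7, 0.5, 0),
    (5.0, 2.3, 3.3, 1.0, 1),
    (6.4, 2.8, 5.6, 2.2, 2),
]


def to_columns(rows):
    """Split row tuples into a {feature name: column} dict and a list of int labels."""
    features = {name: [float(row[i]) for row in rows] for i, name in enumerate(FEATURE_NAMES)}
    labels = [int(row[-1]) for row in rows]
    return features, labels


features, labels = to_columns(rows)
print(features['PetalLength'])              # [1.7, 3.3, 5.6]
print([SPECIES[label] for label in labels])  # ['Setosa', 'Versicolor', 'Virginica']
```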

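The algorithm

The program trains a deep neural network classifier model with the following topology:

2 hidden layers.
Each hidden layer contains 10 nodes.
The following figure illustrates the features, hidden layers, and predictions (not all of the nodes in the hidden layers are shown):

Figure: network architecture: inputs, 2 hidden layers, and outputs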
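The topology described above (4 input features, two hidden layers of 10 nodes, 3 output classes) determines how many values the network learns. Assuming fully connected layers with one bias per node, each layer contributes n_in × n_out weights plus n_out biases; the small helper below (hypothetical, for illustration) adds these up.

```python
def dense_param_count(layer_sizes):
    """Count weights + biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# 4 features -> 10 -> 10 -> 3 classes: (4*10+10) + (10*10+10) + (10*3+3) = 50 + 110 + 33
print(dense_param_count([4, 10, 10, 3]))  # 193
```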

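Inference

Running the trained model on an unlabeled example yields three predictions, namely, the likelihood that this flower belongs to each of the three species. These output predictions sum to 1.0. For example, the prediction on an unlabeled example might look like this:

0.03 (Iris setosa)
0.95 (Iris versicolor)
0.02 (Iris virginica)
The preceding prediction indicates a 95% probability that the given unlabeled example is an Iris versicolor.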
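Probabilities like these are typically obtained by applying softmax to the model's three raw output scores (logits), which makes them positive and guarantees that they sum to 1.0. The sketch below uses made-up logits chosen to give roughly the 0.03 / 0.95 / 0.02 split above; it illustrates the arithmetic, not TensorFlow's own implementation.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 2.3, -1.6]  # hypothetical raw outputs for one unlabeled example
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 2) for p in probs])  # [0.03, 0.95, 0.02]
print(SPECIES[best])                 # Versicolor
```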

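4. Overview of programming with Estimators

An Estimator is TensorFlow's high-level representation of a complete model. It handles the details of initialization, logging, saving and restoring, and many other features so that you can concentrate on your model. For more details, see Estimators.

An Estimator is any class derived from tf.estimator.Estimator. TensorFlow provides a collection of premade Estimators (for example, LinearRegressor) that implement common machine learning algorithms. Beyond those, you can write your own custom Estimators. We recommend using premade Estimators when you are just getting started with TensorFlow.

To write a TensorFlow program based on premade Estimators, you must perform the following tasks:

Create one or more input functions.
Define the model's feature columns.
Instantiate an Estimator, specifying the feature columns and various hyperparameters.
Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.
Let's see how these tasks are implemented for iris classification.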

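5. Create input functions

You must create input functions to supply data for training, evaluation, and prediction.

An input function is a function that returns a tf.data.Dataset object which outputs the following two-element tuple:

features - a Python dictionary in which:
Each key is the name of a feature.
Each value is an array containing all of that feature's values.
label - an array containing the values of the label for every example.
To demonstrate the format of the input function, here's a simple implementation:

import numpy as np

def input_evaluation_set():
    # Returns (features, labels) directly to show their shape; a real input
    # function would wrap them in a tf.data.Dataset instead.
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels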
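tf.data.Dataset.from_tensor_slices, which the training input function in iris_data.py uses, slices a structure like this along its first dimension, so each dataset element is one example's (features, label) pair. The pure-Python sketch below mimics that slicing to make the element structure concrete; tensor_slices is a hypothetical name, not a TensorFlow function.

```python
def tensor_slices(features, labels):
    """Yield one ({feature: value}, label) pair per example, like slicing along axis 0."""
    n = len(labels)
    for name, column in features.items():
        if len(column) != n:  # checked when iteration starts
            raise ValueError("column %r has %d values, expected %d" % (name, len(column), n))
    for i in range(n):
        yield {name: column[i] for name, column in features.items()}, labels[i]


features = {'SepalLength': [6.4, 5.0], 'SepalWidth': [2.8, 2.3],
            'PetalLength': [5.6, 3.3], 'PetalWidth': [2.2, 1.0]}
for example in tensor_slices(features, [2, 1]):
    print(example)
```

Every column must have the same length as the labels, which is also what from_tensor_slices requires of its inputs.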

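Your input function may generate the features dictionary and label list any way you like. However, we recommend using TensorFlow's Dataset API, which can parse all sorts of data. At a high level, the Dataset API consists of the following classes:

Dataset - the base class, containing methods to create and transform datasets. It also lets you initialize a dataset from data in memory or from a Python generator.
TextLineDataset - reads lines from text files.
TFRecordDataset - reads records from TFRecord files.
FixedLengthRecordDataset - reads fixed-size records from binary files.
Iterator - provides a way to access one dataset element at a time.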
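To illustrate what "fixed-size records" means, the following plain-Python sketch splits a binary stream into equal-sized chunks after skipping a header, roughly what FixedLengthRecordDataset does with its record_bytes and header_bytes arguments. It is not the TensorFlow implementation, and read_fixed_length_records is a hypothetical name.

```python
import io


def read_fixed_length_records(stream, record_bytes, header_bytes=0):
    """Yield consecutive record_bytes-sized chunks from a binary stream after a header."""
    stream.read(header_bytes)  # skip the header
    while True:
        record = stream.read(record_bytes)
        if not record:
            return
        if len(record) < record_bytes:
            # This sketch treats a partial trailing record as an error.
            raise ValueError("truncated record: %d of %d bytes" % (len(record), record_bytes))
        yield record


data = io.BytesIO(b"HDR" + b"AAAABBBBCCCC")
print(list(read_fixed_length_records(data, record_bytes=4, header_bytes=3)))
# [b'AAAA', b'BBBB', b'CCCC']
```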

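The Dataset API can handle a lot of common cases for you. For example, using the Dataset API, you can easily read records from a large collection of files in parallel and join them into a single stream.

To keep this example simple, we load the data with pandas and build our input pipeline from this in-memory data.

Here is the input function used for training in this program, which is available in iris_data.py: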

import tensorflow as tf

def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)
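To make shuffle(1000).repeat().batch(batch_size) concrete, here is a plain-Python sketch of the same three steps: a bounded shuffle buffer (an element is drawn at random from a buffer of at most buffer_size items and its slot is refilled from the input), an endless repeat that reshuffles on every pass (tf.data's shuffle also reshuffles each iteration by default, via reshuffle_each_iteration=True), and batching of consecutive elements, so a batch may span two epochs. It only mimics the semantics on Python lists; the function names are hypothetical and this is not how tf.data is implemented.

```python
import itertools
import random


def buffered_shuffle(items, buffer_size, rng):
    """Shuffle with a bounded buffer: emit a random buffered item, refill its slot."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    rng.shuffle(buffer)  # drain whatever is left once the input is exhausted
    yield from buffer


def repeat_forever(make_epoch):
    """Like .repeat() with no count: start a fresh, reshuffled pass each time."""
    while True:
        yield from make_epoch()


def batch(items, batch_size):
    """Group consecutive items into lists of batch_size (the last may be shorter)."""
    it = iter(items)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def train_batches(examples, batch_size, buffer_size=1000, seed=0):
    rng = random.Random(seed)
    epochs = repeat_forever(lambda: buffered_shuffle(examples, buffer_size, rng))
    return batch(epochs, batch_size)


# The stream is endless (like repeat() with no count), so take only a few batches.
for b in itertools.islice(train_batches(list(range(10)), batch_size=4), 3):
    print(b)
```

Because the stream never ends, the Estimator's train call needs the steps argument to know when to stop.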

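6. Define the feature columns

A feature column is an object describing how the model should use raw input data from the features dictionary. When you build an Estimator model, you pass it a list of feature columns that describes each of the features you want the model to use. The tf.feature_column module provides many options for representing data to the model.

For iris, the 4 raw features are numeric values, so we'll build a list of feature columns telling the Estimator model to represent each of the four features as 32-bit floating-point values. The code to create the feature columns is therefore: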

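import tensorflow as tf
import iris_data  # helper module next to premade_estimator.py
# train_x is the pandas DataFrame of training features returned by load_data().
(train_x, train_y), (test_x, test_y) = iris_data.load_data()
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))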

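Feature columns can be far more sophisticated than those shown here. We cover feature columns in more detail later in the Getting Started guide.

Now that we have described how we want the model to represent the raw features, we can build the Estimator.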
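Conceptually, a numeric_column tells the model to read the named feature and feed it to the network as a float32 value; together, the columns turn the features dictionary into one dense row of numbers per example. The plain-Python sketch below imitates that, using struct to round values to 32-bit floats. dense_rows and to_float32 are hypothetical helpers, and the column order here is simply the order of the list passed in; the sketch makes no claim about the order TensorFlow uses internally.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dense_rows(features, column_names):
    """Build one float32 row per example from named numeric columns."""
    n = len(features[column_names[0]])
    return [[to_float32(features[name][i]) for name in column_names] for i in range(n)]


features = {'SepalLength': [5.1, 5.9], 'SepalWidth': [3.3, 3.0],
            'PetalLength': [1.7, 4.2], 'PetalWidth': [0.5, 1.5]}
rows = dense_rows(features, ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'])
print(rows[0][0])  # 5.099999904632568: 5.1 is not exactly representable in float32
```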

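7. Instantiate an Estimator

The iris problem is a classic classification problem. Fortunately, TensorFlow provides several premade classifier Estimators, including:

tf.estimator.DNNClassifier for deep models that perform multi-class classification.
tf.estimator.DNNLinearCombinedClassifier for wide & deep models.
tf.estimator.LinearClassifier for classifiers based on linear models.

For the iris problem, tf.estimator.DNNClassifier seems like the best choice. Here's how we instantiate this Estimator: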

# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)
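For intuition about what hidden_units=[10, 10] and n_classes=3 build, the sketch below runs one forward pass through a 4 -> 10 -> 10 -> 3 fully connected network in plain Python, with ReLU in the hidden layers (DNNClassifier's default activation_fn) and three raw output scores (logits) at the end. The weights are random placeholders rather than trained values, and all names are hypothetical; it only shows the shape of the computation.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random n_in x n_out weights and zero biases: placeholders, not trained values."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, relu):
    """One fully connected layer: x @ weights + biases, optionally followed by ReLU."""
    out = [sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
           for j in range(len(biases))]
    return [max(0.0, v) for v in out] if relu else out


rng = random.Random(42)
sizes = [4] + [10, 10] + [3]  # input features + hidden_units + n_classes
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

x = [5.1, 3.3, 1.7, 0.5]  # one example's four measurements
for k, (w, b) in enumerate(layers):
    x = dense(x, w, b, relu=k < len(layers) - 1)  # no ReLU on the output layer
print(len(x))  # 3 logits, one per class
```

Training adjusts these weights; the three logits are then turned into the class probabilities described in the Inference section.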

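8. Train, evaluate, and predict

Now that we have an Estimator object, we can call methods to do the following:

Train the model.
Evaluate the trained model.
Use the trained model to make predictions.

Train the model

Train the model by calling the Estimator's train method as follows: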

# Train the Model.
classifier.train(
    input_fn=lambda:iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)

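Here we wrap the input_fn call in a lambda to capture the arguments while providing an input function that takes no arguments, as the Estimator expects. The steps argument tells the method to stop training after that many training steps.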
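The lambda trick is ordinary Python closure behaviour, which can be checked without TensorFlow. In the sketch below, train_input_fn and call_with_no_args are hypothetical stand-ins (the real Estimator inspects input_fn before calling it; this sketch simply calls it with no arguments). It also shows one design difference: a lambda reads batch_size when it is called, whereas functools.partial binds the value when it is created.

```python
import functools


def train_input_fn(features, labels, batch_size):
    """Hypothetical stand-in for iris_data.train_input_fn: just echoes its inputs."""
    return features, labels, batch_size


def call_with_no_args(input_fn):
    """Simplified stand-in for an Estimator method invoking its input_fn."""
    return input_fn()


features, labels = {'SepalLength': [5.1]}, [0]
batch_size = 32
late = lambda: train_input_fn(features, labels, batch_size)              # reads batch_size at call time
early = functools.partial(train_input_fn, features, labels, batch_size)  # binds 32 now

batch_size = 64
print(call_with_no_args(late)[2])   # 64
print(call_with_no_args(early)[2])  # 32
```

Either form satisfies the no-argument requirement; partial is the safer choice when the captured variables may change before training starts.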

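Evaluate the trained model

Now that the model has been trained, we can get some statistics on its performance. The following code block evaluates the accuracy of the trained model on the test data: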

# Evaluate the model.
eval_result = classifier.evaluate(
    input_fn=lambda:iris_data.eval_input_fn(test_x, test_y, args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

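Unlike our call to the train method, we did not pass the steps argument to evaluate. Our eval_input_fn yields only a single epoch of data, so evaluation stops once every test example has been processed.

Running this code produces the following output (or something similar):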

Test set accuracy: 0.967
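Two points above can be made concrete in plain Python. First, eval_input_fn batches the data once, with no shuffle and no repeat, so the last batch may be smaller and evaluation ends when the data runs out. Second, the reported accuracy is the fraction of correctly classified examples over that whole pass, accumulated across batches. The sketch below uses hypothetical helper names and made-up predictions; for example, 29 correct answers out of 30 examples prints 0.967 with the same format string.

```python
def one_epoch_batches(features, labels, batch_size):
    """Yield batches over a single pass; labels may be None (prediction)."""
    n = len(next(iter(features.values())))
    for start in range(0, n, batch_size):
        batch = {name: col[start:start + batch_size] for name, col in features.items()}
        yield batch if labels is None else (batch, labels[start:start + batch_size])


def streaming_accuracy(batches_of_pairs):
    """Accumulate correct / total over batches of (predicted, actual) pairs."""
    correct = total = 0
    for batch in batches_of_pairs:
        correct += sum(1 for pred, actual in batch if pred == actual)
        total += len(batch)
    return correct / total


features = {'PetalLength': [1.7, 4.2, 5.4, 1.4, 4.7]}
print([len(b['PetalLength']) for b in one_epoch_batches(features, None, 2)])  # [2, 2, 1]

# Made-up (predicted, actual) labels: 29 of 30 correct, split into batches of 8.
pairs = [(0, 0)] * 29 + [(1, 2)]
batches = [pairs[i:i + 8] for i in range(0, len(pairs), 8)]
print('Test set accuracy: {accuracy:0.3f}'.format(accuracy=streaming_accuracy(batches)))
# Test set accuracy: 0.967
```

Summing counts rather than averaging per-batch accuracies keeps the smaller final batch from being over-weighted.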

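Making predictions (inferring) from the trained model

We now have a trained model that produces good evaluation results. We can now use it to predict the species of an iris flower from some unlabeled measurements. As with training and evaluation, we make predictions with a single function call: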

# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

predictions = classifier.predict(
    # No labels at prediction time, so pass labels=None explicitly.
    input_fn=lambda:iris_data.eval_input_fn(predict_x, labels=None, batch_size=args.batch_size))

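The predict method returns a Python iterable, yielding a dictionary of prediction results for each example. The following code prints a few predictions and their probabilities: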

template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')

for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print(template.format(iris_data.SPECIES[class_id], 100 * probability, expec))

Running the preceding code yields the following output:

...
Prediction is "Setosa" (99.6%), expected "Setosa"

Prediction is "Versicolor" (99.8%), expected "Versicolor"

Prediction is "Virginica" (97.9%), expected "Virginica"
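Each element the prediction loop reads is a dictionary; the loop relies on its class_ids entry (a one-element array holding the predicted class index) and its probabilities entry (one probability per class). The sketch below runs the same formatting logic on hand-written dictionaries with made-up probabilities, so it can be tried without a trained model. format_predictions is a hypothetical helper, and SPECIES is assumed to match iris_data.SPECIES. Note that zip stops at the shorter of predictions and expected.

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']  # assumed to match iris_data.SPECIES
TEMPLATE = 'Prediction is "{}" ({:.1f}%), expected "{}"'


def format_predictions(predictions, expected):
    """Build the lines the prediction loop prints (without the blank separator lines)."""
    lines = []
    for pred_dict, expec in zip(predictions, expected):
        class_id = pred_dict['class_ids'][0]
        probability = pred_dict['probabilities'][class_id]
        lines.append(TEMPLATE.format(SPECIES[class_id], 100 * probability, expec))
    return lines


# Hand-written stand-ins for classifier.predict() output; the numbers are made up.
fake_predictions = [
    {'class_ids': [0], 'probabilities': [0.95, 0.04, 0.01]},
    {'class_ids': [2], 'probabilities': [0.01, 0.30, 0.69]},
]
for line in format_predictions(fake_predictions, ['Setosa', 'Versicolor', 'Virginica']):
    print(line)
```

The second made-up example shows what a misclassification looks like, and only two lines are printed because there are only two prediction dictionaries.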