TensorFlow from 1 to 2 (Part 5): Image Content Recognition and Natural Language Semantic Recognition

Date: 2020-05-08

Keras Built-in Predefined Models

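In the previous part we covered saving a complete model together with its trained parameters.
Keras uses this same mechanism to ship several well-known, mature neural network models. Of course, the credit for this belongs to Keras and should not really be attributed to TensorFlow 2.0.
The Keras bundled with the current TensorFlow 2.0-alpha release includes: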

densenet
inception_resnet_v2
inception_v3
mobilenet
mobilenet_v2
nasnet
resnet50
vgg16
vgg19
xception

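All of these models have already been trained on large-scale data and can be used out of the box, a genuine gift to programmers.
In "From Boiler Worker to AI Expert (8)" (《從鍋爐工到AI專家(8)》) we demonstrated recognizing image content with the vgg19 network. That code was not hard, but building a network as complex as vgg19 with the TensorFlow 1.x API took considerable effort. Interested readers can go back to that article to relive the struggle.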

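Doing the same thing today could hardly be simpler: you can finish fully functional, usable code in the time it takes a colleague to fetch a coffee from the break room. For example, code with the same functionality as in that article looks like this: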

#!/usr/bin/env python3

import tensorflow as tf
from tensorflow import keras
# Import the vgg19 model
from tensorflow.keras.applications import vgg19
from tensorflow.keras.preprocessing import image
import numpy as np
import argparse

# Holds the command-line arguments
FLAGS = None

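# Initialize the vgg19 model; weights='imagenet' selects the weights trained on the ImageNet image set
# The first time each model is used, its saved h5 file is downloaded from the network
# The vgg19 data file is about 584M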
model = vgg19.VGG19(weights='imagenet')


def main(imgPath):
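    # Load the image file given on the command line, resized to 224x224 on load,
    # which is the input size the model requires
    img = image.load_img(imgPath, target_size=(224, 224))
    # Convert the image to a (224,224,3) array; the final 3 is for the RGB color channels
    img = image.img_to_array(img)
    # As in the earlier examples, model prediction works in batch mode,
    # so a single image must gain one dimension to become (1,224,224,3)
    # This amounts to a prediction queue that holds just one image
    img = np.expand_dims(img, axis=0)
    # Predict (recognize) with the model
    predict_class = model.predict(img)
    # Get the 3 most likely recognition results
    desc = vgg19.decode_predictions(predict_class, top=3)
    # The queue holds only one image, so only the first result is meaningful; print it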
    print(desc[0])

if __name__ == '__main__':
    # Handle command-line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--image_file', type=str, default='pics/bigcat.jpeg',
                        help='Pic file name')
    FLAGS, unparsed = parser.parse_known_args()
    main(FLAGS.image_file)

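The Keras code that loads image files depends indirectly on the pillow library, so install it before running the program: pip3 install pillow.
We again try recognition on the image from the original article: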

$ ./pic-recognize.py -i pics/bigcat.jpeg 
[('n02128385', 'leopard', 0.9778516), ('n02130308', 'cheetah', 0.008372171), ('n02128925', 'jaguar', 0.007467962)]

The result says the image is a leopard with probability 97.79%, a cheetah with probability 0.84%, and a jaguar with probability 0.75%.
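The percentages quoted above come straight from the third field of each tuple. As a minimal sketch, the tuples can be formatted like this; format_predictions is a hypothetical helper name introduced only for illustration, and the data is copied from the output shown above:

```python
# Prediction tuples copied from the program output above:
# (ImageNet class id, label, probability)
desc = [('n02128385', 'leopard', 0.9778516),
        ('n02130308', 'cheetah', 0.008372171),
        ('n02128925', 'jaguar', 0.007467962)]


def format_predictions(predictions):
    """Turn (class_id, label, probability) tuples into readable lines."""
    return ['{}: {:.2f}%'.format(label, prob * 100)
            for _, label, prob in predictions]


for line in format_predictions(desc):
    print(line)
```

In the real program the same helper could be applied to desc[0], since decode_predictions returns one such list per image in the batch.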

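With this approach, switching to another network model for image recognition is very easy: only three statements in the program need to change. For example, to switch the model to resnet50: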

Model import, from:
from tensorflow.keras.applications import vgg19
to:
from tensorflow.keras.applications import resnet50

Model construction, from:
model = vgg19.VGG19(weights='imagenet')
to:
model = resnet50.ResNet50(weights='imagenet')
Note that on the first run the resnet50 h5 file is downloaded as well, which takes a fair amount of time.

Displaying the prediction results, from:
    desc = vgg19.decode_predictions(predict_class, top=3)
to:
    desc = resnet50.decode_predictions(predict_class, top=3)

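Because the model is different, the results differ slightly, but with a mature, battle-tested network like this the recognition is still correct: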

$ ./pic-recognize.py -i pics/bigcat.jpeg 
[('n02128385', 'leopard', 0.8544763), ('n02128925', 'jaguar', 0.09733019), ('n02128757', 'snow_leopard', 0.040557403)]
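Since the swap from vgg19 to resnet50 touches the same three places each time, it can be driven by a model name instead of by hand edits. The following is only a sketch under stated assumptions: MODEL_TABLE, resolve and build are hypothetical names, and the table lists just the two models used in this article.

```python
# Hypothetical lookup table: model name -> (submodule, class name) inside
# tensorflow.keras.applications. Only the two models from this article are listed.
MODEL_TABLE = {
    'vgg19': ('vgg19', 'VGG19'),
    'resnet50': ('resnet50', 'ResNet50'),
}


def resolve(name):
    """Return (submodule, class name) for a supported model name."""
    if name not in MODEL_TABLE:
        raise ValueError('unsupported model: {}'.format(name))
    return MODEL_TABLE[name]


def build(name):
    """Build the model and return it with its matching decode_predictions."""
    # Imported here so that resolve() can be used without TensorFlow installed
    import tensorflow as tf
    module_name, class_name = resolve(name)
    module = getattr(tf.keras.applications, module_name)
    model = getattr(module, class_name)(weights='imagenet')
    return model, module.decode_predictions
```

With this in place, a call such as model, decode = build('resnet50') would replace the import and construction statements, and decode(predict_class, top=3) would replace the model-specific decode_predictions call. The model name could then come from an extra command-line argument.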

Natural Language Semantic Recognition

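TensorFlow 2.0 adds a great deal of this kind of feature integration and data preprocessing, which is a big convenience for engineers. For example, in "From Boiler Worker to AI Expert (9)" we introduced an important preprocessing step in NLP projects: word vectorization.
In Keras, word vectorization has been standardized as a layer in the model. Although it is built in, it remains flexible: the code can control the number of words to encode, the dimensionality of the vectors, and many other parameters. See the official documentation for details.
Converting words to numbers will also be covered in a later part.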
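Conceptually, a vectorization (embedding) layer is a lookup table with one row per word code and one column per vector dimension. The toy version below is a sketch only: make_embedding and embed are hypothetical names, and the table holds fixed random numbers, whereas a real layer learns its values during training.

```python
import random


def make_embedding(vocab_size, dim, seed=0):
    """Create a vocab_size x dim table of small random numbers."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(table, token_ids):
    """Replace each integer word code with its row of the table."""
    return [table[i] for i in token_ids]


# 10 possible word codes, each mapped to a 4-dimensional vector
table = make_embedding(vocab_size=10, dim=4)
vectors = embed(table, [3, 1, 3])
print(len(vectors), len(vectors[0]))
```

The two parameters here correspond to the two controls mentioned above: the number of words to encode and the dimensionality of the vectors. The same word code always maps to the same vector.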

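In this example we look at a sample from the TensorFlow 2.0 tutorials: natural language semantic recognition.
The program uses the IMDB movie review sample set as training data. To download, load, and manage the dataset we use the tensorflow_datasets package, so install it first: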

$ pip3 install tfds-nightly

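The IMDB dataset consists of reviews and labels. A review is an excerpted comment about a movie, a passage of English text; a label is one of just two numbers, 0 or 1. 0 means the review rates the movie poorly and considers it not worth watching, a negative sentiment. 1 means the review rates the movie highly and considers it a good film, a positive sentiment.
Unfortunately the dataset is in English. For similar semantic analysis on Chinese text, we would need to pair our own work with a good word-segmentation tool.
The IMDB dataset we use has already been converted to numbers, that is, words are already represented by integer codes. A matching encoding table is therefore required to recover the original review text.
Below we use Python's interactive mode on the command line to see what the raw data looks like: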

$ python3
Python 3.7.3 (default, Mar 27 2019, 09:23:39) 
[Clang 10.0.0 (clang-1000.11.45.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
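# Import the TensorFlow datasets tool
>>> import tensorflow_datasets as tfds
# Load the simplified training sample set; the simplified version contains only 8000+ words,
# which speeds up training; the full version contains tens of thousands
>>> dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True,
...                           as_supervised=True)
# The dataset is already split into a training set and a test set
>>> train_dataset, test_dataset = dataset['train'], dataset['test']
# Initialize the word encoding table, used shortly to turn number arrays back into review text
>>> tokenizer = info.features['text'].encoder
# Show one raw record: an array of numbers plus a single number
# The former is the encoded review, the latter is the label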
>>> for i in train_dataset.take(1):
...     print(i[0], i[1])
... 
tf.Tensor(
[ 768   99  416    9  733    1  626    6  467  159   33  788   53   29
 1224    3  156  155 1234 2492   14   32  151 7968   40  193   31  303
 7976   59 4159  104    3   12  258 2674  551 5557   40   44  113   55
  143  121   83   35 1151   11  195   13  746   61   55  300    3 3075
 8044   38   66   54    9    4  355  811   23 1406 6481 7961 1060 6786
  409 3570 7411 3743 2314 7998 8005 1782    3   19  953    9 5922 8029
    3   12  207 7968   21  582   72 8002 7968  123  853  178  132 1527
    3   19 1575   29 1288 2847 2742 8029    3   19  188    9  715 7974
 7753   26  144    1  263   85   33  479  892    3 1566 1380    7 1929
 4887 7961 3760   47 4584  204   88  183  800 1160    5   42    9 6396
   20 1838   24   10   16   10   17   19  349  233    9    1 5845  432
    6   15  208    3   69    9   20   75    1 1876  574   61    6   79
  141    7  115   15   51   20  785   20 3374    3 1976 1515 7968    8
  171   29 7463  104    2 5114    5  569    6 2203   95  185   52 5374
  376  231    5  789   47 7514   11 2246  714    2 7779   49 1709 1877
    4    5   19 3583 3599 7961    7 1302  146    6    1 1871    3  128
   11    1 2674  194 3754  100 7974  267    6  405   68   29 1966 5928
  291    7 2862  488   52 2048  858  700 1532   28 1551    2  142 7968
    8  638  152    1 2246 2968  739  251   19 3712 1183  830 1379 5368
   47    5 1889 7974 4038   34 4636   52 3653 6991   34 4491 8029 7975], shape=(280,), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
# Show one decoded review and its label
>>> for i in train_dataset.take(1):
...     print(tokenizer.decode(i[0]), i[1].numpy())
... 
Just because someone is under the age of 10 does not mean they are stupid. If your child likes this film you'd better have him/her tested. I am continually amazed at how so many people can be involved in something that turns out so bad. This "film" is a showcase for digital wizardry AND NOTHING ELSE. The writing is horrid. I can't remember when I've heard such bad dialogue. The songs are beyond wretched. The acting is sub-par but then the actors were not given much. Who decided to employ Joey Fatone? He cannot sing and he is ugly as sin.<br /><br />The worst thing is the obviousness of it all. It is as if the writers went out of their way to make it all as stupid as possible. Great children's movies are wicked, smart and full of wit - films like Shrek and Toy Story in recent years, Willie Wonka and The Witches to mention two of the past. But in the continual dumbing-down of American more are flocking to dreck like Finding Nemo (yes, that's right), the recent Charlie & The Chocolate Factory and eye-crossing trash like Red Riding Hood. 0
# The review speaks for itself; the label is 0, which marks it as a negative review
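The round trip between text and integer codes shown in the session can be imitated with a toy table. This is only an illustration under stated assumptions: VOCAB, encode and decode are hypothetical, the table maps whole words, and code 0 is treated as padding. The real encoder in the session works on subword pieces and has a far larger table.

```python
# Hypothetical toy vocabulary; 0 is reserved for padding, so codes start at 1
VOCAB = ['the', 'movie', 'was', 'great', 'boring']
WORD_TO_ID = {word: i + 1 for i, word in enumerate(VOCAB)}
ID_TO_WORD = {i: word for word, i in WORD_TO_ID.items()}


def encode(text):
    """Turn a space-separated sentence into a list of integer codes."""
    return [WORD_TO_ID[word] for word in text.lower().split()]


def decode(ids):
    """Turn integer codes back into text, skipping padding zeros."""
    return ' '.join(ID_TO_WORD[i] for i in ids if i != 0)


codes = encode('The movie was great')
print(codes)
print(decode(codes))
```

Unlike this toy, which raises KeyError for a word missing from VOCAB, the point of keeping the table with the dataset is that any code array can be turned back into readable text.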

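NLP projects usually rely on RNN, LSTM, or GRU networks. The main reason is that the number of words in a piece of text is not fixed; padding can make the lengths match, but ordinary neural networks still perform poorly on such input. In addition, the words in a text are correlated with one another, much like neighboring pixels in an image, but the correlations in text span longer distances.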
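Padding, mentioned above, simply means filling shorter sequences with a placeholder value until every sequence in a batch has the same length. A minimal sketch in plain Python follows; pad_batch is a hypothetical helper, and the choice of 0 as the padding value is an assumption of this sketch.

```python
def pad_batch(sequences, pad_value=0):
    """Pad every sequence in a batch to the length of the longest one."""
    longest = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (longest - len(seq))
            for seq in sequences]


# Three encoded "reviews" of different lengths
batch = [[768, 99, 416], [9, 733], [1]]
padded = pad_batch(batch)
print(padded)
```

After padding, the batch is rectangular and can be handed to a network as a single array, even though the padded positions carry no information.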
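The principles of RNN/LSTM/GRU were covered in "From Boiler Worker to AI Expert (10)" and are not repeated here. We go straight to the code; the comments explain what each step does: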

#!/usr/bin/env python3

from __future__ import absolute_import, division, print_function

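# Import the tensorflow datasets package
import tensorflow_datasets as tfds
# Import tensorflow
import tensorflow as tf

# Load the dataset; the first run needs to download the imdb data from the network
dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True,
                          as_supervised=True)
# Assign the training set and test set to two variables
train_dataset, test_dataset = dataset['train'], dataset['test']

# Initialize the matching text encoding table
tokenizer = info.features['text'].encoder
# Print the number of words contained in the current sample set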
print('Vocabulary size: {}'.format(tokenizer.vocab_size))

BUFFER_SIZE = 10000
BATCH_SIZE = 64
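# Shuffle the training set
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
# Pad the data in each batch to a common length
train_dataset = train_dataset.padded_batch(BATCH_SIZE, train_dataset.output_shapes)
test_dataset = test_dataset.padded_batch(BATCH_SIZE, test_dataset.output_shapes)

# Build the neural network model
# The first layer vectorizes the already-numeric review data
# Vectorization was covered in the previous series: it embeds words into a multidimensional space
# so that words with similar meanings end up closer together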
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(tokenizer.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
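# Compile the model
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# Train the model
history = model.fit(train_dataset, epochs=10,
                    validation_data=test_dataset)

# Training takes a long time, so save the weights once it finishes in case we want to try again later
model.save_weights('./imdb-classify-lstm/final_chkp')

# Restore the weights; to test review predictions again later, comment out the training and saving above
# and start from here
model.load_weights('./imdb-classify-lstm/final_chkp')
# Evaluate the model on the test set and print the loss and accuracy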
test_loss, test_acc = model.evaluate(test_dataset)
print('\nTest Loss: {}'.format(test_loss))
print('Test Accuracy: {}'.format(test_acc))

#########################################################
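# The following uses the model to predict the sentiment of a piece of text

# Helper that pads an array shorter than the given length with trailing zeros
# The Embedding layer above does not set input_length, so it accepts variable-length input,
# but training used zero-padded batches, so padding a short sample (here to 64) keeps it
# closer to what the model saw during training
# For a classifier with only two outcomes, 0 and 1, the effect is not pronounced
def pad_to_size(vec, size):
    zeros = [0] * (size - len(vec))
    vec.extend(zeros)
    return vec

# Predict the sentiment of a piece of text
def sample_predict(sentence, pad):
    # The input text must first be encoded with the same number/word table as the imdb data
    # Words not in the table are broken into subword pieces that the table does contain
    tokenized_sample_pred_text = tokenizer.encode(sentence)
    # Pad short text to a fixed length
    if pad:
        tokenized_sample_pred_text = pad_to_size(tokenized_sample_pred_text, 64)
    # Add a dimension so the data becomes a batch containing a single sample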
    predictions = model.predict(tf.expand_dims(tokenized_sample_pred_text, 0))
    return (predictions)

# Prediction 1: a negative review
sample_pred_text = ('The movie was not good. The animation and the graphics '
                    'were terrible. I would not recommend this movie.')
predictions = sample_predict(sample_pred_text, pad=True)
print(predictions)

# Prediction 2: another negative review
sample_pred_text = ("The movie was boring. I don't like this movie.")
predictions = sample_predict(sample_pred_text, pad=True)
print(predictions)

# Prediction 3: a positive review
sample_pred_text = ('The movie was great. Everything in this movies '
                    'is delicate, I love it.')
predictions = sample_predict(sample_pred_text, pad=True)
print(predictions)

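Training this example is already fairly slow: on my computer, with an entry-level GPU, it ran for about 20 minutes. That is why the program saves the model parameters when training ends, in case we want to test more text later.
The program output looks roughly like this: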

$ ./imdb-classify-lstm.py
Vocabulary size: 8185
Epoch 1/10
391/391 [==============================] - 117s 299ms/step - loss: 0.5763 - accuracy: 0.6985 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/10
391/391 [==============================] - 114s 292ms/step - loss: 0.4639 - accuracy: 0.7876 - val_loss: 0.5006 - val_accuracy: 0.7731
Epoch 3/10
391/391 [==============================] - 115s 295ms/step - loss: 0.3296 - accuracy: 0.8680 - val_loss: 0.3920 - val_accuracy: 0.8344
Epoch 4/10
391/391 [==============================] - 115s 295ms/step - loss: 0.2674 - accuracy: 0.8977 - val_loss: 0.3640 - val_accuracy: 0.8597
Epoch 5/10
391/391 [==============================] - 115s 295ms/step - loss: 0.2168 - accuracy: 0.9218 - val_loss: 0.3190 - val_accuracy: 0.8698
Epoch 6/10
391/391 [==============================] - 115s 294ms/step - loss: 0.1717 - accuracy: 0.9423 - val_loss: 0.3201 - val_accuracy: 0.8754
Epoch 7/10
391/391 [==============================] - 114s 293ms/step - loss: 0.1339 - accuracy: 0.9573 - val_loss: 0.3470 - val_accuracy: 0.8678
Epoch 8/10
391/391 [==============================] - 115s 294ms/step - loss: 0.1044 - accuracy: 0.9693 - val_loss: 0.4094 - val_accuracy: 0.8569
Epoch 9/10
391/391 [==============================] - 116s 296ms/step - loss: 0.0826 - accuracy: 0.9771 - val_loss: 0.4496 - val_accuracy: 0.8704
Epoch 10/10
391/391 [==============================] - 115s 295ms/step - loss: 0.0671 - accuracy: 0.9820 - val_loss: 0.4516 - val_accuracy: 0.8696
    391/Unknown - 37s 95ms/step - loss: 0.4516 - accuracy: 0.8696
Test Loss: 0.45155299115745
Test Accuracy: 0.8695999979972839
[[0.00420592]]
[[0.00562131]]
[[0.99653375]]

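In the final results, the first two values are very close to 0, meaning those two reviews lean negative. The third value is close to 1, meaning that review is positive. Note that all three reviews were written off the cuff rather than taken from the sample set, so they are genuine "natural language".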
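Reading "close to 0" and "close to 1" as labels can be made explicit with a cutoff. The sketch below assumes a threshold of 0.5, which is a common convention rather than something the program above sets; to_label is a hypothetical helper and the scores are copied from the output shown.

```python
# Sigmoid outputs copied from the run above, one per sample review
scores = [0.00420592, 0.00562131, 0.99653375]


def to_label(score, threshold=0.5):
    """Map a sigmoid output to the dataset's labels: 1 positive, 0 negative."""
    return 1 if score >= threshold else 0


labels = [to_label(s) for s in scores]
print(labels)
```

In the real program each prediction is a nested array such as [[0.00420592]], so the inner value would need to be extracted (for example predictions[0][0]) before applying the helper.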

(To be continued...)
