TensorFlow Wide and Deep Model: Explanation and Application (Part 1)

Date: 2020-01-02


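About the author: Wang Jian (汪劍) currently leads recommendation and personalization at Mobvoi (出門問問). He previously worked at Microsoft and Yahoo on search and recommendation.
Editor: He Yongcan (何永燦, heyc@csdn.net)
This article was first published on CSDN and may not be reproduced without permission.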

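The wide and deep model is a class of models for classification and regression that TensorFlow released around June 2016, and it has been applied to app recommendation in Google Play [1]. Its core idea is to combine the memorization ability of a linear model with the generalization ability of a DNN, and to optimize the parameters of both models at the same time during training so that the predictive power of the overall model is as high as possible.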

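Our product scenario has a lot in common with the Google Play recommendation scenario, so after investigation and evaluation we also adopted the wide and deep model as the recommendation ranking model in our product and built a system for offline training and online prediction. Since there are not many descriptions or explanations of the wide and deep model available online, we share here what we learned from studying and applying this model in TensorFlow 1.1, in the hope that it helps other practitioners.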

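The architecture of the wide and deep model is summarized well in the figure from the original paper. The wide side is a linear model. Its input features can be continuous features or sparse discrete features, and discrete features can be crossed with one another to form higher-dimensional discrete features. With L1 regularization, training of the linear model converges quickly onto the effective feature combinations. The deep side is a DNN. Each feature corresponds to a low-dimensional real-valued vector, which we call the feature's embedding. The DNN adjusts the hidden-layer weights through backpropagation and also updates the feature embeddings. The output of the whole wide and deep model is the sum of the linear model's output and the DNN's output.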
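For reference, the summed output can be written in the notation of the original paper [1] for the binary classification case. The symbols are the paper's, not this article's: $\phi(\mathbf{x})$ denotes the cross-product feature transformations, $a^{(l_f)}$ the activations of the final hidden layer, and $\sigma$ the sigmoid function.

```latex
P(Y = 1 \mid \mathbf{x}) =
  \sigma\left( \mathbf{w}_{wide}^{T} [\mathbf{x}, \phi(\mathbf{x})]
             + \mathbf{w}_{deep}^{T} a^{(l_f)} + b \right)
```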

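As the original paper points out, the model is trained with joint training: the training error is fed back to both the linear model and the DNN to update their parameters. In ensemble learning, each model is trained independently and the models are combined only at prediction time. In joint training, the models are combined during training, so the weight updates of each model are influenced by the contributions of both the wide side and the deep side to the training error. As a result, in the feature design stage the wide model and the deep model each only need to focus on what they are good at: the wide model memorizes through crosses of discrete features, and the deep model generalizes through feature embeddings. This keeps the size and complexity of each individual model under control while still improving the performance of the overall model.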

Figure 1: Schematic of the wide and deep model

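Defining the Wide and Deep Model

Defining a wide and deep model is fairly simple, and the tutorial provides a reasonably complete model-building example:

Getting the input

The model's input is a pandas DataFrame. As in the tutorial's sample code, the DataFrame can be built by reading a CSV file with pandas.read_csv.

Defining feature columns

tf.contrib.layers provides a family of functions for defining the different types of feature columns: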

tf.contrib.layers.sparse_column_with_XXX builds low-dimensional discrete features:
sparse_feature_a = sparse_column_with_hash_bucket(…)
sparse_feature_b = sparse_column_with_hash_bucket(…)
tf.contrib.layers.crossed_column builds crosses of discrete features:
sparse_feature_a_x_sparse_feature_b = crossed_column([sparse_feature_a, sparse_feature_b], …)
tf.contrib.layers.real_valued_column builds continuous real-valued features:
real_feature_a = real_valued_column(…)
tf.contrib.layers.embedding_column builds embedding features:
sparse_feature_a_emb = embedding_column(sparse_id_column=sparse_feature_a, )

Defining the model

Defining a classification model:

m = tf.contrib.learn.DNNLinearCombinedClassifier(
    n_classes=n_classes,  # number of classes
    weight_column_name=weight_column_name,  # per-example training weights
    model_dir=model_dir,  # model directory
    linear_feature_columns=wide_columns,  # feature columns fed to the linear model
    linear_optimizer=tf.train.FtrlOptimizer(...),  # optimizer that updates the linear model weights
    dnn_feature_columns=deep_columns,  # feature columns fed to the DNN
    dnn_hidden_units=[100, 50],  # number of units in each DNN hidden layer
    dnn_optimizer=tf.train.AdagradOptimizer(...)  # optimizer that updates the DNN weights
)

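Note that the model's model_dir and the export directory mentioned later are two different directories. model_dir stores the model's graph and summary data. If model_dir already contains model data from a previous training run, training restores that model from model_dir and continues from it. The model data that we load and display in TensorBoard is also generated in this directory. The export directory, on the other hand, is mainly what TensorFlow Serving loads the model's servable from at startup, for the online prediction service.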

To use a regression model instead, define it as follows:

m = tf.contrib.learn.DNNLinearCombinedRegressor(
    weight_column_name=weight_column_name,
    linear_feature_columns=wide_columns,
    linear_optimizer=tf.train.FtrlOptimizer(...),
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50],
    dnn_optimizer=tf.train.AdagradOptimizer(...)
)

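Training and evaluation

Train the model with the fit function: m.fit(input_fn=lambda: input_fn(df_train)). Evaluate it with the evaluate function: m.evaluate(input_fn=lambda: input_fn(df_test)). The input_fn argument must be a callable, which is why the call is wrapped in a lambda. The input_fn function defines how features and labels are built from the input DataFrame: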

def input_fn(df):
    # tf.constant builds a constant tensor; df[k].values holds the values of
    # the corresponding feature column.
    continuous_cols = {k: tf.constant(df[k].values) for k in CONTINUOUS_COLUMNS}

    # tf.SparseTensor builds a sparse tensor from three dense tensors:
    # indices records the positions of the non-zero elements, values holds the
    # element at each of those positions, and dense_shape gives the size of
    # every dimension. The code below builds a 2-D SparseTensor of shape
    # [df[k].size, 1] for each categorical column, i.e. one value per row.
    categorical_cols = {
        k: tf.SparseTensor(
            indices=[[i, 0] for i in range(df[k].size)],
            values=df[k].values,
            dense_shape=[df[k].size, 1])
        for k in CATEGORICAL_COLUMNS
    }

    # label is a constant tensor holding the label of each example.
    label = tf.constant(df[LABEL_COLUMN].values)

    # features is the union of continuous_cols and categorical_cols: each key
    # is a feature column name and each value is that column's tensor.
    features = dict(continuous_cols)
    features.update(categorical_cols)
    return features, label
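To make the SparseTensor layout concrete, here is a minimal pure-Python sketch of the (indices, values, dense_shape) triple that input_fn builds for a single-valued categorical column. The helper name to_sparse_triple and the sample values are hypothetical and only serve as illustration; no TensorFlow is involved.

```python
def to_sparse_triple(values):
    """Return (indices, values, dense_shape) for a single-valued categorical column."""
    n = len(values)
    indices = [[i, 0] for i in range(n)]  # row i, column 0: one value per example
    dense_shape = [n, 1]                  # n examples, 1 value each
    return indices, list(values), dense_shape


indices, values, dense_shape = to_sparse_triple(["Private", "State-gov", "Private"])
print(indices)      # [[0, 0], [1, 0], [2, 0]]
print(dense_shape)  # [3, 1]
```

Every example contributes exactly one entry, so the tensor is "sparse" only in representation; the same layout also accommodates columns where an example has several values or none.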

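Export

The model is exported to a specified directory with export, and TensorFlow Serving loads the model from that directory to provide the online prediction service:

m.export(export_dir=export_dir, input_fn=export._default_input_fn,
    use_deprecated_input_fn=True, signature_fn=signature_fn)

input_fn defines the features used to generate the model's servable, and signature_fn defines the signature of the model's inputs and outputs.
Because export has been deprecated since TensorFlow 1.0 and should be replaced with export_savedmodel, this article does not explain export any further and only shows at the end how we use it. We recommend that all users switch to the newest API.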

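The Model in Detail

The wide and deep model is implemented on top of the TF.learn API, and its source code lives mainly in tensorflow.contrib.learn.python.learn.estimators. Taking classification as an example, the class for the combined wide and deep classifier is DNNLinearCombinedClassifier, implemented in the source file dnn_linear_combined.py. Let us first look at the full definition of DNNLinearCombinedClassifier's initializer to see which parameters can be passed when constructing a wide and deep model: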

def __init__(self, model_dir=None, n_classes=2, weight_column_name=None, linear_feature_columns=None,
 linear_optimizer=None, joint_linear_weights=False, dnn_feature_columns=None, 
 dnn_optimizer=None, dnn_hidden_units=None, dnn_activation_fn=nn.relu, dnn_dropout=None,
 gradient_clip_norm=None, enable_centered_bias=False, config=None,
 feature_engineering_fn=None, embedding_lr_multipliers=None):

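We can divide the constructor's parameters into the following groups.

Basic parameters

model_dir
The trained model is stored in the directory given by model_dir. To debug the model with TensorBoard, point TensorBoard's logdir at this directory: tensorboard --logdir=$model_dir
n_classes
Number of classes. The default is binary classification; a value greater than 2 gives multi-class classification.
weight_column_name
Defines the weight of each training example. During training, the training error of each example is multiplied by that example's weight before being used to compute the gradients for the weight update. To assign a weight to each example, the features returned by input_fn must contain a column named weight_column_name. Its length is the number of training examples, each element is the weight of one example, and the data type is float, as in the following pseudocode:

weight = tf.constant(df[WEIGHT_COLUMN_NAME].values, dtype=tf.float32)
features[weight_column_name] = weight

config
Runtime configuration parameters.
feature_engineering_fn
Post-processes the (features, label) returned by the input function input_fn into a new (features', label'), which is then passed to the model training function model_fn.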

call_model_fn():
 feature, labels = self._feature_engineering_fn(feature, labels)
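The effect of weight_column_name described above can be sketched in plain Python for a binary log loss. This is an illustration under assumptions, not the estimator's code: the function name weighted_log_loss is hypothetical, and how TensorFlow reduces the weighted losses over a batch (sum or weighted mean) is not covered by this article, so a plain sum is used.

```python
import math


def weighted_log_loss(labels, probs, weights):
    """Sum of per-example log losses, each scaled by its example weight."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += w * loss  # an example with weight 0 contributes nothing
    return total


# The second example has weight 0.0, so only the first one affects the loss.
print(weighted_log_loss([1, 0], [0.5, 0.9], [1.0, 0.0]))
```

Because the gradient is linear in the loss, scaling an example's loss by its weight scales that example's contribution to every weight update by the same factor.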

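Linear model parameters

linear_feature_columns
Input features of the linear model.
linear_optimizer
Optimizer of the linear model. It defines the gradient update algorithm for the weights and defaults to FTRL. All linear_optimizer and dnn_optimizer values supported by default are defined in the OPTIMIZER_CLS_NAMES variable in optimizer.py.
joint_linear_weights
According to the comments in the code, if joint_linear_weights=True the weights of the linear model are stored in a single tf.Variable, which can speed up training. However, every feature column in linear_feature_columns must then be a sparse feature column, and the combiner of each feature column must be "sum". In our own offline comparison it did not seem to affect the model's predictive power much and it did improve training speed somewhat. We kept the default value when training our final model.

DNN model parameters

dnn_feature_columns
Input features of the DNN.
dnn_optimizer
Optimizer of the DNN. It defines the gradient update algorithm for the weights of each layer and defaults to Adagrad.
dnn_hidden_units
Number of units in each hidden layer.
dnn_activation_fn
Activation function of the hidden layers; defaults to ReLU.
dnn_dropout
Dropout ratio applied to hidden-layer units during training.
gradient_clip_norm
Defines gradient clipping, which bounds the magnitude of the gradients to prevent exploding gradients. Wide and deep uses tf.clip_by_global_norm by default.
embedding_lr_multipliers
A mapping from embedding feature column to float. For the specified embedding feature columns, the gradient is multiplied by a constant factor to adjust how fast the embedding changes.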
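As an illustration of the clipping named under gradient_clip_norm, the following pure-Python sketch follows the documented behavior of tf.clip_by_global_norm: every gradient is scaled by clip_norm / max(global_norm, clip_norm). Gradients are represented as plain lists of floats here, which is a simplification of real tensors.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients jointly so that their global L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The scale is 1.0 when global_norm <= clip_norm, so small gradients pass through.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm


clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(norm)     # 5.0
print(clipped)  # approximately [[0.6], [0.8]]
```

Scaling all gradients by one common factor preserves the direction of the overall update, which is the design reason for clipping by the global norm rather than clipping each gradient separately.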

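Having gone through the constructor, we now know roughly what kind of model sits on the wide side and on the deep side, and which parameters the model takes. To understand the model more deeply, we analyzed the relevant wide and deep source code, aiming to answer two questions: (1) how are the features used to train the linear model and the DNN defined, and how are they implemented internally; (2) how are the linear model and the DNN trained jointly, and how is the training error fed back to the wide model and the deep model? Below we focus on these two aspects: features and model training.

Features

The wide and deep model is normally trained on batches of several training examples. Training examples are defined along the row dimension: each row is one training example, consisting of features (feature columns), a label, and a weight, as shown in Figure 2. Features are defined along the column dimension: each feature corresponds to one feature column, and a feature column consists of one or more tensors along the column dimension, where each tensor element is the value of one example in some dimension of that feature column. The definition of a feature column can be found in feature_column.py in the source code. The corresponding class is _FeatureColumn, which defines the basic interface and is the abstract parent class of all feature classes in the wide and deep model.

Figure 2: feature_column, label, and weight

The features used by the wide and deep model fall into two broad categories. The first is continuous features, used mainly to train the deep model, including real-valued features and embedding features. The second is discrete features, used mainly to train the wide model, including sparse features and crossed features. The figure below summarizes all of the feature classes.

Figure 3: Class diagram of the wide and deep model features

Besides inheritance, the figure also marks the composition relationships between feature classes: a _BucketizedColumn is built from a _RealValuedColumn by bucketizing its continuous value range, and a _CrossedColumn is built by crossing several _SparseColumn, _BucketizedColumn, or _CrossedColumn objects. The features on the left of the figure are discrete features and those on the right are continuous features.

In practice, we usually build features by calling the interfaces that TensorFlow provides. The interfaces that build each kind of feature are: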

sparse_column_with_integerized_feature() --> _SparseColumnIntegerized

sparse_column_with_hash_bucket() --> _SparseColumnHashed

sparse_column_with_keys() --> _SparseColumnKeys

sparse_column_with_vocabulary_file() --> _SparseColumnVocabulary

weighted_sparse_column() --> _WeightedSparseColumn

one_hot_column() --> _OneHotColumn

embedding_column() --> _EmbeddingColumn

shared_embedding_columns() --> List[_EmbeddingColumn]

scattered_embedding_column() --> _ScatteredEmbeddingColumn

real_valued_column() --> _RealValuedColumn

bucketized_column() --> _BucketizedColumn

crossed_column() --> _CrossedColumn

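_FeatureColumn defines several basic interfaces that model training uses to extract and transform features. They are described in more detail when we cover the specific features below:

def insert_transformed_feature(self, columns_to_tensors):
    """Apply transformation and inserts it into columns_to_tensors."""
    # Emits and transforms the features of this FeatureColumn.
    # columns_to_tensors is a mapping from FeatureColumn to tensors.
def _to_dnn_input_layer(self, input_tensor, weight_collections=None, trainable=True, output_rank=2):
    """Returns a Tensor as an input to the first layer of neural network."""
    # Builds the float tensor input of the DNN; see the discussion of _RealValuedColumn below.
def _deep_embedding_lookup_arguments(self, input_tensor):
    """Returns arguments to embedding lookup to build an input layer."""
    # Builds the embedding input of the DNN; see the discussion of _EmbeddingColumn below.
def _wide_embedding_lookup_arguments(self, input_tensor):
    """Returns arguments to look up embeddings for this column."""
    # Builds the input of the linear model; see the discussion of _SparseColumn below.

We start the analysis with discrete features (sparse features). A discrete feature can be viewed as a feature made up of a number of key values, such as a user's gender. In the actual implementation, each key value corresponds to an integer id inside the sparse column. The base class of discrete features is _SparseColumn: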

class _SparseColumn(_FeatureColumn,
                    collections.namedtuple("_SparseColumn",
                                           ["column_name", "is_integerized",
                                            "bucket_size", "lookup_config",
                                            "combiner", "dtype"])):

The list of strings passed to collections.namedtuple contains the names of the input parameters that _SparseColumn receives from the corresponding creation interface.

def __new__(cls,
 column_name,
 is_integerized=False,
 bucket_size=None,
 lookup_config=None,
 combiner="sum",
 dtype=dtypes.string):

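How does a sparse feature store these discrete values? That depends on the two parameters bucket_size and lookup_config, exactly one of which is set in any actual definition. Depending on which one is used, sparse features fall into two groups. Features that set lookup_config store all values of the feature in an in-memory dictionary; these include _SparseColumnKeys and _SparseColumnVocabulary, both discussed later. Features that set bucket_size use a hash table to store the feature values, which a hash function scatters across the buckets; these include _SparseColumnHashed and _SparseColumnIntegerized (is_integerized = True).

dtype specifies the type of the feature values. Besides the string type (dtypes.string), sparse feature columns also support 64-bit integers (dtypes.int64). By default the input discrete feature is assumed to be a string. If is_integerized = True is set, the feature is treated as an integer id feature: the value of the feature can be used directly as its id, with no dedicated mapping needed.

The combiner parameter controls normalization at the level of a single example. If the feature column has multiple values in one example, combiner specifies how those values are normalized. The source code comment reads: "combiner: A string specifying how to reduce if the sparse column is multivalent". The meaning of multivalent is explained a little more clearly in the definition of the crossed feature column ("combiner: A string specifying how to reduce if there are multiple entries in a single row"). combiner supports 3 normalization modes: sum means no normalization, sqrtn means L2 normalization, and mean means L1 normalization. L2 normalization usually gives somewhat higher model accuracy.

A _SparseColumn cannot be used directly as DNN input. It can only be used directly to build the input of the linear model: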

def _wide_embedding_lookup_arguments(self, input_tensor):
    return _LinearEmbeddingLookupArguments(
        input_tensor=self.id_tensor(input_tensor),
        weight_tensor=self.weight_tensor(input_tensor),
        vocab_size=self.length,
        initializer=init_ops.zeros_initializer(),
        combiner=self.combiner)

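_LinearEmbeddingLookupArguments is a namedtuple (a new subclass of tuple with named fields). input_tensor is the array of feature ids in the training examples, each element of weight_tensor is the weight of the feature in one example, vocab_size is the number of distinct feature values, and initializer is the initialization function for the feature's weights, which defaults to zeros.

However, the source code shows that _SparseColumn and its subclasses do not use feature weights: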

def weight_tensor(self, input_tensor):
    """Returns the weight tensor from the given transformed input_tensor."""
    return None

To assign weights to the features of a _SparseColumn, use _WeightedSparseColumn. Its construction interface is weighted_sparse_column ("Create a _SparseColumn by combing sparse_id_column and weight_column"):

class _WeightedSparseColumn(_FeatureColumn, collections.namedtuple(
        "_WeightedSparseColumn", ["sparse_id_column", "weight_column_name", "dtype"])):

    def __new__(cls, sparse_id_column, weight_column_name, dtype):
        return super(_WeightedSparseColumn, cls).__new__(cls, sparse_id_column, weight_column_name, dtype)

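_WeightedSparseColumn takes 3 parameters. sparse_id_column is the sparse feature column, an object of type _SparseColumn. weight_column_name is the name of the weight column in the input that corresponds to sparse_id_column (the features dict returned by input_fn must contain a tensor under the key weight_column_name). dtype is the data type of each element in the weight column. There are a few implicit requirements:

(1) dtype must be convertible to a floating-point type, otherwise a TypeError is raised;
(2) the weight column named by weight_column_name can be a SparseTensor or a regular dense tensor. A dense tensor is converted to a SparseTensor by the program, but the resulting SparseTensor must have the same indices and dense_shape as the SparseTensor of sparse_id_column.

The functions with which _WeightedSparseColumn emits the feature's id tensor and weight tensor are as follows: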

def insert_transformed_feature(self, columns_to_tensors):
    """Inserts a tuple with the id and weight tensors."""
    if self.sparse_id_column not in columns_to_tensors:
        self.sparse_id_column.insert_transformed_feature(columns_to_tensors)

    weight_tensor = columns_to_tensors[self.weight_column_name]
    if not isinstance(weight_tensor, sparse_tensor_py.SparseTensor):
        # The weight tensor can be a regular Tensor. In such case, sparsify it.
        # For example, a tensor built with tf.constant is converted to a
        # SparseTensor by dense_to_sparse_tensor.
        weight_tensor = contrib_sparse_ops.dense_to_sparse_tensor(weight_tensor)

    # The weight_tensor that is finally used has data type float.
    if not self.dtype.is_floating:
        weight_tensor = math_ops.to_float(weight_tensor)

    # The entry stored for this _WeightedSparseColumn is a pair: the first
    # element is the id tensor produced when the sparse feature column ran
    # insert_transformed_feature, and the second element is the weight tensor.
    columns_to_tensors[self] = tuple([columns_to_tensors[self.sparse_id_column], weight_tensor])

def id_tensor(self, input_tensor):
    """Returns the id tensor from the given transformed input_tensor."""
    return input_tensor[0]

def weight_tensor(self, input_tensor):
    """Returns the weight tensor from the given transformed input_tensor."""
    return input_tensor[1]

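(1) sparse column from keys

This is the simplest discrete feature. It is analogous to an enumeration type and is generally used when the number of enumerated values is small. The interface for creating a keys-based sparse feature is sparse_column_with_keys(column_name, keys, default_value=-1, combiner=None). The corresponding class is _SparseColumnKeys, whose constructor is: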

def __new__(cls, column_name, keys, default_value=-1, combiner="sum"):
    return super(_SparseColumnKeys, cls).__new__(
        cls, column_name, combiner=combiner,
        lookup_config=_SparseIdLookupConfig(keys=keys, vocab_size=len(keys),
                                            default_value=default_value),
        dtype=dtypes.string)

keys is a list of strings that defines all of the enumerated values. The keys passed when constructing the feature end up stored in lookup_config. Each key is a string and corresponds to one id, which is the index of that key in the input keys list. What model training actually uses is the id of each key.

Before a _SparseColumnKeys feature is fed to the model, the enumerated keys have to be converted to their ids. This conversion is implemented in the function insert_transformed_feature:

def insert_transformed_feature(self, columns_to_tensors):
    """Handles sparse column to id conversion."""
    input_tensor = self._get_input_sparse_tensor(columns_to_tensors)
    # index_table_from_tensor returns a lookup table that converts a string
    # tensor into int64 IDs. The mapping is initialized from a 1-D string
    # `mapping` tensor in which each element is a key and the corresponding
    # index within the tensor is the value.
    table = lookup.index_table_from_tensor(
        mapping=tuple(self.lookup_config.keys),
        default_value=self.lookup_config.default_value,
        dtype=self.dtype, name="lookup")
    columns_to_tensors[self] = table.lookup(input_tensor)
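The key-to-id conversion can be mimicked in a few lines of plain Python. This is a minimal sketch of the behavior described above (id equals position in keys, unknown keys map to default_value); the helper name make_index_table and the sample keys are hypothetical.

```python
def make_index_table(keys, default_value=-1):
    """Map each key to its position in `keys`; unknown keys map to default_value."""
    table = {key: idx for idx, key in enumerate(keys)}
    return lambda value: table.get(value, default_value)


lookup = make_index_table(["female", "male"])
print([lookup(v) for v in ["male", "female", "unknown"]])  # [1, 0, -1]
```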

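(2) sparse column from vocabulary file

A sparse column with keys covers ordinary enumerations, but it becomes unsuitable when there are many enumerated values. For that case there is an interface that loads the enumerated values from a file:

sparse_column_with_vocabulary_file(column_name, vocabulary_file, num_oov_buckets=0, vocab_size=None,
    default_value=-1, combiner="sum", dtype=dtypes.string)

The corresponding constructor is:

def __new__(cls, column_name, vocabulary_file, num_oov_buckets=0, vocab_size=None, default_value=-1,
            combiner="sum", dtype=dtypes.string):

So where are the feature values read from the file stored? Look at the class instance that this constructor returns:

return super(_SparseColumnVocabulary, cls).__new__(cls, column_name, combiner=combiner,
    lookup_config=_SparseIdLookupConfig(vocabulary_file=vocabulary_file, num_oov_buckets=num_oov_buckets,
        vocab_size=vocab_size, default_value=default_value), dtype=dtype)

Like _SparseColumnKeys, this feature also uses _SparseIdLookupConfig to store the feature values. vocabulary_file points to the file that defines the enumerated values, one value per line. The id of each value is the number of the line it appears on (note that line numbers start at 0), and vocab_size defines the number of enumerated values. A hash table from feature value to id is built from the vocabulary file described by _SparseIdLookupConfig. Let us see how _SparseColumnVocabulary uses the _SparseIdLookupConfig object when converting vocabulary entries to ids.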

def insert_transformed_feature(self, columns_to_tensors):
    """Handles sparse column to id conversion."""
    st = self._get_input_sparse_tensor(columns_to_tensors)
    if self.dtype.is_integer:
        # Integer-valued input features are converted to their string form.
        sparse_string_values = string_ops.as_string(st.values)
        sparse_string_tensor = sparse_tensor_py.SparseTensor(
            st.indices, sparse_string_values, st.dense_shape)
    else:
        sparse_string_tensor = st

    # index_table_from_file returns a lookup table that converts a string
    # tensor into int64 IDs. The mapping is initialized from the vocabulary
    # file given in `vocabulary_file`, where the whole line is the key and
    # the zero-based line number is the ID.
    table = lookup.index_table_from_file(
        vocabulary_file=self.lookup_config.vocabulary_file,
        num_oov_buckets=self.lookup_config.num_oov_buckets,
        vocab_size=self.lookup_config.vocab_size,
        default_value=self.lookup_config.default_value,
        name=self.name + "_lookup")
    columns_to_tensors[self] = table.lookup(sparse_string_tensor)

The index_table_from_file function builds the table from the vocabulary file in lookup_config. The table variable is a HashTable from string to int64. If num_oov_buckets is set, table is an IdTableWithHashBuckets object (a string to id wrapper that assigns out-of-vocabulary keys to buckets).
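The out-of-vocabulary handling can be sketched as follows. This is an illustration under assumptions rather than TensorFlow's implementation: vocab_lines stands in for the lines of vocabulary_file, the helper name make_vocab_table is hypothetical, and CRC32 is used as a stand-in for TensorFlow's own fingerprint hash, so the concrete bucket a key lands in will differ from the real table.

```python
import zlib


def make_vocab_table(vocab_lines, num_oov_buckets=0, default_value=-1):
    """Build a lookup: in-vocabulary keys map to their zero-based line number."""
    vocab = {line: idx for idx, line in enumerate(vocab_lines)}
    vocab_size = len(vocab_lines)

    def lookup(value):
        if value in vocab:
            return vocab[value]
        if num_oov_buckets > 0:
            # Out-of-vocabulary keys are hashed into ids placed after the vocabulary,
            # i.e. in the range [vocab_size, vocab_size + num_oov_buckets).
            return vocab_size + zlib.crc32(value.encode("utf-8")) % num_oov_buckets
        return default_value

    return lookup


lookup = make_vocab_table(["apple", "banana"], num_oov_buckets=2)
print(lookup("banana"))  # 1
print(lookup("cherry"))  # 2 or 3, depending on the hash
```

With num_oov_buckets left at 0, unknown values all collapse onto default_value; with buckets, they get distinct (if shared) ids that the model can still learn weights for.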

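(3) sparse column with hash bucket

If there is no vocabulary file that defines the enumerated feature values, we can use a hash bucket feature. The interface for this feature is:

sparse_column_with_hash_bucket(column_name, hash_bucket_size, combiner=None, dtype=dtypes.string)

The constructor of the corresponding class _SparseColumnHashed is:

def __new__(cls, column_name, hash_bucket_size, combiner="sum", dtype=dtypes.string):

hash_bucket_size defines the number of hash buckets and is used as the modulus for the hash value. dtype supports integers and strings. When the hash value is actually computed, an integer is first converted to its string representation, the hash of that string is computed, and the result is taken modulo the bucket count. The transformed feature value is an integer in the range [0, hash_bucket_size).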

def insert_transformed_feature(self, columns_to_tensors):
    """Handles sparse column to id conversion."""
    input_tensor = self._get_input_sparse_tensor(columns_to_tensors)
    if self.dtype.is_integer:
        # Integer inputs are converted to strings.
        sparse_values = string_ops.as_string(input_tensor.values)
    else:
        sparse_values = input_tensor.values

    sparse_id_values = string_ops.string_to_hash_bucket_fast(
        sparse_values, self.bucket_size, name="lookup")

    # The hash bucket of each sparse feature value is returned as its id.
    columns_to_tensors[self] = sparse_tensor_py.SparseTensor(
        input_tensor.indices, sparse_id_values, input_tensor.dense_shape)

(4) integerized sparse column

A hash bucket sparse feature treats integers as strings when hashing. If we want to use the numeric value of the integer itself as the hash value, we can use _SparseColumnIntegerized. The corresponding interface is sparse_column_with_integerized_feature:

def sparse_column_with_integerized_feature(column_name, hash_bucket_size, combiner="sum",
                                           dtype=dtypes.int64)

The corresponding class is _SparseColumnIntegerized:

def __new__(cls, column_name, bucket_size, combiner="sum", dtype=dtypes.int64)

The feature transformation function is defined as:

def insert_transformed_feature(self, columns_to_tensors):
    """Handles sparse column to id conversion."""
    input_tensor = self._get_input_sparse_tensor(columns_to_tensors)

    # Take the feature value modulo bucket_size directly; the result is the id.
    sparse_id_values = math_ops.mod(input_tensor.values, self.bucket_size, name="mod")
    columns_to_tensors[self] = sparse_tensor_py.SparseTensor(
        input_tensor.indices, sparse_id_values, input_tensor.dense_shape)
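The difference between the hashed and the integerized column can be sketched side by side in plain Python. The function names are hypothetical, and CRC32 is only a stand-in for the fingerprint hash behind string_to_hash_bucket_fast, so hashed ids will not match TensorFlow's.

```python
import zlib


def hashed_id(value, bucket_size):
    """Hashed column: hash the string form of the value, then take the modulus."""
    text = str(value)  # integers are hashed through their string representation
    return zlib.crc32(text.encode("utf-8")) % bucket_size


def integerized_id(value, bucket_size):
    """Integerized column: use the integer itself, modulo bucket_size."""
    return value % bucket_size


print(hashed_id(12345, 10) == hashed_id("12345", 10))  # True: same string, same bucket
print(integerized_id(12345, 100))                      # 45
```

The integerized variant keeps ids predictable (consecutive integers land in consecutive buckets), whereas the hashed variant spreads arbitrary strings roughly uniformly at the cost of possible collisions.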

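(5) crossed column

A crossed column takes the Cartesian product of two or more discrete feature columns to form a high-dimensional crossed feature. Crossing features introduces the correlations between features into the model and increases its expressive power. A crossed column only supports crossing the following 3 kinds of discrete features: _SparseColumn, _BucketizedColumn, and _CrossedColumn. Its interface is defined as:

def crossed_column(columns, hash_bucket_size, combiner="sum", ckpt_to_load_from=None,
                   tensor_name_in_ckpt=None, hash_key=None)

The corresponding class is _CrossedColumn:

def __new__(cls, columns, hash_bucket_size, hash_key, combiner="sum", ckpt_to_load_from=None,
            tensor_name_in_ckpt=None):

columns is a collection of feature columns, as in the tutorial's example: [age_buckets, education, occupation]. The hash_bucket_size parameter specifies the number of hash buckets. The more combinations the cross produces, the larger hash_bucket_size should be, in order to reduce hash collisions.

The logic by which a crossed feature produces the model input can be split into the following two steps: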

def insert_transformed_feature(self, columns_to_tensors):
    """Handles cross transformation."""
    def _collect_leaf_level_columns(cross):
        """Collects base columns contained in the cross."""
        leaf_level_columns = []
        for c in cross.columns:
            # Feature columns of type _CrossedColumn are expanded recursively.
            if isinstance(c, _CrossedColumn):
                leaf_level_columns.extend(_collect_leaf_level_columns(c))
            else:
                # _SparseColumn and _BucketizedColumn are the leaf nodes.
                leaf_level_columns.append(c)
        return leaf_level_columns

    # Step 1: recursively expand all features in the crossed column and
    # store the expanded feature values in the feature_tensors list.
    feature_tensors = []
    for c in _collect_leaf_level_columns(self):
        if isinstance(c, _SparseColumn):
            feature_tensors.append(columns_to_tensors[c.name])
        else:
            if c not in columns_to_tensors:
                c.insert_transformed_feature(columns_to_tensors)
            if isinstance(c, _BucketizedColumn):
                feature_tensors.append(c.to_sparse_tensor(columns_to_tensors[c]))
            else:
                feature_tensors.append(columns_to_tensors[c])

    # Step 2: generate the tensor of the crossed feature. sparse_feature_cross
    # calls the SparseFeatureCross op through a dynamic library; for the op's
    # interface see sparse_feature_cross_op.cc.
    columns_to_tensors[self] = sparse_feature_cross_op.sparse_feature_cross(
        feature_tensors, hashed_output=True, num_buckets=self.hash_bucket_size,
        hash_key=self.hash_key, name="cross")

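The comments in this part of the source code contain an example that shows the effect of crossing feature columns. We use a figure to make that comment more explicit:

Figure 4: Effect of crossing feature columns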
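The cross of one example's values can also be sketched in plain Python. This is an illustration under assumptions, not the SparseFeatureCross op: the function name cross_features is hypothetical, and joining the values into a string and hashing it with CRC32 stands in for the op's own way of combining and hashing values, so the concrete ids differ from TensorFlow's.

```python
import itertools
import zlib


def cross_features(per_column_values, hash_bucket_size):
    """Cross the values of several sparse columns for a single example."""
    ids = []
    # The Cartesian product yields one combination per choice of value in each column.
    for combo in itertools.product(*per_column_values):
        key = "_X_".join(str(v) for v in combo)
        ids.append(zlib.crc32(key.encode("utf-8")) % hash_bucket_size)
    return ids


# One example with two values in the first column and one in the second
# produces 2 x 1 = 2 crossed ids.
print(len(cross_features([["a", "b"], ["x"]], hash_bucket_size=1000)))  # 2
```

The number of distinct combinations grows multiplicatively with the number of crossed columns, which is why a larger hash_bucket_size is needed to keep collisions low.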

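One point worth noting: crossed features have no weight definition.

Crossing discrete features is widely used in prediction models, but a limitation of this kind of feature is that it generalizes poorly to feature combinations never seen in the training data. The embedding column discussed later strengthens the generalization of discrete features by building low-dimensional vector representations of them.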

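(6) real valued column

A real valued feature column corresponds to a continuous numeric feature. Its interface is:

real_valued_column(column_name, dimension=1, default_value=None, dtype=dtypes.float32, normalizer=None):

The corresponding class is _RealValuedColumn:

_RealValuedColumn(column_name, dimension, default_value, dtype, normalizer)

dimension specifies the dimension of the feature column. The default is 1, i.e. a 1-dimensional array of floats; dimension can also be an integer greater than 1, corresponding to a multi-dimensional array. The values of a real valued column can be float32 or int, and int values are converted to float before being fed to the model. normalizer defines how the feature is normalized along the column dimension within a batch of training examples, which amounts to column-level normalization. This differs from the combiner of a sparse feature column: combiner defines normalization of a discrete feature within a single example (example-level normalization). The following figure uses an example to illustrate the difference between the two:

Figure 5: Difference between combiner and normalizer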
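The two directions of normalization can be contrasted with a small pure-Python sketch. The helper names are hypothetical, the constants 30.0 and 10.0 are assumed to be precomputed statistics of the column, and the plain numbers in the second call stand in for the per-value weights or embeddings that a combiner actually reduces.

```python
def apply_normalizer(column, normalizer):
    """Column-level: apply the same function to one feature across all examples."""
    return [normalizer(x) for x in column]


def combine_mean(rows):
    """Example-level: reduce the multiple values inside each example to one value."""
    return [sum(row) / len(row) for row in rows]


# Real-valued column over a batch of three examples.
print(apply_normalizer([20.0, 30.0, 40.0], lambda x: (x - 30.0) / 10.0))  # [-1.0, 0.0, 1.0]

# Multivalent feature: each inner list holds the values of one example.
print(combine_mean([[1.0, 3.0], [4.0]]))  # [2.0, 4.0]
```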

The normalizer is invoked when the real valued feature column is transformed for input to the DNN:

def insert_transformed_feature(self, columns_to_tensors):
    # Transform the input tensor according to the normalizer function.
    # _normalized_input_tensor calls the normalizer function that was passed
    # in when the real valued column was constructed.
    input_tensor = self._normalized_input_tensor(columns_to_tensors[self.name])
    columns_to_tensors[self] = math_ops.to_float(input_tensor)

A real valued column is converted to DNN input by calling _to_dnn_input_layer, which produces a 2-D array. Each row of the array holds the real valued column's feature values for one training example, and these values are concatenated with the other continuous features to form the DNN's input layer.

def _to_dnn_input_layer(self, input_tensor, weight_collections=None, trainable=True, output_rank=2):
    # DNN input must be a dense tensor; a sparse tensor is converted to a
    # dense tensor by _to_dense_tensor.
    input_tensor = self._to_dense_tensor(input_tensor)
    if input_tensor.dtype != dtypes.float32:
        input_tensor = math_ops.to_float(input_tensor)

    # Calls dense_inner_flatten(input_tensor, output_rank).
    # With output_rank = 2 the output has shape
    # [batch_size, real valued column's input dimension].
    return _reshape_real_valued_tensor(input_tensor, output_rank, self.name)

def _to_dense_tensor(self, input_tensor):
    if isinstance(input_tensor, sparse_tensor_py.SparseTensor):
        default_value = (self.default_value[0] if self.default_value is not None else 0)
        # Convert the sparse tensor to a dense tensor.
        return sparse_ops.sparse_tensor_to_dense(input_tensor, default_value=default_value)
    # A dense real valued column returns input_tensor unchanged.
    return input_tensor
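What the sparse-to-dense conversion does can be sketched in plain Python for the 2-D case. The function name sparse_to_dense is hypothetical and the sketch assumes a rank-2 tensor given as the (indices, values, dense_shape) triple; positions without an entry are filled with default_value.

```python
def sparse_to_dense(indices, values, dense_shape, default_value=0):
    """Expand a 2-D sparse triple into a dense list of rows."""
    rows, cols = dense_shape
    dense = [[default_value] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v  # only positions listed in indices get a real value
    return dense


print(sparse_to_dense([[0, 0], [2, 1]], [1.5, 2.5], [3, 2]))
# [[1.5, 0], [0, 0], [0, 2.5]]
```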

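(7) bucketized column

Bucketization turns a continuous feature into a discrete feature. The advantages of discretizing continuous features have been discussed in various places online. Take the effect of a restaurant's distance on a user's choice: we usually divide distance into several intervals, such as within 100 meters or within 1 kilometer, so that small differences in distance do not affect the final prediction much unless the difference crosses an interval boundary. The interface of the bucketized column is defined as:

def bucketized_column(source_column, boundaries)

The corresponding class is _BucketizedColumn, with the constructor:

def __new__(cls, source_column, boundaries):

source_column must be a real_valued_column. boundaries is a list of floats that must be in increasing order. For example, boundaries = [0, 100, 200] defines the following intervals: (-INF, 0), [0, 100), [100, 200), [200, INF).

def insert_transformed_feature(self, columns_to_tensors):
    # Bucketize the source column.
    if self.source_column not in columns_to_tensors:
        self.source_column.insert_transformed_feature(columns_to_tensors)
    columns_to_tensors[self] = bucketization_op.bucketize(
        columns_to_tensors[self.source_column],
        boundaries=list(self.boundaries), name="bucketize")

The bucketize function calls the BucketizeOp class in the TensorFlow C++ core library to perform the bucketization.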
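The interval convention above (lower bound inclusive, upper bound exclusive) matches what bisect.bisect_right from the Python standard library computes, so the mapping from value to bucket index can be sketched without TensorFlow. The function name bucketize below is a plain-Python stand-in, not the TensorFlow op.

```python
import bisect


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    # bisect_right counts how many boundaries are <= value, so a value equal
    # to a boundary belongs to the bucket that starts at that boundary.
    return bisect.bisect_right(boundaries, value)


boundaries = [0, 100, 200]
print([bucketize(v, boundaries) for v in [-5, 0, 99.9, 100, 250]])  # [0, 1, 1, 2, 3]
```

With three boundaries there are four buckets, indexed 0 to 3, corresponding in order to (-INF, 0), [0, 100), [100, 200), and [200, INF).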

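(8) embedding column

Once a sparse feature column has been converted to a continuous vector through an embedding, it can be used as input to the deep model. As mentioned earlier, one weakness of the crossed column is its ability to generalize to the test set. An embedding column makes a discrete feature continuous and learns the feature's vector form from the labels, in the same way that matrix factorization learns latent factor vectors for items or a word-vector model learns the vectors of words. The interface of the embedding column is:

def embedding_column(sparse_id_column, dimension, combiner=None, initializer=None,
                     ckpt_to_load_from=None, tensor_name_in_ckpt=None, max_norm=None, trainable=True)

The corresponding class is _EmbeddingColumn:

def __new__(cls, sparse_id_column, dimension, combiner="mean", initializer=None, ckpt_to_load_from=None,
            tensor_name_in_ckpt=None, shared_embedding_name=None, shared_vocab_size=None, max_norm=None,
            trainable=True):

sparse_id_column is a _SparseColumn object or a _WeightedSparseColumn object, and dimension is the vector dimension of the embedding column. Each feature value of the _SparseColumn corresponds to an integer id, and that id corresponds to a float vector of dimension dimension in the embedding column. The combiner parameter specifies how the feature vectors are normalized within a single example. The initializer parameter specifies the initialization function of the feature vectors; by default they are initialized from a truncated normal distribution (mean = 0, stddev = 1 / sqrt(length of sparse id column)). max_norm caps the L2 norm of each feature vector; a vector whose L2 norm exceeds max_norm is rescaled as embedding_vector = embedding_vector * max_norm / L2_norm(embedding_vector).

To understand the embedding column better, we can draw a simple figure:

Figure 6: Embedding feature column

As in the figure above, take sparse_column_with_keys(column_name = 'gender', keys = ['female', 'male']) as an example. Suppose female corresponds to id = 0 and male to id = 1, and each id corresponds to a 6-dimensional float vector in the embedding feature. In the actual training data, when the gender feature takes the value 'female', what reaches the DNN input layer is the vector for id = 0 (tf.embedding_lookup_sparse). embedding_column has a trainable parameter that specifies whether the feature's embedding is updated according to the model's training error.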
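The gender example can be sketched in plain Python. The embedding values below are made-up numbers for illustration (in training they would be learned), the helper name clip_to_max_norm is hypothetical, and the max_norm handling assumes clipping semantics, i.e. a vector is rescaled only when its L2 norm exceeds max_norm.

```python
import math


def clip_to_max_norm(vector, max_norm):
    """Rescale `vector` to L2 norm `max_norm` only if its norm exceeds max_norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm <= max_norm:
        return list(vector)
    return [x * max_norm / norm for x in vector]


# Hypothetical 2 x 6 embedding table for the gender column: row id -> vector.
embeddings = [
    [0.1, -0.2, 0.3, 0.0, 0.5, -0.1],  # id 0: female
    [0.4, 0.1, -0.3, 0.2, 0.0, 0.6],   # id 1: male
]
keys = ["female", "male"]

# Looking up 'female' selects the row for id 0.
vector = embeddings[keys.index("female")]
print(vector)
print(clip_to_max_norm([3.0, 4.0], max_norm=1.0))  # approximately [0.6, 0.8]
```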

The transformation functions of the embedding feature:

def insert_transformed_feature(self, columns_to_tensors):
    if self.sparse_id_column not in columns_to_tensors:
        self.sparse_id_column.insert_transformed_feature(columns_to_tensors)
    columns_to_tensors[self] = columns_to_tensors[self.sparse_id_column]

def _deep_embedding_lookup_arguments(self, input_tensor):
    return _DeepEmbeddingLookupArguments(
        input_tensor=self.sparse_id_column.id_tensor(input_tensor),
        # When sparse_id_column is a _SparseColumn, weight_tensor = None.
        # When sparse_id_column is a _WeightedSparseColumn, weight_tensor is
        # the weight tensor of that _WeightedSparseColumn and must satisfy:
        # 1) weight_tensor.indices = input_tensor.indices
        # 2) weight_tensor.shape = input_tensor.shape
        weight_tensor=self.sparse_id_column.weight_tensor(input_tensor),
        # number of distinct ids in the sparse feature column
        vocab_size=self.length,
        # dimension of the embedding
        dimension=self.dimension,
        # initialization function of the embedding
        initializer=self.initializer,
        # per-row (per-example) normalization method of the embedding
        combiner=self.combiner,
        shared_embedding_name=self.shared_embedding_name,
        hash_key=None,
        max_norm=self.max_norm,
        trainable=self.trainable)

The logic that produces the embeddings of a sparse feature from _DeepEmbeddingLookupArguments is implemented in the function _embeddings_from_arguments:

def _embeddings_from_arguments(column, args, weight_collections, trainable, output_rank=2):
    # column is the embedding feature column (its name is used below), args is
    # the column's _DeepEmbeddingLookupArguments object, weight_collections
    # stores the embedding weights, and output_rank specifies the rank of the
    # output embedding tensor.

    input_tensor = layers._inner_flatten(args.input_tensor, output_rank)
    weight_tensor = layers._inner_flatten(args.weight_tensor, output_rank)

    # Consider the default case of building the embedding:
    # args.hash_key is None and args.shared_embedding_name is None.

    # Get or create the model variable of the embedding. embeddings is a 2-D
    # float array of shape [number of sparse feature ids, embedding dimension];
    # each row is the embedding of one sparse feature id.
    embeddings = contrib_variables.model_variable(
        name='weights', shape=[args.vocab_size, args.dimension],
        dtype=dtypes.float32, initializer=args.initializer,
        # If trainable, the embedding is added as a model variable to
        # GraphKeys.TRAINABLE_VARIABLES.
        trainable=(trainable and args.trainable),
        # weight_collections stores the weights of each feature id.
        collections=weight_collections)

    # Fetch the embedding of each sparse feature id.
    return embedding_ops.safe_embedding_lookup_sparse(
        embeddings, input_tensor, sparse_weights=weight_tensor,
        combiner=args.combiner, name=column.name + 'weights',
        max_norm=args.max_norm)

safe_embedding_lookup_sparse calls tf.embedding_lookup_sparse to fetch the embedding of each sparse feature id.
tf.embedding_lookup_sparse first calls tf.embedding_lookup to fetch the embedding vectors of the sparse feature ids:

# sp_ids is the id tensor of input_tensor
ids = sp_ids.values

embeddings = embedding_lookup(
    # params is the embeddings matrix; each element is a float tensor of size
    # embedding_dimension. params can be viewed as the partitions of one
    # embedding tensor, with the partitioning given by partition_strategy.
    params,
    # ids is the values array of input_tensor
    ids,
    # Strategy for assigning ids to the partitions of params: mod or div,
    # default mod. See the documentation of tf.embedding_lookup for details.
    partition_strategy=partition_strategy,
    # limits the maximum L2 norm of an embedding
    max_norm=max_norm
)

If sparse_weights is not None, the embedding values are multiplied by the weights:

weights = sparse_weights.values
embeddings *= weights

The embeddings are then normalized according to combiner:

segment_ids = sp_ids.indices[:, 0]
if combiner == "sum":
    # No normalization
    embeddings = math_ops.segment_sum(embeddings, segment_ids, name=name)
elif combiner == "mean":
    # L1 normalization: embeddings = SUM(embeddings * weight) / SUM(weight)
    embeddings = math_ops.segment_sum(embeddings, segment_ids)
    weight_sum = math_ops.segment_sum(weights, segment_ids)
    embeddings = math_ops.div(embeddings, weight_sum, name=name)
elif combiner == "sqrtn":
    # L2 normalization: embeddings = SUM(embeddings * weight) / SQRT(SUM(weight^2))
    embeddings = math_ops.segment_sum(embeddings, segment_ids)
    weights_squared = math_ops.pow(weights, 2)
    weight_sum = math_ops.segment_sum(weights_squared, segment_ids)
    weight_sum_sqrt = math_ops.sqrt(weight_sum)
    embeddings = math_ops.div(embeddings, weight_sum_sqrt, name=name)
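The three combiner branches can be reproduced for a single example in plain Python. This is a minimal sketch with a hypothetical function name and toy numbers; it handles one example (one segment) at a time, whereas the TensorFlow code above processes the whole batch with segment_sum.

```python
import math


def combine(vectors, weights, combiner="mean"):
    """Reduce the weighted embedding vectors of one example to a single vector."""
    dim = len(vectors[0])
    # Weighted sum over all values of the feature in this example.
    summed = [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(dim)]
    if combiner == "sum":
        return summed
    if combiner == "mean":
        denom = sum(weights)                             # SUM(weight)
    elif combiner == "sqrtn":
        denom = math.sqrt(sum(w * w for w in weights))   # SQRT(SUM(weight^2))
    else:
        raise ValueError("unknown combiner: %s" % combiner)
    return [s / denom for s in summed]


vectors = [[1.0, 2.0], [3.0, 4.0]]  # embeddings of two ids seen in one example
weights = [1.0, 1.0]
print(combine(vectors, weights, "sum"))   # [4.0, 6.0]
print(combine(vectors, weights, "mean"))  # [2.0, 3.0]
```

With unit weights, sqrtn divides the sum by the square root of the number of values, which sits between no normalization (sum) and full averaging (mean).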

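(9) Other feature columns

Besides the feature columns listed above, TensorFlow also supports the one hot column, the shared embedding column, and the scattered embedding column. A one hot column applies one-hot encoding to a sparse feature column. If a discrete feature has only a few values, a one hot feature column can be used to encode it for DNN training. Unlike an embedding column, a one hot feature column does not support updating the feature's embedding through model training. For reasons of space we do not go into the shared embedding column and the scattered embedding column.

So far we have covered the model's input features. Part 2 discusses the model itself:
TensorFlow Wide and Deep Model: Explanation and Application (Part 2)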