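xDeepFM replaces the FM part of DeepFM with an improved DCN to learn feature-cross information, while FiBiNET applies SENET to add feature weights, going one step beyond NFM and AFM. Before reading about these two models, it helps to have a basic understanding of DeepFM, Deep&Cross, AFM and NFM; if you are not familiar with them, see the links to the posts on the other models at the end of this article.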
The code below uses dense inputs, which makes the model structure easier to follow. The code for sparse inputs and the complete code are here 👇
https://github.com/DSXiangLi/CTR
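As the name suggests, xDeepFM, like DeepFM, has a Deep part and a Linear part; it simply replaces the FM part, which DeepFM uses to learn second-order feature interactions, with CIN (Compressed Interaction Network). CIN in turn is a further refinement of the DCN cross network from Deep&Cross. The overall model structure is as follows.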
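Let's focus on the CIN part. Following the paper's notation, there are m features, each with a D-dimensional embedding, and the k-th CIN layer has \(H_k\) units. The computation of the k-th CIN layer has three parts, corresponding to panels a-c of the figure: (a) an outer product between the previous layer's output and the input embeddings along each embedding dimension, (b) compression of the resulting feature maps with \(H_k\) filters, and (c) sum pooling over the embedding dimension.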
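For reference, the three parts can be written out as below, following the formulation in the xDeepFM paper. Here \(X^0 \in \mathbb{R}^{m \times D}\) is the input embedding matrix, \(X^k \in \mathbb{R}^{H_k \times D}\) is the output of layer k (with \(H_0 = m\)), and \(W^{k,h} \in \mathbb{R}^{H_{k-1} \times m}\) is the h-th filter of layer k. The names \(Z^k\) and \(p^k_h\) for the intermediate tensor and the pooled output are labels introduced here for readability.

\[ Z^{k}_{i,j,d} = X^{k-1}_{i,d} \, X^{0}_{j,d} \qquad \text{(a) outer product along each embedding dimension} \]

\[ X^{k}_{h,d} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{i,j} \, Z^{k}_{i,j,d} \qquad \text{(b) compression with } H_k \text{ filters} \]

\[ p^{k}_{h} = \sum_{d=1}^{D} X^{k}_{h,d} \qquad \text{(c) sum pooling over the embedding dimension} \]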
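Each CIN layer is computed as above: in a T-layer CIN, every layer interacts the previous layer's output with the first layer's input to obtain interactions one order higher. Assuming every layer has the same width \(H_k=H\), the overall time complexity of the CIN part is \(O(TDmH^2)\), and the space complexity, which comes from the filter weights of each layer, is \(O(TmH^2)\).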
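To make the shapes and the parameter count concrete, here is a minimal NumPy sketch of a CIN forward pass. It is an illustration only, not the TensorFlow implementation from the repo: the function names `cin_layer_np` and `cin_np`, the random kernels and the toy sizes are all assumptions made for this example.

```python
import numpy as np


def cin_layer_np(x_prev, x0, kernel):
    """One CIN layer (hypothetical helper for illustration).

    x_prev: (batch, H_prev, D) output of the previous layer
    x0:     (batch, m, D)      input embedding matrix
    kernel: (H_prev, m, H_curr) filter weights
    returns (batch, H_curr, D)
    """
    # (a) outer product along each embedding dimension -> (batch, H_prev, m, D)
    z = np.einsum('bid,bjd->bijd', x_prev, x0)
    # (b) compress the H_prev * m feature maps with H_curr filters
    return np.einsum('bijd,ijh->bhd', z, kernel)


def cin_np(x0, layer_sizes, rng):
    """Stack CIN layers, sum-pool each layer over D, and concatenate."""
    m = x0.shape[1]
    sizes = [m] + list(layer_sizes)  # H_0 = m; the caller's list is not mutated
    xk, pooled, n_params = x0, [], 0
    for h_prev, h_curr in zip(sizes[:-1], sizes[1:]):
        kernel = rng.normal(scale=0.1, size=(h_prev, m, h_curr))
        n_params += kernel.size
        xk = cin_layer_np(xk, x0, kernel)
        pooled.append(xk.sum(axis=2))  # (c) sum pooling -> (batch, h_curr)
    return np.concatenate(pooled, axis=1), n_params


rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 5, 8))          # batch=4, m=5 fields, D=8
out, n_params = cin_np(x0, [6, 6, 6], rng)
print(out.shape)   # (4, 18): three layers, 6 pooled units each
print(n_params)    # 510 = 5*5*6 + 6*5*6 + 6*5*6
```

With T=3, m=5 and H=6 the bound \(TmH^2\) gives 540; the actual count is 510 only because the first layer has m rather than H input maps.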
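CIN keeps DCN's arbitrary-order interactions and parameter sharing. The two main differences are that DCN interacts features bit-wise while CIN interacts them vector-wise, and that each DCN layer contains all interactions from order 1 up to l+1, whereas each CIN layer contains interactions of a single order only, so the outputs of all layers are pooled and concatenated to keep every order.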
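CIN's design is quite clever, but... time for some nitpicking: both the time and the space complexity of CIN are higher than DCN's, and it feels more prone to overfitting. As for the claim that vector-wise products are better than bit-wise products, well... at least bit-wise does not require all embeddings to have the same dimension, but vector-wise, I really can't wrap my head around it. If you understand it, please leave a comment.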
    import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.variable_scope)

    # add_layer_summary, stack_dense_layer, build_features and tf_estimator_model
    # are helpers that are not shown here; see the complete code in the repo.


    def cross_op(xk, x0, layer_size_prev, layer_size_curr, layer, emb_size, field_size):
        # Outer product along each embedding dimension:
        # (batch * D * HK-1 * 1) * (batch * D * 1 * H0) -> batch * D * HK-1 * H0
        zk = tf.matmul(
            tf.expand_dims(tf.transpose(xk, perm=(0, 2, 1)), 3),
            tf.expand_dims(tf.transpose(x0, perm=(0, 2, 1)), 2))

        # batch * D * HK-1 * H0 -> batch * D * (HK-1 * H0)
        zk = tf.reshape(zk, [-1, emb_size, field_size * layer_size_prev])
        add_layer_summary('zk_{}'.format(layer), zk)

        # Convolution with channel = HK:
        # (batch * D * (HK-1*H0)) * ((HK-1*H0) * HK) -> batch * D * HK
        kernel = tf.get_variable(name='kernel{}'.format(layer),
                                 shape=(field_size * layer_size_prev, layer_size_curr))
        xkk = tf.matmul(zk, kernel)
        xkk = tf.transpose(xkk, perm=[0, 2, 1])  # batch * HK * D
        add_layer_summary('Xk_{}'.format(layer), xkk)
        return xkk


    def cin_layer(x0, cin_layer_size, emb_size, field_size):
        cin_output_list = []
        # Prepend the field dimension for the input layer. Build a new list
        # instead of inserting in place, so params['cin_layer_size'] is not
        # modified each time the model function is called.
        layer_sizes = [field_size] + list(cin_layer_size)

        with tf.variable_scope('Cin_component'):
            xk = x0
            for layer in range(1, len(layer_sizes)):
                with tf.variable_scope('Cin_layer{}'.format(layer)):
                    # Do cross
                    xk = cross_op(xk, x0, layer_sizes[layer - 1], layer_sizes[layer],
                                  layer, emb_size, field_size)  # batch * HK * D
                    # sum pooling on the embedding dimension
                    cin_output_list.append(tf.reduce_sum(xk, 2))  # batch * HK

        return tf.concat(cin_output_list, axis=1)


    @tf_estimator_model
    def model_fn_dense(features, labels, mode, params):
        dense_feature, sparse_feature = build_features()
        dense_input = tf.feature_column.input_layer(features, dense_feature)
        sparse_input = tf.feature_column.input_layer(features, sparse_feature)

        # Linear part
        with tf.variable_scope('Linear_component'):
            linear_output = tf.layers.dense(sparse_input, units=1)
            add_layer_summary('linear_output', linear_output)

        # Deep part
        dense_output = stack_dense_layer(dense_input, params['hidden_units'],
                                         params['dropout_rate'], params['batch_norm'],
                                         mode, add_summary=True)

        # CIN part
        emb_size = dense_feature[0].variable_shape.as_list()[-1]
        field_size = len(dense_feature)
        embedding_matrix = tf.reshape(dense_input, [-1, field_size, emb_size])  # batch * field_size * emb_size
        add_layer_summary('embedding_matrix', embedding_matrix)

        cin_output = cin_layer(embedding_matrix, params['cin_layer_size'], emb_size, field_size)

        with tf.variable_scope('output'):
            y = tf.concat([dense_output, cin_output, linear_output], axis=1)
            y = tf.layers.dense(y, units=1)
            add_layer_summary('output', y)

        return y
Before reading about FiBiNET, it is worth getting to know the Squeeze-and-Excitation Network; if you are interested, see this blog post: Squeeze-and-Excitation Networks.
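FiBiNET's main innovation is applying SENET to learn the importance of each feature and re-weighting the embeddings to obtain a new embedding matrix. Before FiBiNET, AFM, PNN, DCN and the xDeepFM above all learned weights for feature interactions (via attention, weighting and so on) only after the interaction step. FiBiNET keeps that part and additionally accounts for the weight of each feature itself at the embedding stage. The model structure is as follows.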
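The original embedding and the new SENET-reweighted embedding each go through the Bilinear-Interaction layer to learn second-order interaction features; the results are concatenated and then passed through an MLP to learn higher-order features. Keeping consistent with the paper's notation (argh, can't everyone agree on one notation? I get confused reading my own comments): f features, k-dimensional embeddings.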
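The SENET layer learns a weight for each feature and re-weights the embeddings in three steps: squeeze (pool each feature's embedding down to a single scalar, by mean or max), excitation (two fully connected layers with reduction ratio r turn those scalars into one weight per feature), and re-weight (multiply each feature's embedding by its weight).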
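In an experiment on the income dataset with r=2, 46% of the embedding feature weights came out as 0, so SENET filters out some of the features that are useless for the target before feature interaction, which increases the weight of the useful features.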
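The three SENET steps can be sketched in NumPy as below. This is a simplified illustration under stated assumptions: `senet_reweight` is a hypothetical name, biases are omitted, and the weights are random rather than trained, so the printed share of zero weights says nothing about the 46% observed above. It only shows the mechanism: because the last excitation layer ends in a ReLU, a feature's weight can be exactly 0, which zeroes out that feature's whole embedding.

```python
import numpy as np


def senet_reweight(emb, w1, w2, pool_op="mean"):
    """SENET-style re-weighting (hypothetical helper, biases omitted).

    emb: (batch, f, k) embedding matrix
    w1:  (f, f // r)   first excitation layer
    w2:  (f // r, f)   second excitation layer
    returns the re-weighted embeddings (batch, f, k) and the weights (batch, f)
    """
    # squeeze: one scalar per feature
    z = emb.max(axis=2) if pool_op == "max" else emb.mean(axis=2)  # (batch, f)
    # excitation: two dense layers, both followed by ReLU
    hidden = np.maximum(z @ w1, 0.0)                               # (batch, f // r)
    a = np.maximum(hidden @ w2, 0.0)                               # (batch, f)
    # re-weight: scale each feature's embedding by its weight
    return emb * a[:, :, None], a


rng = np.random.default_rng(0)
f, k, r = 6, 4, 2
emb = rng.normal(size=(3, f, k))
w1 = rng.normal(size=(f, f // r))
w2 = rng.normal(size=(f // r, f))

new_emb, a = senet_reweight(emb, w1, w2)
print(new_emb.shape)                                  # (3, 6, 4)
print("share of zero weights:", float((a == 0).mean()))
```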
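The authors argue that neither the inner product nor the element-wise product is enough to capture feature interaction information, so they introduce a weight matrix W and interact features as \(p_{ij} = (v_i W) \odot v_j\).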
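Here is a small NumPy sketch of the bilinear interaction \(p_{ij} = (v_i W) \odot v_j\) over all feature pairs, using a single shared W for simplicity. The function name `bilinear_interaction` and the toy sizes are assumptions for this example. Setting W to the identity is a handy sanity check, since the interaction then reduces to the plain element-wise product.

```python
from itertools import combinations

import numpy as np


def bilinear_interaction(emb, w):
    """Bilinear interaction with one shared weight matrix (illustrative only).

    emb: (batch, f, k) embedding matrix
    w:   (k, k)        weight matrix shared by all pairs
    returns (batch, f * (f - 1) / 2, k), one row per pair (i, j) with i < j
    """
    f = emb.shape[1]
    pairs = [(emb[:, i, :] @ w) * emb[:, j, :]      # (v_i W) ⊙ v_j
             for i, j in combinations(range(f), 2)]
    return np.stack(pairs, axis=1)


rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 4, 3))                    # batch=2, f=4, k=3

p = bilinear_interaction(emb, np.eye(3))
print(p.shape)                                      # (2, 6, 3): 4 * 3 / 2 = 6 pairs
# With W = I the first pair is just the element-wise product v_0 ⊙ v_1
print(np.allclose(p[:, 0, :], emb[:, 0, :] * emb[:, 1, :]))   # True
```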
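There are three choices for W: all feature interactions share one weight matrix (Field-All), each feature shares one matrix across its interactions with all other features (Field-Each), or each feature pair gets its own matrix (Field-Interaction). Which one works best probably has to be tested case by case, but in general the usual rule of thumb still applies: the less data, the fewer parameters.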
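To get a feel for the "less data, fewer parameters" trade-off, the sketch below counts the bilinear weights for each option, given f features and k-dimensional embeddings. The function name and the example sizes (f=10, k=8) are made up for illustration.

```python
def bilinear_param_count(f, k, kind):
    """Number of weights in the bilinear layer for each sharing scheme."""
    if kind == "field_all":            # one k x k matrix in total
        return k * k
    if kind == "field_each":           # one k x k matrix per feature
        return f * k * k
    if kind == "field_interaction":    # one k x k matrix per feature pair
        return f * (f - 1) // 2 * k * k
    raise ValueError("unknown bilinear type: {}".format(kind))


for kind in ("field_all", "field_each", "field_interaction"):
    print(kind, bilinear_param_count(f=10, k=8, kind=kind))
# field_all 64
# field_each 640
# field_interaction 2880
```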
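After the original embedding and the re-weighted embedding have learned interaction features in the Bilinear-Interaction layer, the results are concatenated into a shallow layer and passed through fully connected layers to learn higher-order feature interactions. The rest is routine and is not covered in detail here.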
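Let's not grumble about the fact that FiBiNET could be plugged into a wide&deep framework to capture both low-order and arbitrary high-order information; the main takeaway is to add FiBiNET's idea of SENET feature weighting to your own toolbox.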
    import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.variable_scope)

    # add_layer_summary, stack_dense_layer, build_features and tf_estimator_model
    # are helpers that are not shown here; see the complete code in the repo.


    def Bilinear_layer(embedding_matrix, field_size, emb_size, type, name):
        # Bilinear_layer: combine inner and element-wise product
        interaction_list = []
        with tf.variable_scope('BI_interaction_{}'.format(name)):
            if type == 'field_all':
                # one weight matrix shared by all interactions
                weight = tf.get_variable(
                    shape=(emb_size, emb_size),
                    initializer=tf.truncated_normal_initializer(),
                    name='Bilinear_weight_{}'.format(name))
            for i in range(field_size):
                if type == 'field_each':
                    # one weight matrix per field
                    weight = tf.get_variable(
                        shape=(emb_size, emb_size),
                        initializer=tf.truncated_normal_initializer(),
                        name='Bilinear_weight_{}_{}'.format(i, name))
                for j in range(i + 1, field_size):
                    if type == 'field_interaction':
                        # one weight matrix per field pair
                        weight = tf.get_variable(
                            shape=(emb_size, emb_size),
                            initializer=tf.truncated_normal_initializer(),
                            name='Bilinear_weight_{}_{}_{}'.format(i, j, name))
                    vi = tf.gather(embedding_matrix, indices=i, axis=1, batch_dims=0,
                                   name='v{}'.format(i))  # batch * emb_size
                    vj = tf.gather(embedding_matrix, indices=j, axis=1, batch_dims=0,
                                   name='v{}'.format(j))  # batch * emb_size
                    # bilinear: (vi * wij) \odot vj
                    pij = tf.multiply(tf.matmul(vi, weight), vj)
                    interaction_list.append(pij)

            # batch * (field_size * (field_size-1) / 2) * emb_size
            combination = tf.stack(interaction_list, axis=1)
            # flatten to batch * (emb_size * field_size * (field_size-1) / 2)
            combination = tf.reshape(
                combination, shape=[-1, emb_size * field_size * (field_size - 1) // 2])
            add_layer_summary('bilinear_output', combination)

        return combination


    def SENET_layer(embedding_matrix, field_size, emb_size, pool_op, ratio):
        with tf.variable_scope('SENET_layer'):
            # squeeze embedding to a scalar for each field
            with tf.variable_scope('pooling'):
                if pool_op == 'max':
                    # batch * field_size * emb_size -> batch * field_size
                    z = tf.reduce_max(embedding_matrix, axis=2)
                else:
                    z = tf.reduce_mean(embedding_matrix, axis=2)
                add_layer_summary('pooling_scalar', z)

            # excitation: learn the weight of each field from the scalars above
            with tf.variable_scope('excitation'):
                z1 = tf.layers.dense(z, units=field_size // ratio, activation=tf.nn.relu)
                a = tf.layers.dense(z1, units=field_size, activation=tf.nn.relu)  # batch * field_size
                add_layer_summary('excitation_weight', a)

            # re-weight embedding with the learned weight
            with tf.variable_scope('reweight'):
                # (batch * field * emb) * (batch * field * 1)
                senet_embedding = tf.multiply(embedding_matrix, tf.expand_dims(a, axis=-1))
                add_layer_summary('senet_embedding', senet_embedding)  # batch * field_size * emb_size

        return senet_embedding


    @tf_estimator_model
    def model_fn_dense(features, labels, mode, params):
        dense_feature, sparse_feature = build_features()
        dense_input = tf.feature_column.input_layer(features, dense_feature)
        sparse_input = tf.feature_column.input_layer(features, sparse_feature)

        # Linear part
        with tf.variable_scope('Linear_component'):
            linear_output = tf.layers.dense(sparse_input, units=1)
            add_layer_summary('linear_output', linear_output)

        field_size = len(dense_feature)
        emb_size = dense_feature[0].variable_shape.as_list()[-1]
        embedding_matrix = tf.reshape(dense_input, [-1, field_size, emb_size])

        # SENET_layer to get new embedding matrix
        senet_embedding_matrix = SENET_layer(embedding_matrix, field_size, emb_size,
                                             pool_op=params['pool_op'],
                                             ratio=params['senet_ratio'])

        # combination layer & BI_interaction
        BI_org = Bilinear_layer(embedding_matrix, field_size, emb_size,
                                type=params['bilinear_type'], name='org')
        BI_senet = Bilinear_layer(senet_embedding_matrix, field_size, emb_size,
                                  type=params['bilinear_type'], name='senet')

        combination_layer = tf.concat([BI_org, BI_senet], axis=1)

        # Deep part
        dense_output = stack_dense_layer(combination_layer, params['hidden_units'],
                                         params['dropout_rate'], params['batch_norm'],
                                         mode, add_summary=True)

        with tf.variable_scope('output'):
            y = dense_output + linear_output
            add_layer_summary('output', y)

        return y
https://github.com/DSXiangLi/CTR
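CTR Study Notes & Code Implementation 1: Prelude to Deep Learning, LR->FFM
CTR Study Notes & Code Implementation 2: Deep CTR Models, MLP->Wide&Deep
CTR Study Notes & Code Implementation 3: Deep CTR Models, FNN->PNN->DeepFM
CTR Study Notes & Code Implementation 4: Deep CTR Models, NFM/AFM
CTR Study Notes & Code Implementation 5: Deep CTR Models, DeepCrossing -> Deep&Cross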
Ref