前言
今天應該是推薦算法的最後一篇了,因子分解機deepFM。此處跳過了FM和FFM,由於我立刻要去幹別的了,因此直接用deepFM收尾吧。 先po兩篇論文python
看完這兩篇論文,基本就能理解FM和DeepFM了。爲了節省你們的時間我簡述一下一些基本思想。算法
FM因子分解機
在FM出現之前大多使用SVM來作CTR預估,固然還有其餘的好比SVD++,PITF,FPMC等,可是這些模型對稀疏矩陣顯得捉襟見肘,並且參數規模很大。 那FM解決了什麼問題:網絡
- 更適合作稀疏矩陣的參數計算
- 減小了須要訓練的參數規模,並且特徵和參數數量是線性關係
- FM能夠使用任何真實數據進行計算
其實FM出現主要解決了特徵之間的交叉特徵關係,此處省略了稀疏矩陣致使的w參數失效的模型直接說最終模型:app
這裏經過一個向量v的交叉來解決了稀疏矩陣致使的致使參數失效的問題。 那他參數的規模爲何小呢,接下來就是推導後面二次項部分:ui
從這裏能夠看出參數的複雜度是線性的O(kn)。url
Keras對FM建模
這裏是單純的FM模型代碼,這代碼是借鑑別人的,我發現有一個問題就是,他最後repeat了二次項,這塊我不是太明白,貼出來你們有興趣能夠一塊兒討論。spa
import os os.environ["CUDA_VISIBLE_DEVICES"]="-1" import keras.backend as K from keras import activations from keras.engine.topology import Layer, InputSpec class FMLayer(Layer): def __init__(self, output_dim, factor_order, activation=None, **kwargs): if 'input_shape' not in kwargs and 'input_dim' in kwargs: kwargs['input_shape'] = (kwargs.pop('input_dim'),) super(FMLayer, self).__init__(**kwargs) self.output_dim = output_dim self.factor_order = factor_order self.activation = activations.get(activation) self.input_spec = InputSpec(ndim=2) def build(self, input_shape): assert len(input_shape) == 2 input_dim = input_shape[1] self.input_spec = InputSpec(dtype=K.floatx(), shape=(None, input_dim)) self.w = self.add_weight(name='one', shape=(input_dim, self.output_dim), initializer='glorot_uniform', trainable=True) self.v = self.add_weight(name='two', shape=(input_dim, self.factor_order), initializer='glorot_uniform', trainable=True) self.b = self.add_weight(name='bias', shape=(self.output_dim,), initializer='zeros', trainable=True) super(FMLayer, self).build(input_shape) def call(self, inputs, **kwargs): X_square = K.square(inputs) xv = K.square(K.dot(inputs, self.v)) xw = K.dot(inputs, self.w) p = 0.5 * K.sum(xv - K.dot(X_square, K.square(self.v)), 1) rp = K.repeat_elements(K.expand_dims(p, 1), self.output_dim, axis=1) f = xw + rp + self.b output = K.reshape(f, (-1, self.output_dim)) if self.activation is not None: output = self.activation(output) return output def compute_output_shape(self, input_shape): assert input_shape and len(input_shape) == 2 return input_shape[0], self.output_dim inp = Input(shape=(np.shape(x_train)[1],)) x = FMLayer(200, 100)(inp) x = Dense(2, activation='sigmoid')(x) model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
運行結果.net
Train on 2998 samples, validate on 750 samples Epoch 1/10 2998/2998 [==============================] - 2s 791us/step - loss: 0.0580 - accuracy: 0.9872 - val_loss: 0.7466 - val_accuracy: 0.8127 Epoch 2/10 2998/2998 [==============================] - 2s 683us/step - loss: 0.0650 - accuracy: 0.9893 - val_loss: 0.7845 - val_accuracy: 0.8067 Epoch 3/10 2998/2998 [==============================] - 2s 674us/step - loss: 0.0803 - accuracy: 0.9915 - val_loss: 0.8730 - val_accuracy: 0.7960 Epoch 4/10 2998/2998 [==============================] - 2s 681us/step - loss: 0.0362 - accuracy: 0.9943 - val_loss: 0.8771 - val_accuracy: 0.8013 Epoch 5/10 2998/2998 [==============================] - 2s 683us/step - loss: 0.0212 - accuracy: 0.9953 - val_loss: 0.9035 - val_accuracy: 0.8007 Epoch 6/10 2998/2998 [==============================] - 2s 721us/step - loss: 0.0188 - accuracy: 0.9965 - val_loss: 0.9295 - val_accuracy: 0.7993 Epoch 7/10 2998/2998 [==============================] - 2s 719us/step - loss: 0.0168 - accuracy: 0.9972 - val_loss: 0.9597 - val_accuracy: 0.8007 Epoch 8/10 2998/2998 [==============================] - 2s 693us/step - loss: 0.0150 - accuracy: 0.9973 - val_loss: 0.9851 - val_accuracy: 0.7993 Epoch 9/10 2998/2998 [==============================] - 2s 677us/step - loss: 0.0137 - accuracy: 0.9972 - val_loss: 1.0114 - val_accuracy: 0.7987 Epoch 10/10 2998/2998 [==============================] - 2s 684us/step - loss: 0.0126 - accuracy: 0.9977 - val_loss: 1.0361 - val_accuracy: 0.8000
FFM算法
上面FM算法也能夠看出來只是有一組特徵,可是現實生活中可能會有多組特徵,例如論文中舉例:code
此處包含了用戶,電影,用戶打分的其餘電影,時間信息等,因此光是一組特徵的交叉還不夠,可能涉及到不一樣組特徵的交叉。因此FFM應運而生。此處不詳細介紹,直接說deepFM。orm
DeepFM算法
DeepFM同樣我就不詳細介紹了,不明白的本身看上面論文,我直說重點。
DeepFM的優勢
- 結合了FM和DNN,結合了高階特徵建模DNN和低階特徵建模FM
- DeepFM低階部分和高階部分共享了相同的特徵,讓計算更有效率
- DeepFM在CTR預測中效果最好
DeepFM網絡結構
FM部分網絡結構
FM部分就是把一次項和二次項結合到一塊兒就好
# 一次項 fm_w_1 = Activation('linear')(K.dot(inputs, self.fm_w)) # 二次項 dot_latent = {} dot_cat = [] for i in range(1, self.num_fields): print(self.num_fields) dot_latent[i] = {} for f in self.field_combinations[i]: print(len(self.field_combinations)) dot_latent[i][f] = Dot(axes=-1, normalize=False)([latent[i], latent[f]]) dot_cat.append(dot_latent[i][f]) print(dot_cat) fm_w_2 = Concatenate()(dot_cat) # Merge 一次和二次項獲得FM fm_w = Concatenate()([fm_w_1, fm_w_2])
DNN部分網絡結構
DNN部分比較簡單
# 加倆隱藏層 deep1 = Activation('relu')(K.bias_add(K.dot(ConcatLatent,self.h1_w),self.h1_b)) deep2 = Activation('relu')(K.bias_add(K.dot(deep1, self.h2_w), self.h2_b))
總體網絡結構
這裏先作Embeding,而後給DNN和FM提供數據,代碼以下:
# 不一樣fields組合 for i in range(1, self.num_fields + 1): sparse[i] = Lambda(lambda x: x[:, self.id_start[i]:self.id_stop[i]], output_shape=((self.len_field[i],)))(inputTensor) latent[i] = Activation('linear')(K.bias_add(K.dot(sparse[i],self.embed_w[i]),self.embed_b[i])) # merge 不一樣 field ConcatLatent = Concatenate()(list(latent.values()))
DeepFM代碼
總體代碼以下:
from keras.layers import Dense, Concatenate, Lambda, Add, Dot, Activation from keras.engine.topology import Layer from keras import backend as K class DeepFMLayer(Layer): def __init__(self, embeddin_size, field_len_group=None, **kwargs): self.output_dim = 1 self.embedding_size = 10 self.input_spec = InputSpec(ndim=2) self.field_count = len(field_len_group) self.num_fields = len(field_len_group) self.field_lengths = field_len_group self.embed_w = {} self.embed_b = {} self.h1 = 10 self.h2 = 10 def start_stop_indices(field_lengths, num_fields): len_field = {} id_start = {} id_stop = {} len_input = 0 for i in range(1, num_fields + 1): len_field[i] = field_lengths[i - 1] id_start[i] = len_input len_input += len_field[i] id_stop[i] = len_input return len_field, len_input, id_start, id_stop self.len_field, self.len_input, self.id_start, self.id_stop = \ start_stop_indices(self.field_lengths,self.num_fields) def Field_Combos(num_fields): field_list = list(range(1, num_fields)) combo = {} combo_count = 0 for idx, field in enumerate(field_list): sub_list = list(range(field + 1, num_fields + 1)) combo_count += len(sub_list) combo[field] = sub_list return combo, combo_count self.field_combinations, self.combo_count = Field_Combos(self.num_fields) print(field_len_group) print(self.field_combinations) print(self.num_fields) super(DeepFMLayer, self).__init__(**kwargs) def build(self, input_shape): assert len(input_shape) == 2 total_embed_size = 0 self.input_spec = InputSpec(dtype=K.floatx(), shape=(None, input_shape[1])) for i in range(1, self.num_fields + 1): input_dim = self.len_field[i] _name = "embed_W" + str(i) self.embed_w[i] = self.add_weight(shape=(input_dim, self.embedding_size), initializer='glorot_uniform', name=_name) _name = "embed_b" + str(i) self.embed_b[i] = self.add_weight(shape=(self.embedding_size,), initializer='zeros', name=_name) total_embed_size += self.embedding_size self.fm_w = self.add_weight(name='fm_w', shape=(input_shape[1], 1), initializer='glorot_uniform', trainable=True) self.h1_w = self.add_weight(shape=(total_embed_size, self.h1), initializer='glorot_uniform', name='h1_w') self.h1_b = self.add_weight(shape=(self.h1,), initializer='zeros', name='h1_b') self.h2_w = self.add_weight(shape=(self.h1, self.h2), initializer='glorot_uniform', name='h2_W') self.h2_b = self.add_weight(shape=(self.h2,), initializer='zeros', name='h2_b') self.w = self.add_weight(name='w', shape=(self.h2, 1), initializer='glorot_uniform', trainable=True) self.b = self.add_weight(name='b', shape=(1,), initializer='zeros', trainable=True) super(DeepFMLayer, self).build(input_shape) def call(self, inputs): latent = {} sparse = {} # 不一樣fields組合 for i in range(1, self.num_fields + 1): sparse[i] = Lambda(lambda x: x[:, self.id_start[i]:self.id_stop[i]], output_shape=((self.len_field[i],)))(inputTensor) latent[i] = Activation('linear')(K.bias_add(K.dot(sparse[i],self.embed_w[i]),self.embed_b[i])) # merge 不一樣 field ConcatLatent = Concatenate()(list(latent.values())) # 加倆隱藏層 deep1 = Activation('relu')(K.bias_add(K.dot(ConcatLatent,self.h1_w),self.h1_b)) deep2 = Activation('relu')(K.bias_add(K.dot(deep1, self.h2_w), self.h2_b)) # 一次項 fm_w_1 = Activation('linear')(K.dot(inputs, self.fm_w)) # 二次項 dot_latent = {} dot_cat = [] for i in range(1, self.num_fields): print(self.num_fields) dot_latent[i] = {} for f in self.field_combinations[i]: print(len(self.field_combinations)) dot_latent[i][f] = Dot(axes=-1, normalize=False)([latent[i], latent[f]]) dot_cat.append(dot_latent[i][f]) print(dot_cat) fm_w_2 = Concatenate()(dot_cat) # Merge 一次和二次項獲得FM fm_w = Concatenate()([fm_w_1, fm_w_2]) fm = Lambda(lambda x: K.sum(x, axis=1, keepdims=True))(fm_w) deep_wx = Activation('linear')(K.bias_add(K.dot(deep2, self.w),self.b)) print('build finish') return Add()([fm, deep_wx]) def compute_output_shape(self, input_shape): return (input_shape[0], self.output_dim) inputTensor = Input(shape=(np.shape(x_train)[1],)) deepFM_out = DeepFMLayer(10, {0:10, 1:10, 2:np.shape(x_train)[1]-20})(inputTensor) out = Dense(2, activation="sigmoid", trainable=True)(deepFM_out) model = Model(inputTensor, out) model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
運行結果:
Train on 2998 samples, validate on 750 samples Epoch 1/10 2998/2998 [==============================] - 1s 425us/step - loss: 0.6403 - accuracy: 0.6344 - val_loss: 0.5578 - val_accuracy: 0.7533 Epoch 2/10 2998/2998 [==============================] - 1s 226us/step - loss: 0.4451 - accuracy: 0.8721 - val_loss: 0.4727 - val_accuracy: 0.8240 Epoch 3/10 2998/2998 [==============================] - 1s 215us/step - loss: 0.2469 - accuracy: 0.9445 - val_loss: 0.5188 - val_accuracy: 0.8360 Epoch 4/10 2998/2998 [==============================] - 1s 200us/step - loss: 0.1319 - accuracy: 0.9678 - val_loss: 0.6488 - val_accuracy: 0.8233 Epoch 5/10 2998/2998 [==============================] - 1s 211us/step - loss: 0.0693 - accuracy: 0.9843 - val_loss: 0.7755 - val_accuracy: 0.8247 Epoch 6/10 2998/2998 [==============================] - 1s 225us/step - loss: 0.0392 - accuracy: 0.9932 - val_loss: 0.9234 - val_accuracy: 0.8187 Epoch 7/10 2998/2998 [==============================] - 1s 204us/step - loss: 0.0224 - accuracy: 0.9967 - val_loss: 1.0437 - val_accuracy: 0.8200 Epoch 8/10 2998/2998 [==============================] - 1s 190us/step - loss: 0.0163 - accuracy: 0.9972 - val_loss: 1.1618 - val_accuracy: 0.8173 Epoch 9/10 2998/2998 [==============================] - 1s 190us/step - loss: 0.0106 - accuracy: 0.9980 - val_loss: 1.2746 - val_accuracy: 0.8147 Epoch 10/10 2998/2998 [==============================] - 1s 213us/step - loss: 0.0083 - accuracy: 0.9987 - val_loss: 1.3395 - val_accuracy: 0.8167 Model: "model_19"
結論
從上面結果上看其實純FM和DeepFM的在個人數據集上結果差很少,甚至跟LR也沒太大差異,多是由於個人數據集比較小。有問題能夠一塊兒討論,但願你們好好看看論文。以上都是個人一些拙見,還望拋磚引玉。