This post starts from the basic deep CTR models. I really like the Wide&Deep framework; it feels like many of the later improvements can be folded into it. Wide is in charge of mining frequent patterns that do appear in the samples, while Deep is in charge of generalizing to feature combinations that do not. Later improvements either use a different IFC to let Deep extract feature-interaction information more effectively, or let Wide memorize the sample information better.
The code below uses dense input, which I find makes the model structure easier to follow; the sparse-input versions of the models and the complete code are here 👇
https://github.com/DSXiangLi/CTR
The earliest deep-learning attempts at click-through-rate models started from a plain MLP: embed the high-dimensional sparse categorical features, concatenate the embeddings as the MLP input, and obtain the CTR prediction through the nonlinear transformations of several fully connected layers.
I wonder whether you have been as puzzled as I was: what information does this Embedding+MLP actually learn? Do the MLP's embeddings learn the same feature-interaction information as FM's embeddings? I recently heard a fairly convincing view from an expert; of course, keep skeptical, and discussion is welcome~
An MLP can express both low-order and high-order information of all features, but doing so relies on an enormous search space. With limited samples and limited parameters it usually learns only limited information. That is why we rely on feature engineering grounded in business understanding to help the MLP learn more useful feature-interaction information within that limited space; the FM inner product is just one way of doing second-order feature engineering. Many of the later improvements to the Deep part likewise explore how to bring the business experience of feature engineering into better extraction of feature interactions.
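To make the point about the FM inner product being one form of second-order feature engineering concrete, here is a minimal sketch in plain TensorFlow 1.x (the field count and embedding size are made up) of the FM pairwise-interaction term computed from the same kind of field embeddings an MLP would consume, using the standard sum-square minus square-sum identity:

import tensorflow as tf

# Hypothetical shapes: a batch of samples with F=10 embedded fields of K=8 dimensions each.
emb = tf.placeholder(tf.float32, shape=[None, 10, 8])    # [N, F, K]

# FM second-order term sum_{i<j} <v_i, v_j>, computed as
# 0.5 * ((sum_i v_i)^2 - sum_i v_i^2), summed over the embedding dimension.
sum_square = tf.square(tf.reduce_sum(emb, axis=1))        # [N, K]
square_sum = tf.reduce_sum(tf.square(emb), axis=1)        # [N, K]
fm_second_order = 0.5 * tf.reduce_sum(sum_square - square_sum, axis=1, keepdims=True)  # [N, 1]

# The MLP instead flattens the same embeddings and lets the fully connected layers
# search for useful interactions on their own.
mlp_input = tf.reshape(emb, [-1, 10 * 8])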
import tensorflow as tf

# EMB_CONFIGS / BUCKET_CONFIGS and the tf_estimator_model decorator are defined elsewhere in the repo.

def build_features(numeric_handle):
    f_sparse = []
    f_dense = []

    # categorical features: hash bucket -> one-hot indicator
    for col, config in EMB_CONFIGS.items():
        ind = tf.feature_column.categorical_column_with_hash_bucket(col, hash_bucket_size=config['hash_size'])
        one_hot = tf.feature_column.indicator_column(ind)
        f_sparse.append(one_hot)

    if numeric_handle == 'bucketize':
        # Method 1 'bucketize': bucketize numeric features into one-hot columns
        for col, config in BUCKET_CONFIGS.items():
            num = tf.feature_column.numeric_column(col)
            bucket = tf.feature_column.bucketized_column(num, boundaries=config)
            f_sparse.append(bucket)
    else:
        # Method 2 'dense': keep numeric features continuous and concatenate them with the embeddings
        for col, config in BUCKET_CONFIGS.items():
            num = tf.feature_column.numeric_column(col)
            f_dense.append(num)

    return f_sparse, f_dense


@tf_estimator_model
def model_fn(features, labels, mode, params):
    sparse_columns, dense_columns = build_features(params['numeric_handle'])

    with tf.variable_scope('EmbeddingInput'):
        embedding_input = []
        for f_sparse in sparse_columns:
            sparse_input = tf.feature_column.input_layer(features, f_sparse)
            input_dim = sparse_input.get_shape().as_list()[-1]
            init = tf.random_normal(shape=[input_dim, params['embedding_dim']])
            weight = tf.get_variable('w_{}'.format(f_sparse.name), dtype=tf.float32, initializer=init)
            embedding_input.append(tf.matmul(sparse_input, weight))

        dense = tf.concat(embedding_input, axis=1, name='embedding_concat')

        # If numeric features are treated as dense features, concatenate them with the embeddings;
        # otherwise they were bucketized above and are already part of the sparse input.
        if params['numeric_handle'] == 'dense':
            numeric_input = tf.feature_column.input_layer(features, dense_columns)
            numeric_input = tf.layers.batch_normalization(numeric_input, center=True, scale=True, trainable=True,
                                                          training=(mode == tf.estimator.ModeKeys.TRAIN))
            dense = tf.concat([dense, numeric_input], axis=1, name='numeric_concat')

    with tf.variable_scope('MLP'):
        for i, unit in enumerate(params['hidden_units']):
            dense = tf.layers.dense(dense, units=unit, activation=tf.nn.relu, name='Dense_{}'.format(i))
            if mode == tf.estimator.ModeKeys.TRAIN:
                dense = tf.layers.dropout(dense, rate=params['dropout_rate'],
                                          training=(mode == tf.estimator.ModeKeys.TRAIN))

    with tf.variable_scope('output'):
        y = tf.layers.dense(dense, units=1, name='output')

    return y
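A minimal usage sketch, assuming the repo's @tf_estimator_model decorator turns the function above into a standard tf.estimator model_fn; the parameter values and model_dir below are placeholders:

import tensorflow as tf

params = {
    'numeric_handle': 'dense',      # or 'bucketize'
    'embedding_dim': 8,             # placeholder values
    'hidden_units': [48, 32, 16],
    'dropout_rate': 0.1,
}

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir='./checkpoint/mlp',   # hypothetical path
    params=params)

# train_input_fn is assumed to be a standard input_fn yielding (features, labels):
# estimator.train(input_fn=train_input_fn, max_steps=10000)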
Wide&Deep adds a Wide part on top of the MLP above. The authors argue that the Deep part, i.e. the Embedding+MLP just described, is responsible for generalization: generalizing to patterns not seen in the samples and supporting fuzzy matching. The Wide part is responsible for memorization, i.e. remembering patterns already present in the samples, and is a logistic regression over the categorical features and feature crosses. Deep and Wide are trained jointly.
Strictly speaking this is not entirely accurate; the paper also mentions that the Wide part is only the icing on the cake, helping Deep sharpen the discriminative power, with respect to the prediction target, of patterns that appear frequently in the samples. So Wide does not need to be a full-size model; what it needs are the core features and crossed features chosen with business judgment.
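As a rough sketch of what joint training means here (hand-rolled with placeholder tensors rather than the canned estimator used further below; shapes and names are illustrative only): the wide logit and the deep logit are summed before the sigmoid, so a single log loss drives both parts.

import tensorflow as tf

# Assumed placeholder inputs:
wide_input = tf.placeholder(tf.float32, [None, 100])    # one-hot / crossed features for the wide part
deep_hidden = tf.placeholder(tf.float32, [None, 16])    # last hidden layer of the Embedding+MLP part
labels = tf.placeholder(tf.float32, [None, 1])          # click labels

wide_logit = tf.layers.dense(wide_input, units=1, use_bias=False, name='wide_lr')
deep_logit = tf.layers.dense(deep_hidden, units=1, name='deep_out')

# Joint prediction: sigmoid(wide_logit + deep_logit); both parts receive gradients from one log loss.
logits = wide_logit + deep_logit
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))

# In the paper the wide part is optimized with FTRL (L1) and the deep part with AdaGrad,
# which is also the default pairing in tf.estimator.DNNLinearCombinedClassifier.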
CTR models mostly discuss how to handle sparse categorical features, so how should continuous features be handled? There are a few options: bucketize them into discrete one-hot features, normalize them and feed them to the deep part as dense input alongside the embeddings, or keep both views (the MLP code above switches between the first two via numeric_handle, while the Wide&Deep code below keeps both); a small bucketization sketch follows the pros and cons.
Pros and cons of discretizing continuous features
Cons
Pros
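For illustration, discretizing a continuous feature just means picking boundaries and mapping each value into its bucket; with feature columns this is a one-liner (the column name and boundaries below are made up):

import tensorflow as tf

age = tf.feature_column.numeric_column('age')           # hypothetical continuous feature
age_bucket = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 35, 50])                   # buckets: (-inf,18) [18,25) [25,35) [35,50) [50,+inf)

# The bucketized column behaves like any other categorical column afterwards,
# e.g. it can be crossed with other discrete features or fed to the wide part.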
import tensorflow as tf
from itertools import combinations

# EMB_CONFIGS / BUCKET_CONFIGS / NORM_CONFIGS are feature configurations defined elsewhere in the repo.

def znorm(mean, std):
    def znorm_helper(col):
        return (col - mean) / std
    return znorm_helper


def build_features():
    f_onehot = []
    f_embedding = []
    f_numeric = []

    # categorical features: one-hot indicator for the wide part, embedding for the deep part
    for col, config in EMB_CONFIGS.items():
        ind = tf.feature_column.categorical_column_with_hash_bucket(col, hash_bucket_size=config['hash_size'])
        f_onehot.append(tf.feature_column.indicator_column(ind))
        f_embedding.append(tf.feature_column.embedding_column(ind, dimension=config['emb_size']))

    # numeric features: kept both as z-normalized numeric columns and as bucketized discrete columns
    for col, config in BUCKET_CONFIGS.items():
        num = tf.feature_column.numeric_column(
            col, normalizer_fn=znorm(NORM_CONFIGS[col]['mean'], NORM_CONFIGS[col]['std']))
        f_numeric.append(num)
        bucket = tf.feature_column.bucketized_column(num, boundaries=config)
        f_onehot.append(bucket)

    # crossed features for the wide part
    for col1, col2 in combinations(f_onehot, 2):
        # if a column is the indicator of a hashed bucket, cross on the raw feature name directly
        if col1.parents[0].name in EMB_CONFIGS.keys():
            col1 = col1.parents[0].name
        if col2.parents[0].name in EMB_CONFIGS.keys():
            col2 = col2.parents[0].name
        crossed = tf.feature_column.crossed_column([col1, col2], hash_bucket_size=20)
        f_onehot.append(tf.feature_column.indicator_column(crossed))

    f_dense = f_embedding + f_numeric
    # f_dense = f_embedding + f_numeric + f_onehot
    f_sparse = f_onehot
    # f_sparse = f_onehot + f_numeric
    return f_sparse, f_dense


def build_estimator(model_dir):
    sparse_feature, dense_feature = build_features()

    run_config = tf.estimator.RunConfig(
        save_summary_steps=50,
        log_step_count_steps=50,
        keep_checkpoint_max=3,
        save_checkpoints_steps=50
    )

    dnn_optimizer = tf.train.ProximalAdagradOptimizer(
        learning_rate=0.001,
        l1_regularization_strength=0.001,
        l2_regularization_strength=0.001
    )

    estimator = tf.estimator.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        linear_feature_columns=sparse_feature,
        dnn_feature_columns=dense_feature,
        dnn_optimizer=dnn_optimizer,
        dnn_dropout=0.1,
        batch_norm=False,
        dnn_hidden_units=[48, 32, 16],
        config=run_config
    )

    return estimator
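A minimal sketch of how this estimator might be driven; the model_dir is a placeholder and the real input pipeline lives in the linked repo:

import tensorflow as tf

estimator = build_estimator(model_dir='./checkpoint/wide_deep')   # hypothetical path

# input_fn is assumed to parse the data into (features, labels); see the repo for the real one.
# train_spec = tf.estimator.TrainSpec(input_fn=lambda: input_fn('train'), max_steps=10000)
# eval_spec = tf.estimator.EvalSpec(input_fn=lambda: input_fn('eval'))
# tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)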
https://github.com/DSXiangLi/CTR