Advanced Feature Engineering II

These are notes from the Coursera course How to Win a Data Science Competition: Learn from Top Kagglers.

Statistics and distance based features

This part focuses on two kinds of advanced feature engineering: computing various statistics of one feature grouped by another, and features obtained from analyzing the neighborhood of a given point.

Groupby and nearest neighbor methods

Example: here is some data from a CTR task.

statistic_ctr_data.png

We can hypothesize that the ad with the lowest price on a page will attract most of the attention, and that the other ads on the page will not be very attractive. Computing features that capture this idea is very easy: for each user and web page we can add the minimum and maximum ad price. The position of the ad with the lowest price can also be used in this case.

statistic_ctr_data2.png

Code implementation
statistic_ctr_data_code.png
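
The slide's code is shown as an image; here is a minimal pandas sketch of the same idea (the column names user_id, page_id, and ad_price are assumptions for illustration):

    import pandas as pd

    # Toy CTR-style data; the column names are assumptions for illustration.
    df = pd.DataFrame({
        'user_id':  [1, 1, 1, 2, 2],
        'page_id':  [10, 10, 10, 20, 20],
        'ad_price': [100, 120, 90, 50, 70],
    })

    # Lowest/highest ad price per (user, page), broadcast back to every row.
    gb = df.groupby(['user_id', 'page_id'])['ad_price']
    df['min_price'] = gb.transform('min')
    df['max_price'] = gb.transform('max')

    # The position of the ad with the lowest price can also be used.
    df['is_cheapest_ad'] = (df['ad_price'] == df['min_price']).astype(int)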

  • More features:
    • How many pages the user visited
    • Standard deviation of prices
    • Most visited page
    • Many, many more

What if there are no features to group by like this? We can use nearest neighbors instead.

Neighbors

  • Explicit group is not needed
  • More flexible
  • Much harder to implement

Examples

  • Number of houses within 500m, 1000m, ...
  • Average price per square meter within 500m, 1000m, ...
  • Number of schools/supermarkets/parking lots within 500m, 1000m, ...
  • Distance to the closest subway station

The instructor used this approach in the Springleaf competition.

KNN features in Springleaf

  • Mean encode all the variables
  • For every point, find 2000 nearest neighbors using the Bray-Curtis metric
    $$\frac{\sum{|u_i - v_i|}}{\sum{|u_i + v_i|}}$$
  • Calculate various features from those 2000 neighbors

Evaluated features

  • Mean target of the nearest 5, 10, 15, 500, 2000 neighbors
  • Mean distance to 10 closest neighbors
  • Mean distance to 10 closest neighbors with target 1
  • Mean distance to 10 closest neighbors with target 0
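
The notes don't include the instructor's code; below is a hedged sketch of how such neighbor features could be computed with sklearn on toy data (in practice, target-based features like these should be computed out-of-fold to avoid leakage):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Toy stand-ins for the mean-encoded variables and the binary target.
    rng = np.random.RandomState(0)
    X = rng.rand(1000, 10)
    y = rng.randint(0, 2, 1000)

    # Find neighbors under the Bray-Curtis metric; the lecture uses up to
    # 2000 neighbors, a small k is used here to keep the example fast.
    k = 10
    nn = NearestNeighbors(n_neighbors=k + 1, metric='braycurtis').fit(X)
    dist, idx = nn.kneighbors(X)
    dist, idx = dist[:, 1:], idx[:, 1:]  # drop each point itself

    mean_target = y[idx].mean(axis=1)  # mean target of the k nearest neighbors
    mean_dist = dist.mean(axis=1)      # mean distance to the k closest neighbors

    # Mean distance to the closest neighbors with target 1 (NaN if none in the k).
    mask = y[idx] == 1
    mean_dist_to_1 = np.where(mask.any(axis=1),
                              (dist * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1),
                              np.nan)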

Matrix factorizations for feature extraction

  • Example of feature fusion
    fusion.png

Notes about Matrix Factorization

  • Can be applied to only some of the columns
  • Can provide additional diversity
    • Good for ensembles
  • It is a lossy transformation. Its efficiency depends on:
    • The particular task
    • The number of latent factors
      • Usually 5-100

Implementation

  • Several MF methods can be found in sklearn
  • SVD and PCA
    • Standard tools for Matrix Factorization
  • TruncatedSVD
    • Works with sparse matrices
  • Non-negative Matrix Factorization (NMF)
    • Ensures that all latent factors are non-negative
    • Good for counts-like data
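
A minimal sketch of the sklearn tools listed above, on toy counts-like data:

    import numpy as np
    from sklearn.decomposition import TruncatedSVD, NMF

    # Toy non-negative, counts-like matrix (e.g., bag-of-words counts).
    rng = np.random.RandomState(0)
    X = rng.poisson(1.0, size=(100, 50)).astype(float)

    # TruncatedSVD: a standard MF tool that also works with sparse matrices.
    X_svd = TruncatedSVD(n_components=10, random_state=0).fit_transform(X)

    # NMF: keeps all latent factors non-negative, good for counts-like data.
    X_nmf = NMF(n_components=10, random_state=0, max_iter=500).fit_transform(X)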

NMF for tree-based methods

Non-negative matrix factorization (NMF) transforms data in a way that can make it more suitable for decision trees.
NMF.png

As can be seen, NMF transforms the data so that it forms lines parallel to the axes.

Factorization tricks

The same tricks used with linear models can be applied when factorizing a matrix.
NMF_note.png
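
The note above is an image; assuming the trick it refers to is the log(x+1) transform commonly applied to count data before fitting linear models, a minimal sketch:

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.RandomState(0)
    X = rng.poisson(5.0, size=(100, 20)).astype(float)

    # Plain NMF vs NMF on log(x+1)-transformed counts; the transform is a
    # trick borrowed from linear models (assumed reading of the slide).
    X_nmf = NMF(n_components=5, random_state=0, max_iter=500).fit_transform(X)
    X_nmf_log = NMF(n_components=5, random_state=0, max_iter=500).fit_transform(np.log1p(X))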

Conclusion

  • Matrix Factorization is a very general approach for dimensionality reduction and feature extraction
  • It can be applied for transforming categorical features into real-valued ones
  • Many of the tricks suitable for linear models are also useful for MF

Feature interactions

All combinations of feature values

Suppose we are building a predictive model for the best ad banner to display on a website.

...  category_ad      category_site  ...  is_clicked
...  auto_part        game_news      ...  0
...  music_tickets    music_news     ...  1
...  mobile_phones    auto_blog      ...  0

Combining the category of the ad banner itself with the category of the site the banner will be shown on forms a very strong feature.

...  ad_site                       ...  is_clicked
...  auto_part | game_news         ...  0
...  music_tickets | music_news    ...  1
...  mobile_phones | auto_blog     ...  0

The combined feature ad_site is constructed from these two features.

From a technical point of view, there are two ways to construct such an interaction.

  • Example of interactions

Method 1
interaction1.png

Method 2
interaction2.png
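
The two methods are shown as images above; here is a hedged sketch of both in pandas/sklearn (my reading of the slides, not the instructor's exact code):

    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import OneHotEncoder

    df = pd.DataFrame({
        'category_ad':   ['auto_part', 'music_tickets', 'mobile_phones'],
        'category_site': ['game_news', 'music_news', 'auto_blog'],
    })

    # Method 1: concatenate the two categories into a single combined
    # feature, then encode it like any other categorical feature.
    df['ad_site'] = df['category_ad'] + '|' + df['category_site']
    method1 = OneHotEncoder().fit_transform(df[['ad_site']])

    # Method 2: one-hot encode each feature separately, then take the
    # product of every pair of columns; each product marks one combination.
    A = OneHotEncoder().fit_transform(df[['category_ad']]).toarray()
    B = OneHotEncoder().fit_transform(df[['category_site']]).toarray()
    method2 = np.einsum('ij,ik->ijk', A, B).reshape(len(df), -1)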

  • A similar idea can also be applied to numeric variables
    interge_interaction.png

In fact, this is not limited to multiplication; other operations can be used:

  • Multiplication
  • Sum
  • Diff
  • Division
  • ..

Practical Notes

  • We have a lot of possible interactions: N*N for N features
    • Even more if several types of interactions are used
  • Need to reduce their number:
    • Dimensionality reduction
    • Feature selection

This approach generates a large number of features, which can be reduced with feature selection or dimensionality reduction. Feature selection is illustrated below.
sele.png
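
A hedged sketch of the feature-selection step, assuming (as the slide suggests) that a tree-based model is fit on the generated interactions and the most important ones are kept:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Stand-ins for the generated interaction features and the target.
    rng = np.random.RandomState(0)
    X_interactions = rng.rand(500, 100)
    y = rng.randint(0, 2, 500)

    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_interactions, y)

    # Keep only the top-k interactions by feature importance.
    k = 20
    top_idx = np.argsort(rf.feature_importances_)[::-1][:k]
    X_selected = X_interactions[:, top_idx]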

Interactions' order

  • We looked at 2nd order interactions.
  • Such approach can be generalized for higher orders.
  • It is hard to do generation and selection automatically.
  • Manual building of high-order interactions is some kind of art.

Extract features from DT

tree_interaction.png

Look at the decision tree. Let's map each leaf to a binary feature: the index of the leaf an object falls into can be used as the value of a new categorical feature. If we use not a single tree but an ensemble of them, for example a random forest, this operation can be applied to every tree. This is a powerful way to extract high-order interactions.

  • How to use it

In sklearn:

tree_model.apply(X)

In xgboost:

booster.predict(data, pred_leaf=True)
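
A minimal end-to-end sketch with a random forest (toy data):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.RandomState(0)
    X = rng.rand(200, 5)
    y = rng.randint(0, 2, 200)

    rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

    # apply() returns, for every object, the index of the leaf it falls into
    # in each tree: shape (n_samples, n_trees). Each column is a new
    # categorical feature.
    leaves = rf.apply(X)

    # One-hot encode the leaf indices to feed them into e.g. a linear model.
    leaf_features = OneHotEncoder().fit_transform(leaves)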

Conclusion

  • We looked at ways to build an interaction of categorical attributes
  • Extended this approach to real-valued features
  • Learned how to extract features via decision trees

t-SNE

Used for exploratory data analysis. It can also be viewed as a method of extracting features from data.

Practical Notes

  • Results depend heavily on hyperparameters (perplexity)
    • Good practice is to use several projections with different perplexities (5-100)
  • Due to its stochastic nature, tSNE provides different projections even for the same data/hyperparameters
    • Train and test should be projected together
  • tSNE runs for a long time when there is a big number of features
    • It is common to do dimensionality reduction before the projection
  • An implementation of tSNE can be found in the sklearn library
    • But personally I prefer the stand-alone Python package tsne due to its faster speed
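
A minimal sketch following these notes, using the sklearn implementation on toy data:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE

    rng = np.random.RandomState(0)
    X_train, X_test = rng.rand(300, 50), rng.rand(100, 50)

    # Project train and test together so they share one embedding space.
    X_all = np.vstack([X_train, X_test])

    # Reduce dimensionality first: tSNE is slow with a big number of features.
    X_pca = PCA(n_components=20, random_state=0).fit_transform(X_all)

    # Try several perplexities (5-100); results depend heavily on this.
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_pca)
    tsne_train, tsne_test = emb[:300], emb[300:]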

Conclusion

  • tSNE is a great tool for visualization
  • It can be used as a feature as well
  • Be careful with interpretation of results
  • Try different perplexities
