Machine Learning Algorithms Explained

Date: 2019-12-17

Reposted from: http://www.douban.com/note/262946592/?type=like

Survival curves: http://www.bioinfo-scrounger.com/archives/647 (worth a close read)

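Machine learning lies at the intersection of computer science and statistics. R packages for machine learning fall mainly into the following areas: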

1) Neural Networks:
The nnet package implements single-hidden-layer feed-forward neural networks. nnet is part of the VR bundle (http://cran.r-project.org/web/packages/VR/index.html).

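2) Recursive Partitioning:
Recursive partitioning uses tree-structured models for regression, classification, and survival analysis. It is implemented mainly in the rpart package (http://cran.r-project.org/web/packages/rpart/index.html) and the tree package (http://cran.r-project.org/web/packages/tree/index.html); rpart is especially recommended. Weka also offers recursive partitioning methods such as J4.8, C4.5, and M5, and the RWeka package provides an R interface to Weka's functions (http://cran.r-project.org/web/packages/RWeka/index.html).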

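The party package provides two classes of recursive partitioning algorithms with unbiased variable selection and stopping criteria: the function ctree() uses nonparametric conditional inference to test the association between the predictors and the response, while the function mob() can be used to build parametric models (http://cran.r-project.org/web/packages/party/index.html). party also provides visualizations of binary trees and of the node distributions.
The mvpart package is an enhanced version of rpart that handles multivariate responses (http://cran.r-project.org/web/packages/mvpart/index.html). The rpart.permutation package uses permutation tests to assess the validity of a tree (http://cran.r-project.org/web/packages/rpart.permutation/index.html). The knnTree package builds a classification tree in which each leaf node is a k-nearest-neighbour classifier (http://cran.r-project.org/web/packages/knnTree/index.html). The LogicReg package performs logic regression, aimed at cases where most predictors are binary (http://cran.r-project.org/web/packages/LogicReg/index.html). The maptree package (http://cran.r-project.org/web/packages/maptree/index.html) and the pinktoe package (http://cran.r-project.org/web/packages/pinktoe/index.html) provide functions for visualizing tree structures.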
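To make the idea of recursive partitioning concrete, here is a minimal sketch in plain Python. It is not how rpart, tree, or party work internally: it assumes a classification task, uses Gini impurity as the split criterion, and stops at a fixed depth. The names gini, best_split, grow, and predict are invented for this illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold, weighted impurity) of the best
    binary split, or None if no split separates the data."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively partition until a node is pure or max_depth is reached."""
    split = None
    if depth < max_depth and gini(labels) > 0:
        split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    j, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree
```

Calling grow(rows, labels) on a list of numeric feature vectors and their class labels returns a nested dict that predict() can traverse. Real packages add pruning, surrogate splits, and (in party) statistical tests for variable selection, none of which are shown here.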

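3) Random Forests:
The randomForest package provides functions for regression and classification with random forests (http://cran.r-project.org/web/packages/randomForest/index.html). The ipred package applies bagging to regression, classification, and survival analysis, and can combine multiple models (http://cran.r-project.org/web/packages/ipred/index.html). The party package also provides random forests based on conditional inference trees (http://cran.r-project.org/web/packages/party/index.html). The varSelRF package uses random forests for variable selection (http://cran.r-project.org/web/packages/varSelRF/index.html).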

4) Regularized and Shrinkage Methods:
The lasso2 package (http://cran.r-project.org/web/packages/lasso2/index.html) and the lars package (http://cran.r-project.org/web/packages/lars/index.html) fit regression models with constraints on the parameter estimates. The elasticnet package can compute the full set of shrinkage parameters, i.e. the entire elastic net solution path (http://cran.r-project.org/web/packages/elasticnet/index.html). The glmpath package computes the L1 regularization path for generalized linear models and Cox models (http://cran.r-project.org/web/packages/glmpath/index.html). The penalized package fits lasso (L1) and ridge (L2) penalized regression models (http://cran.r-project.org/web/packages/penalized/index.html). The pamr package implements the shrunken centroids classifier (http://cran.r-project.org/web/packages/pamr/index.html). The earth package fits multivariate adaptive regression splines (http://cran.r-project.org/web/packages/earth/index.html).

5) Boosting:
The gbm package (http://cran.r-project.org/web/packages/gbm/index.html) and the boost package (http://cran.r-project.org/web/packages/boost/index.html) implement a variety of gradient boosting algorithms: gbm performs tree-based gradient descent boosting, and boost includes LogitBoost and L2Boost. The GAMBoost package provides routines for fitting generalized additive models by boosting (http://cran.r-project.org/web/packages/GAMBoost/index.html). The mboost package performs model-based boosting (http://cran.r-project.org/web/packages/mboost/index.html).

6) Support Vector Machines:
The svm() function in the e1071 package provides an R interface to LIBSVM (http://cran.r-project.org/web/packages/e1071/index.html). The kernlab package offers a flexible framework for kernel-based learning methods, including SVM, RVM, and others (http://cran.r-project.org/web/packages/kernlab/index.html). The klaR package provides an R interface to SVMlight (http://cran.r-project.org/web/packages/klaR/index.html).

7) Bayesian Methods:
The BayesTree package implements the Bayesian Additive Regression Trees (BART) algorithm (http://cran.r-project.org/web/packages/BayesTree/index.html, http://www-stat.wharton.upenn.edu/~edgeorge/Research_papers/BART 6--06.pdf). The tgp package performs Bayesian nonstationary, semiparametric nonlinear regression (http://cran.r-project.org/web/packages/tgp/index.html).

8) Optimization using Genetic Algorithms:
The gafit package (http://cran.r-project.org/web/packages/gafit/index.html) and the rgenoud package (http://cran.r-project.org/web/packages/rgenoud/index.html) provide optimization routines based on genetic algorithms.

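9) Association Rules:
The arules package provides data structures for efficiently handling sparse binary data, along with functions that run the Apriori and Eclat algorithms to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets, and association rules (http://cran.r-project.org/web/packages/arules/index.html).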
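For readers unfamiliar with Apriori, the level-wise search for frequent itemsets can be sketched in a few lines of Python. This is an illustration of the textbook algorithm only, not the arules implementation (which relies on optimized sparse data structures); the function name apriori and the min_support parameter (an absolute count) are choices made for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every itemset that
    appears in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

The prune step is what gives Apriori its efficiency: an itemset can only be frequent if every subset of it is frequent, so most candidates are discarded before any counting takes place.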

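10) Model Selection and Validation:
The tune() function in the e1071 package searches for suitable parameter values within specified ranges (http://cran.r-project.org/web/packages/e1071/index.html). The errorest() function in the ipred package estimates classification error rates by resampling (cross-validation, bootstrap) (http://cran.r-project.org/web/packages/ipred/index.html). Functions in the svmpath package can be used to choose the cost parameter C of a support vector machine (http://cran.r-project.org/web/packages/svmpath/index.html). The ROCR package provides functions for visualizing classifier performance, such as plotting ROC curves (http://cran.r-project.org/web/packages/ROCR/index.html). The caret package provides a range of functions for building predictive models, including parameter tuning and variable importance measures (http://cran.r-project.org/web/packages/caret/index.html). The caretLSF package (http://cran.r-project.org/web/packages/caretLSF/index.html) and the caretNWS package (http://cran.r-project.org/web/packages/caretNWS/index.html) provide functionality similar to caret.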
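The resampling idea behind error estimation can be illustrated with a small k-fold cross-validation sketch in Python. This is not the code of errorest() or tune(); it assumes a classifier supplied as a pair of fit/predict functions, and it assigns observations to folds by position rather than at random, so the data should be shuffled beforehand in practice. The toy 1-nearest-neighbour classifier (fit_1nn, predict_1nn) is hypothetical and exists only to make the example self-contained.

```python
def k_fold_error(xs, ys, fit, predict, k=5):
    """Estimate the misclassification rate by k-fold cross-validation."""
    n = len(xs)
    errors = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th observation is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)
        errors += sum(1 for i in test_idx if predict(model, xs[i]) != ys[i])
    return errors / n

# Hypothetical toy classifier: 1-nearest-neighbour on one numeric feature.
def fit_1nn(train_x, train_y):
    return list(zip(train_x, train_y))

def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]
```

Parameter selection then amounts to calling k_fold_error once per candidate setting and keeping the setting with the lowest estimated error.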

11) Elements of Statistical Learning:
The data sets, functions, and examples from the book "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" (http://www-stat.stanford.edu/~tibs/ElemStatLearn/) are bundled in the ElemStatLearn package (http://cran.r-project.org/web/packages/ElemStatLearn/index.html).

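12) The lars package for R provides the Lasso algorithm. When a model needs refinement, data miners can use the Lasso together with the AIC and BIC criteria to pare down the model's variable set and so reduce dimensionality. The Lasso is therefore a practical algorithm for data mining. glasso (graphical lasso) is an extension of the lasso that uses penalized maximum likelihood to estimate the inverse of the covariance matrix of the variables (in graphical models this inverse is called the concentration matrix or precision matrix). After suitable rescaling it yields a sparse matrix of partial correlations between variables, whose zero entries indicate conditional independence. The nonzero entries can be used to build a graphical model.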
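The rescaling step mentioned above is the standard identity: the partial correlation between variables i and j equals minus the (i, j) entry of the precision matrix divided by the square root of the product of the i-th and j-th diagonal entries. The Python sketch below applies it to a precision matrix that is assumed to have been estimated already (for example by glasso); it does not perform the penalized estimation itself, and the function names are invented for this illustration.

```python
import math

def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix, given as a list of
    rows, into the matrix of partial correlations."""
    p = len(precision)
    result = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                result[i][j] = 1.0
            else:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result

def graph_edges(precision, tol=1e-12):
    """Edges (i, j) with i < j whose precision entry is nonzero, i.e. pairs
    of variables that are not conditionally independent."""
    p = len(precision)
    return [
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(precision[i][j]) > tol
    ]
```

For a made-up tridiagonal precision matrix such as [[2, -1, 0], [-1, 2, -1], [0, -1, 2]], the zero in position (0, 2) means variables 0 and 2 are conditionally independent given variable 1, so the resulting graph is the chain 0 - 1 - 2.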