Original post: http://blog.sina.com.cn/s/blog_7e5f32ff0102vlgj.html
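1. The Beauty of Mathematics (《數學之美》)
Author Wu Jun needs no introduction. In very accessible language, the book explains how mathematics is applied in machine learning, natural language processing, and related fields.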
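2. Programming Collective Intelligence (Chinese edition: 《集體智慧編程》)
Author Toby Segaran also co-edited Beautiful Data: The Stories Behind Elegant Data Solutions (Chinese edition: 《數據之美:解密優雅數據解決方案背後的故事》). The book's greatest strength is that it contains no theoretical derivations or complicated formulas, which makes it a very good introduction. The Chinese edition is currently out of print. For anyone serious about this field, the English PDF is a good choice: the translations of many later classics are poor and those books have to be read in English anyway, so you may as well start here. This book is also best read quickly. According to reviews, once you have read some of the classics with mathematical derivations, you will find that this book explains very little and mostly gives a lot of examples.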
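3. Algorithms of the Intelligent Web (Chinese edition: 《智能web算法》)
Authors: Haralambos Marmanis and Dmitry Babenko. The book has slightly more formulas than Programming Collective Intelligence, and most of its examples are Internet applications, as the title suggests. Its weakness is that the companion code is written in BeanShell rather than Python or another language. Overall it is still suitable for beginners and, like the previous book, should be read quickly. If you have finished the previous book, you need not study the code here closely; grasping the main ideas of the algorithms is enough.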
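4. Statistical Learning Methods (《統計學習方法》)
Author Li Hang is one of the leading figures in machine learning in China. He was a senior researcher at MSRA and is now at Huawei's Noah's Ark Lab. The book covers ten algorithms, each introduced tersely and straight from the formulas; it is pure substance from cover to cover. The references at the end of each chapter make it easy for readers who want a deeper understanding to find the classic papers. It complements the two books above.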
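5. Machine Learning (Chinese edition: 《機器學習》)
Author Tom Mitchell is a leading figure at CMU, and there are online course videos of his on machine learning and semi-supervised learning. This is one of the better-translated books in the field, and it covers a much wider range of algorithms than Statistical Learning Methods. According to reviews, the book aims mainly to build intuition, explaining why the formulas hold rather than deriving them. Its weakness is its age: it is less up to date than PRML. Still, foundational classics do not go out of date, so it remains on almost every machine learning reading list.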
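6. Mining of Massive Datasets (Chinese edition: 《大數據》)
Authors: Anand Rajaraman [3] and Jeffrey David Ullman; Anand holds a PhD from Stanford. The book introduces many algorithms along with their variants for large-scale data. For reasons of space, though, no algorithm feels fully developed, and going deeper requires other sources, although this level of detail is enough to get acquainted with the algorithms. Another weakness is that both the original and the translation contain many errors and the errata list is long, so read carefully.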
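7. Data Mining: Practical Machine Learning Tools and Techniques (Chinese edition: 《數據挖掘:實用機器學習技術》)
Authors Ian H. Witten and Eibe Frank are the authors of Weka and professors at the University of Waikato, New Zealand. Witten is also a co-author of Managing Gigabytes [4], a classic in information retrieval. This book's main feature is its introduction to using Weka, but its theory is thin. It is acceptable as an introduction, but Programming Collective Intelligence and Algorithms of the Intelligent Web already fill that role well, and it is unwise to read too many introductory books. I suggest reading only the algorithms that those two books do not cover.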
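8. Machine Learning and Its Applications (《機器學習及其應用》)
Edited by Zhou Zhihua and Yang Qiang. A collection drawn from the Workshop on Machine Learning and Its Applications (機器學習及其應用研討會). The workshop was initiated by the Intelligent Information Processing Laboratory at Fudan University and has been held ten times so far. Leading Chinese researchers such as Li Hang, Xiang Liang, Wang Haifeng, Liu Tieyan, and Yu Kai have all given talks there. The book covers many specific, cutting-edge applications of machine learning and requires some background to follow. Browse it if you want a sense of research trends in machine learning; following the field's academic conferences is, after all, one way to spot them.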
9. Managing Gigabytes (Chinese edition: 《深入搜索引擎》)
A good book on information retrieval.
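10. Modern Information Retrieval
Ricardo Baeza-Yates et al., 1999. Apparently the first book to cover IR comprehensively. Unfortunately IR has advanced rapidly in recent years and the book is now somewhat dated, but it is still worth browsing as a reference. Ricardo is now the head of Yahoo Research for Europe and Latin America.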
11. Recommender System Practice (《推薦系統實踐》)
By Xiang Liang. A good introductory read.
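1. Pattern Classification, 2nd edition (Chinese edition: 《模式分類》)
Authors: Richard O. Duda [5], Peter E. Hart, and David G. Stork. A foundational work in pattern recognition, but it does not cover SVM or Boosting, the strong methods that have recently become dominant, and it has been criticized for these omissions.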
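2. Pattern Recognition and Machine Learning
Author: Christopher M. Bishop [6]. Known as PRML, it focuses on probabilistic models and is a landmark of the Bayesian approach. One review says it "has a strong engineering flavor and can be studied alongside Professor Andrew Ng's Machine Learning video course at Stanford for double the effect."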
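3. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition (Chinese edition: 《統計學習基礎:數據挖掘、推理與預測》)
Authors: Robert Tibshirani, Trevor Hastie, and Jerome Friedman. "The authors of this book are among the most active researchers on Boosting. Gradient Boosting, which they invented, offered a new angle for understanding Boosting and greatly extended its range of application. The book gives a fairly comprehensive and in-depth treatment of the currently most popular methods, and may be of even greater reference value to engineers. On the other hand, it not only summarizes techniques that have already matured but also gives concise discussions of topics still under development, letting readers appreciate that machine learning is still a very active research field. Academic researchers should also find something new each time they read it." [7]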
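4. Data Mining: Concepts and Techniques, 3rd edition (Chinese edition: 《數據挖掘:概念與技術》)
Authors: Jiawei Han [8] (US), Micheline Kamber (Canada), and Jian Pei (Canada); the first author is of Chinese descent. Without doubt a classic of data mining, though the translation is constantly criticized. That cannot be helped, since most translated books get criticized. If you do not want to eat what others have already chewed, work on your English.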
5. Artificial Intelligence: A Modern Approach, 2nd edition
Stuart Russell and Peter Norvig. An undisputed classic of the field.
6. Foundations of Statistical Natural Language Processing
A recognized classic in natural language processing.
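7. Information Theory, Inference and Learning Algorithms
8. Statistical Learning Theory
The major work of Vapnik, an authority in statistics. The book raises the theory to a philosophical level. His other book, The Nature of Statistical Learning Theory, is also a rare gem for statistical learning research, but both books go quite deep and suit readers with some background.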
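1. Matrix Analysis (Chinese edition: 《矩陣分析》)
Roger Horn. An undisputed classic of matrix analysis.
2. An Introduction to Probability Theory and Its Applications (Chinese edition: 《機率論及其應用》)
William Feller. An outstanding book, but too heavily mathematical to suit machine learning practitioners.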
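3. All of Statistics
Statistics is just as important for machine learning. I recommend All of Statistics, a very concise textbook from CMU that emphasizes concepts, simplifies computation, and trims concepts and statistical material unrelated to machine learning. It is a very good quick introduction.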
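4. Nonlinear Programming, 2nd edition
An optimization text; a reference for nonlinear programming.
5. Convex Optimization
Boyd's classic, cited more than 14,000 times. It is application-oriented and comes with companion code; a rare good book.
6. Numerical Optimization
Second edition, by Nocedal. Very suitable as a reference for students and engineers outside numerical specialties; the algorithms are laid out clearly and in detail, and the principles are well explained.
7. Introduction to Mathematical Statistics
Sixth edition, by Hogg. Covers the basic concepts of probability and statistics and various distributions, as well as maximum likelihood (ML), Bayesian methods, and more.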
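8. An Introduction to Probabilistic Graphical Models
By Jordan. Introduces basic concepts of graphical models such as conditional independence, factorization, mixtures, and conditional mixtures, and treats hidden (latent) variables in detail. You have probably met this concept in hidden Markov chains and when applying the EM algorithm to Gaussian mixture models.
9. Probabilistic Graphical Models: Principles and Techniques
By Koller. A very thick, comprehensive, and highly theoretical book that can serve as a reference.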
Concrete Mathematics (Chinese edition: 《具體數學》)
A classic.
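The latent-variable idea mentioned under Jordan's book above (EM applied to a Gaussian mixture) can be sketched in a few lines. This is only an illustration under stated assumptions, not code from any of the listed books: the toy data, the starting values, the fixed iteration count, and the function names `gaussian_pdf` and `em_two_gaussians` are all made up for the example, and no safeguard against collapsing variances is included.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, means, variances, weights, iters=50):
    """Fit a 1-D Gaussian mixture with EM (hypothetical helper for illustration)."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iters):
        # E-step: posterior probability of the hidden component label of each point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate each component from its softly assigned points.
        for k in range(len(means)):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2
                               for r, x in zip(resp, data)) / nk
            weights[k] = nk / len(data)
    return means, variances, weights

# Toy data: two well-separated clusters around -2 and +2 (an assumption).
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 1.9, 2.1, 2.0, 2.2, 1.8]
means, variances, weights = em_two_gaussians(
    data, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(means, variances, weights)
```

The component label of each point is never observed; the E-step replaces it with a posterior probability, which is the role the hidden variable plays in the model.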
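1. Linear Algebra:
I expect every university student in China has taken this course, but not every teacher manages to convey its essence. The subject is an essential foundation for Learning, and a thorough command of it is indispensable. I took the course in my first year at USTC; later, after moving to Hong Kong, I read through linear algebra again, using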
Introduction to Linear Algebra (3rd Ed.) by Gilbert Strang.
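This book is the textbook for MIT's linear algebra course and a classic adopted by many other universities. It is moderately difficult and clearly explained, and, importantly, it discusses many core concepts thoroughly. In my view, the most important thing in learning linear algebra is not fluency in matrix operations and methods for solving equations, which MATLAB can handle in practical work, but a deep understanding of a few basic yet important concepts: subspaces, orthogonality, eigenvalues and eigenvectors, and linear transformations. As I see it, the quality of a linear algebra textbook lies in whether it gives these fundamental concepts enough attention and makes the connections between them clear. Strang's book does this very well.
The book also has a unique advantage. Its author has long taught the linear algebra course at MIT (18.06), and videos of the course are available on MIT's OpenCourseWare site. Those with time can watch the lectures while following along in the textbook, whether to study or to review.
http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/CourseHome/index.htm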
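To make two of the concepts named above a little more concrete, here is a small sketch (not taken from Strang's book) that finds the eigenvalues and eigenvectors of a symmetric 2x2 matrix by power iteration and checks that the two eigenvectors are orthogonal. The matrix, the helper names, and the iteration count are illustrative assumptions; in real work a library routine would do this.

```python
import math

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def power_iteration(A, iters=200):
    """Return the dominant eigenvalue and a unit eigenvector of A."""
    v = [1.0] + [0.0] * (len(A) - 1)       # arbitrary starting direction
    for _ in range(iters):
        v = normalize(matvec(A, v))         # repeatedly apply A and rescale
    return dot(v, matvec(A, v)), v          # Rayleigh quotient gives the eigenvalue

A = [[2.0, 1.0],
     [1.0, 2.0]]                            # symmetric, so eigenvectors are orthogonal

lam1, v1 = power_iteration(A)

# Deflation: subtract the dominant component, then iterate again
# to reach the remaining eigenpair.
B = [[A[i][j] - lam1 * v1[i] * v1[j] for j in range(2)] for i in range(2)]
lam2, v2 = power_iteration(B)

print("eigenvalues:", lam1, lam2)
print("v1 . v2 =", dot(v1, v2))
```

For a symmetric matrix, eigenvectors belonging to different eigenvalues are orthogonal, so the final dot product should be zero up to rounding.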
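2. Probability and Statistics:
There are many introductory textbooks on probability and statistics, and I have no particular recommendation at the moment. What I want to introduce here is a basic textbook on multivariate statistics: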
Applied Multivariate Statistical Analysis (5th Ed.) by Richard A. Johnson and Dean W. Wichern
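I used this book when I first encountered multivariate statistics, and it laid the foundation for my research in Hong Kong. Some of my labmates also borrowed it to learn the subject. It does not pursue mathematical depth; instead it presents the main basic concepts in an accessible way, reads comfortably, and is very practical. It also gives introductory treatments of basic methods in Learning such as linear regression, factor analysis, principal component analysis (PCA), and canonical correlation analysis (CCA).
After that you can go on to Bayesian statistics and graphical models. An ideal book is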
Introduction to Graphical Models (draft version). by M. Jordan and C. Bishop.
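I do not know whether this book has been published yet (do not confuse it with Learning in Graphical Models, which is a collection of papers and not suitable for beginners). Starting from basic Bayesian statistical models, it goes all the way to estimation and inference in complex statistical networks, explaining deep material in simple terms. Many important aspects of statistical learning are clearly discussed and explained in detail. It is accessible from inside MIT; outside, an electronic version also seems to be available.
3. Analysis:
I expect most of you studied calculus or mathematical analysis at university, with depth and breadth varying by school. This area underlies many disciplines, and the textbook most worth recommending is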
Principles of Mathematical Analysis, by Walter Rudin
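It is somewhat old but an absolute classic, deep and thorough. Its drawback is that it is rather difficult, which is the usual style of Rudin's books; it is best revisited once you have some background.
Next in the direction of analysis is functional analysis.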
Introductory Functional Analysis with Applications, by Erwin Kreyszig.
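Suitable as a foundational text for functional analysis: easy to get into yet comprehensive. I especially like its attention to spectral theory and operator theory, which is particularly important for research in learning. Rudin also has a book on functional analysis. It may be mathematically deeper, but it is harder to get started with, and its content fits learning less well than this book.
Another important subject in analysis is measure theory, but among the books I have read I have not yet found one that I feel is particularly worth introducing.
4. Topology:
The introductory topology books I have read each have their own strengths, but overall the one I recommend most is: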
Topology (2nd Ed.) by James Munkres
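This book is the fruit of Professor Munkres's many years of teaching topology at MIT. It gives a comprehensive introduction to general topology and a moderate treatment of algebraic topology. It requires no special mathematical background and proceeds from the easy to the deep, covering everything from the most basic set-theoretic concepts (which many books do not bother to explain) to deeper theorems such as the Nagata-Smirnov theorem and the Tychonoff theorem (which many books avoid). The exposition is rich in ideas: for many theorems, besides giving the proof, it guides you to think about the underlying principles, and it has many admirable highlights. I often read it so absorbed that I forgot I was hungry and could not put it down. Many of the exercises are of high quality.
5. Manifold theory:
Only with a reasonable grasp of topology and analysis should you begin manifold theory; otherwise what you learn will remain superficial. The book I use is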
Introduction to Smooth Manifolds. by John M. Lee
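Although the title contains the word "introduction", the book actually goes quite deep. Besides the basics of manifolds, tangent spaces, bundles, submanifolds, and so on, it explores more advanced topics such as category theory, de Rham cohomology, and integral manifolds. It also discusses Lie groups and Lie algebras at considerable length. The writing is accessible yet rigorous, though some of the notation takes getting used to.
Although Lie group theory is built on the concept of smooth manifolds, it is also possible to learn Lie groups and Lie algebras directly from matrices, an approach that may be more practical for those who urgently need Lie group theory to solve problems. Looking at a problem from different angles also deepens understanding. The following book is a model of this approach: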
Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. by Brian C. Hall
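From the start the book works with matrices, introducing matrix Lie groups from an algebraic rather than a geometric viewpoint. It establishes the exponential mapping by defining operations and uses it to introduce Lie algebras. This is easier to accept than the traditional definition of Lie algebras via left-invariant vector fields, and it reveals the meaning of Lie algebras more readily. Finally, a dedicated discussion connects this new definition with the traditional one.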
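As a rough illustration of the matrix route described above (not code from Hall's book), the sketch below exponentiates a skew-symmetric 3x3 matrix with a truncated power series and checks that the result is a rotation about the z-axis. The helper names `matmul`, `identity`, `transpose`, and `expm`, the 30-term cutoff, and the chosen angle are assumptions made for the example.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def expm(X, terms=30):
    """Matrix exponential via the truncated series sum_k X^k / k!."""
    n = len(X)
    result = identity(n)
    term = identity(n)                       # holds X^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, X)]
        result = [[r + t for r, t in zip(rrow, trow)]
                  for rrow, trow in zip(result, term)]
    return result

theta = math.pi / 3
# An element of the Lie algebra so(3): the generator of rotations about z, scaled by theta.
X = [[0.0, -theta, 0.0],
     [theta, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

R = expm(X)                                  # should land in the Lie group SO(3)
RtR = matmul(transpose(R), R)                # orthogonality check: R^T R = I

print(R)
print(RtR)
```

The skew-symmetric matrix plays the role of a Lie algebra element, and its exponential is the corresponding group element, which is the relationship the exponential mapping captures.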
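Reposted from the Shuimu (水木) BBS
Besides the books recommended below, the survey articles published in Foundations and Trends in Machine Learning are all worth reading.
Introductory: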
Pattern Recognition And Machine Learning
Christopher M. Bishop
Machine Learning : A Probabilistic Perspective
Kevin P. Murphy
The Elements of Statistical Learning : Data Mining, Inference, and Prediction
Trevor Hastie, Robert Tibshirani, Jerome Friedman
Information Theory, Inference and Learning Algorithms
David J. C. MacKay
All of Statistics : A Concise Course in Statistical Inference
Larry Wasserman
Optimization:
Convex Optimization
Stephen Boyd, Lieven Vandenberghe
Numerical Optimization
Jorge Nocedal, Stephen Wright
Optimization for Machine Learning
Suvrit Sra, Sebastian Nowozin, Stephen J. Wright
Kernel methods:
Kernel Methods for Pattern Analysis
John Shawe-Taylor, Nello Cristianini
Learning with Kernels : Support Vector Machines, Regularization, Optimization, and Beyond
Bernhard Schölkopf, Alexander J. Smola
Semi-supervised learning:
Semi-Supervised Learning
Olivier Chapelle
Gaussian processes:
Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)
Carl Edward Rasmussen, Christopher K. I. Williams
Probabilistic graphical models:
Graphical Models, Exponential Families, and Variational Inference
Martin J Wainwright, Michael I Jordan
Boosting:
Boosting : Foundations and Algorithms
Schapire, Robert E.; Freund, Yoav
Bayesian methods:
Statistical Decision Theory and Bayesian Analysis
James O. Berger
The Bayesian Choice : From Decision-Theoretic Foundations to Computational Implementation
Christian P. Robert
Bayesian Nonparametrics
Nils Lid Hjort, Chris Holmes, Peter Müller, Stephen G. Walker
Principles of Uncertainty
Joseph B. Kadane
Decision Theory : Principles and Approaches
Giovanni Parmigiani, Lurdes Inoue
Monte Carlo:
Monte Carlo Strategies in Scientific Computing
Jun S. Liu
Monte Carlo Statistical Methods
Christian P. Robert, George Casella
Information geometry:
Methods of Information Geometry
Shun-Ichi Amari, Hiroshi Nagaoka
Algebraic Geometry and Statistical Learning Theory
Watanabe, Sumio
Differential Geometry and Statistics
M.K. Murray, J.W. Rice
Asymptotic convergence:
Asymptotic Statistics
A. W. van der Vaart
Empirical Processes in M-estimation
Geer, Sara A. van de
Not recommended:
Statistical Learning Theory
Vladimir N. Vapnik
Bayesian Data Analysis, Second Edition
Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin
Probabilistic Graphical Models : Principles and Techniques
Daphne Koller, Nir Friedman
Active Learning
Two Faces of Active Learning, Dasgupta, 2011
Active Learning Literature Survey, Settles, 2010
Applications
A Survey of Emerging Approaches to Spam Filtering, Caruana, 2012
Ambient Intelligence: A Survey, Sadri, 2011
A Survey of Online Failure Prediction Methods, Salfner, 2010
Anomaly Detection: A Survey, Chandola, 2009
Mining Data Streams: A Review, Gaber, 2005
Workflow Mining: A Survey of Issues and Approaches, Aalst, 2003
Biology
Support Vector Machines in Bioinformatics: a Survey, Chicco, 2012
Computational Epigenetics: The New Scientific Paradigm, Lim, 2010
Automated Protein Structure Classification: A Survey, Hassanzadeh, 2009
Chemoinformatics - An Introduction for Computer Scientists, Brown, 2009
Computational Challenges in Systems Biology, Heath, 2009
Computational Epigenetics, Bock, 2008
Progress and Challenges in Protein Structure Prediction, Zhang, 2008
A Review of Feature Selection in Bioinformatics, Saeys, 2007
Machine Learning in Bioinformatics: A Brief Survey and Recommendations for Practitioners, Bhaskar, 2006
Bioinformatics - An Introduction for Computer Scientists, Cohen, 2004
Computational Systems Biology, Kitano, 2002
Protein Structure Prediction and Structural Genomics, Baker, 2001
Recent Developments and Future Directions in Computational Genomics, Tsoka, 2000
Molecular Biology for Computer Scientists, Hunter, 1993
Classification
Supervised Machine Learning: A Review of Classification Techniques, Kotsiantis, 2007
Clustering
XML Data Clustering: An Overview, Algergawy, 2011
Data Clustering: 50 Years Beyond K-Means, Jain, 2010
Clustering Stability: An Overview, Luxburg, 2010
Parallel Clustering Algorithms: A Survey, Kim, 2009
A Survey: Clustering Ensembles Techniques, Ghaemi, 2009
A Tutorial on Spectral Clustering, Luxburg, 2007
Survey of Clustering Data Mining Techniques, Berkhin, 2006
Survey of Clustering Algorithms, Xu, 2005
Clustering of Time Series Data - A Survey, Liao, 2005
Clustering Methods, Rokach, 2005
Recent Advances in Clustering: A Brief Survey, Kotsiantis, 2004
Subspace Clustering for High Dimensional Data: A Review, Parsons, 2004
Unsupervised and Semi-supervised Clustering: a Brief Survey, Grira, 2004
Clustering in Life Sciences, Zhao, 2002
On Clustering Validation Techniques, Halkidi, 2001
Data Clustering: A Review, Jain, 1999
A Survey of Fuzzy Clustering, Yang, 1993
Computer Vision
Pedestrian Detection: An Evaluation of the State of the Art, Dollar, 2012
A Comparative Study of Palmprint Recognition Algorithms, Zhang, 2012
Human Activity Analysis: A Review, Aggarwal, 2011
Subspace Methods for Face Recognition, Rao, 2010
Context Based Object Categorization: A Critical Survey, Galleguillos, 2010
Object tracking: A Survey, Yilmaz, 2006
Detecting Faces in Images: A Survey, Yang, 2002
Databases
Data Fusion, Bleiholder, 2008
Duplicate Record Detection: A Survey, Elmagarmid, 2007
Overview of Record Linkage and Current Research Directions, Winkler, 2006
A Survey of Schema-based Matching Approaches, Shvaiko, 2005
Deep Learning
Representation Learning: A Review and New Perspectives, Bengio, 2012
Dimension Reduction
Dimensionality Reduction: A Comparative Review, Maaten, 2009
Dimension Reduction: A Guided Tour, Burges, 2009
A Survey of Manifold-Based Learning Methods, Huo, 2007
Toward Integrating Feature Selection Algorithms for Classification and Clustering, Liu, 2005
An Introduction to Variable and Feature Selection, Guyon, 2003
A Survey of Dimension Reduction Techniques, Fodor, 2002
Economics
Auctions and Bidding: A Guide for Computer Scientists, Parsons, 2011
Computational Sustainability, Gomes, 2009
Computational Finance, Tsang, 2004
Game Theory
Computer Poker: A Review, Rubin, 2011
Graphical Models
An Introduction to Variational Methods for Graphical Models, Jordan, 1999
Kernel Methods
Kernels for Vector-Valued Functions: a Review, Alvarez, 2012
Learning Theory
Introduction to Statistical Learning Theory, Bousquet, 2004
Machine Learning
A Few Useful Things to Know about Machine Learning, Domingos, 2012
A Tutorial on Bayesian Nonparametric Models, Blei, 2011
Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning, Criminisi, 2011
Top 10 Algorithms in Data Mining, Wu, 2008
Semi-Supervised Learning Literature Survey, Zhu, 2007
Interestingness Measures for Data Mining: A Survey, Geng, 2006
A Survey of Interestingness Measures for Knowledge Discovery, McGarry, 2005
A Tutorial on the Cross-Entropy Method, Boer, 2005
A Survey of Kernels for Structured Data, Gartner, 2003
Survey on Frequent Pattern Mining, Goethals, 2003
The Boosting Approach to Machine Learning: An Overview, Schapire, 2003
A Survey on Wavelet Applications in Data Mining, Li, 2002
Mathematics
Topology and Data, Carlsson, 2009
Multi-armed Bandit
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, Bubeck, 2012
Natural Computing
Reservoir Computing Approaches to Recurrent Neural Network Training, Jaeger, 2009
Artificial Immune Systems, Aickelin, 2005
A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery, Freitas, 2003
Data Mining in Soft Computing Framework: A Survey, Mitra, 2002
Neural Networks for Classification: A Survey, Zhang, 2000
Natural Language Processing
Probabilistic Topic Models, Blei, 2012
Ontology Learning From Text: A Look Back And Into The Future, Wong, 2012
Machine Transliteration Survey, Karimi, 2011
Translation Techniques in Cross-Language Information Retrieval, Zhou, 2011
Comprehensive Review of Opinion Summarization, Kim, 2011
A Survey on Sentiment Detection of Reviews, Tang, 2009
Word Sense Disambiguation: A Survey, Navigli, 2009
Topic Models, Blei, 2009
Opinion Mining and Sentiment Analysis, Pang, 2008
Information Extraction, Sarawagi, 2008
Statistical Machine Translation, Lopez, 2008
A Survey of Named Entity Recognition and Classification, Nadeau, 2007
Adaptive Information Extraction, Turmo, 2006
Survey of Text Clustering, Jing, 2005
Machine Learning in Automated Text Categorization, Sebastiani, 2002
Web Mining Research: A Survey, Kosala, 2000
Networks
Community Detection in Graphs, Fortunato, 2010
A Survey of Statistical Network Models, Goldenberg, 2010
Communities in Networks, Porter, 2009
Graph Clustering, Schaeffer, 2007
Graph Mining: Laws, Generators, and Algorithms, Chakrabarti, 2006
Comparing Community Structure Identification, Danon, 2005
Link Mining: A Survey, Getoor, 2005
Detecting Community Structure in Networks, Newman, 2004
Link Mining: A New Data Mining Challenge, Getoor, 2003
On-Line Learning
On-Line Algorithms in Machine Learning, Blum, 1998
Others
A Survey of Very Large-Scale Neighborhood Search Techniques, Ahuja, 2001
Planning and Scheduling
A Review of Machine Learning for Automated Planning, Jimenez, 2009
Probabilistic
Approximate Policy Iteration: A Survey and Some New Methods, Bertsekas, 2011
An Introduction to MCMC for Machine Learning, Andrieu, 2003
Probabilistic Models
An Introduction to Conditional Random Fields, Sutton, 2010
Randomized Algorithms
Randomized Algorithms for Matrices and Data, Mahoney, 2011
Recommender Systems
Recent advances in Personalized Recommender Systems, Liu, 2009
Matrix Factorization Techniques for Recommender Systems, Koren, 2009
A Survey of Collaborative Filtering Techniques, Su, 2009
Regression
Ensemble Approaches for Regression: a Survey, Moreira, 2012
Reinforcement Learning
A Survey of Reinforcement Learning in Relational Domains, Otterlo, 2005
Reinforcement Learning: A Survey, Kaelbling, 1996
Rule Learning
Association Mining, Ceglar, 2006
Algorithms for Association Rule Mining - A General Survey and Comparison, Hipp, 2000
Testing
Controlled Experiments on the Web: Survey and Practical Guide, Kohavi, 2009
Time Series
Time-Series Data Mining, Esling, 2012
A Review on Time Series Data Mining, Fu, 2011
Discrete Wavelet Transform-Based Time Series Analysis and Mining, Chaovalit, 2011
Transfer Learning
A Survey on Transfer Learning, Pan, 2010
Web Mining
A Taxonomy of Sequential Pattern Mining Algorithms, Mabroukeh, 2010
A Survey of Web Clustering Engines, Carpineto, 2009
Web Page Classification: Features and Algorithms, Qi, 2009
Mining Interesting Knowledge from Weblogs: A Survey, Facca, 2005
An Overview of Web Data Clustering Practices, Vakali, 2005
A Survey of Web Metrics, Dhyani, 2002
Data Mining for Hypertext: A Tutorial Survey, Chakrabarti, 2000