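Roughly three categories: beginner reflections, hands-on notes, and expert guides
機器學習入門者學習指南 (A Study Guide for Machine Learning Beginners) @果殼網 (2013), by 白馬 -- [Beginner reflections] The first-hand experience of a graduate-student beginner
有沒有作機器學習的哥們?可否介紹一下是如何起步的 (Anyone here doing machine learning? Could you describe how you got started?) @ourcoders -- [Beginner reflections] The first-hand experience of a graduate-student beginner; reyoung's advice in particular is worth reading
tornadomeet 機器學習筆記 (tornadomeet's machine learning notes) (2013) -- [Hands-on notes] Study notes from a top student; see how a peer mastered machine learning step by step
Machine Learning Roadmap: Your Self-Study Guide to Machine Learning (2014) Jason Brownlee -- [Expert guide] Written in English, but very easy to read. It covers Beginner, Novice, Intermediate, and Advanced readers.
A Tour of Machine Learning Algorithms (2013) This article on how machine learning algorithms are categorized is also very good
Best Machine Learning Resources for Getting Started (2013) A Chinese translation of this one is available: 機器學習的最佳入門學習資源
The courses by Tom Mitchell and Andrew Ng are both well suited to beginners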
An Incomplete Roundup of Machine Learning Beginner Resources: introductory courses
2011 Tom Mitchell (CMU) Machine Learning
Decision Trees
Probability and Estimation
Naive Bayes
Logistic Regression
Linear Regression
Practical Issues: Feature selection, Overfitting ...
Graphical models: Bayes networks, EM, Mixture of Gaussians clustering ...
Computational Learning Theory: PAC Learning, Mistake bounds ...
Semi-Supervised Learning
Hidden Markov Models
Neural Networks
Learning Representations: PCA, Deep belief networks, ICA, CCA ...
Kernel Methods and SVM
Active Learning
Reinforcement Learning
(The above is a selection of the course's lecture titles.)
Dan Levin, What is the difference between statistics, machine learning, AI and data mining?
If there are up to 3 variables, it is statistics.
If the problem is NP-complete, it is machine learning.
If the problem is PSPACE-complete, it is AI.
If you don't know what is PSPACE-complete, it is data mining.
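A few high-level overviews of the machine learning field
The Discipline of Machine Learning: what Tom Mitchell wrote to the university president when making the case for founding a machine learning department at CMU.
A Few Useful Things to Know about Machine Learning: big-picture lessons from Professor Pedro Domingos. Many of the concepts may not make sense when you are just starting out, so be sure to read it again after finishing the open courses.
Calculus: Single Variable | Calculus One (optional)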
Multivariable Calculus
Linear Algebra
Introduction to Statistics: Descriptive Statistics
Probabilistic Systems Analysis and Applied Probability | Probability (optional)
Introduction to Statistics: Inference
Programming for Everybody (Python)
DataCamp: Learn R with R tutorials and coding challenges (R)
Introduction to Computer Science: Build a Search Engine & a Social Network
Statistical Learning (R)
Machine Learning
2. A Mathematician's Lament (數學家的嘆息), by Paul Lockhart
3. Think Stats: Probability and Statistics for Programmers (統計思惟:程序員數學之機率統計), by Allen B. Downey
4. A History of Mathematics (數學史), by Carl B. Boyer
5. Journeys Through Genius (天才引導的歷程:數學中的偉大定理), by William Dunham
6. The Mathematical Experience (數學經驗), by Philip J. Davis and Reuben Hersh
7. Proofs from THE BOOK (數學天書中的證實), by Martin Aigner and Günter M. Ziegler
8. Proofs and Refutations: The Logic of Mathematical Discovery (證實與反駁-數學發現的邏輯), by Imre Lakatos
1. Python/C++/R/Java - you will probably want to learn all of these languages at some point if you want a job in machine-learning. Python's Numpy and Scipy libraries [2] are awesome because they have similar functionality to MATLAB, but can be easily integrated into a web service and also used in Hadoop (see below). C++ will be needed to speed code up. R [3] is great for statistics and plots, and Hadoop [4] is written in Java, so you may need to implement mappers and reducers in Java (although you could use a scripting language via Hadoop streaming [5]).
First, get familiar with these four languages. Python has a lot of open-source libraries; take a look at Numpy and Scipy, both of which fit well into web development and Hadoop. C++ makes your code run faster, and R is a very good statistics tool. To use Hadoop well you also need to know Java and how to implement map-reduce.
2. Probability and Statistics: A good portion of learning algorithms are based on this theory. Naive Bayes [6], Gaussian Mixture Models [7], Hidden Markov Models [8], to name a few. You need to have a firm understanding of Probability and Stats to understand these models. Go nuts and study measure theory [9]. Use statistics as a model evaluation metric: confusion matrices, receiver-operator curves, p-values, etc.
Written by 李航 (Li Hang), who is more or less my mentor's mentor. Understand some of the probability-related theory and models, such as Bayes, SVM, CRF, HMM, decision trees, AdaBoost, and logistic regression, and then look briefly at how to do evaluation, for example P, R, and F (precision, recall, and F-measure). It is also worth looking at some hypothesis testing.
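As a rough illustration of the evaluation metrics mentioned above, here is a minimal sketch that builds binary confusion-matrix counts and derives precision, recall, and F1 from them. The function names and the label lists are made up for this example; they are not taken from any of the referenced material.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1); each is 0.0 when undefined."""
    c = confusion_counts(y_true, y_pred, positive)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(p, r, f1)
```

Returning 0.0 when a denominator is zero is one convention among several; some libraries raise a warning instead.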
3. Applied Math + Algorithms: For discriminative models like SVMs [10], you need to have a firm understanding of algorithm theory. Even though you will probably never need to implement an SVM from scratch, it helps to understand how the algorithm works. You will need to understand subjects like convex optimization [11], gradient descent [12], quadratic programming [13], Lagrange [14], partial differential equations [15], etc. Get used to looking at summations [16].
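To make gradient descent a little more concrete, here is a minimal sketch that minimizes a simple convex function, f(w) = (w0 - 3)^2 + 2 (w1 + 1)^2, whose minimum is at (3, -1). The function, step size, and iteration count are assumptions chosen for illustration, not a recipe from the original answer.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient from a starting point."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point


def grad_f(w):
    """Gradient of f(w) = (w0 - 3)^2 + 2 * (w1 + 1)^2."""
    return [2 * (w[0] - 3), 4 * (w[1] + 1)]


w_min = gradient_descent(grad_f, start=[0.0, 0.0])
print(w_min)  # approaches [3, -1]
```

With a fixed step size the method only converges if the step is small enough for the curvature of the function; on real problems the learning rate usually needs tuning.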
4. Distributed Computing: Most machine learning jobs require working with large data sets these days (see Data Science) [17]. You cannot process this data on a single machine; you will have to distribute it across an entire cluster. Projects like Apache Hadoop [4] and cloud services like Amazon's EC2 [18] make this very easy and cost-effective. Although Hadoop abstracts away a lot of the hard-core, distributed computing problems, you still need to have a firm understanding of map-reduce [22], distributed file systems [19], etc. You will most likely want to check out Apache Mahout [20] and Apache Whirr [21].
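The map-reduce idea itself can be sketched on a single machine. The toy word count below shows the three phases (map, shuffle, reduce) in plain Python; it does not use Hadoop or any Hadoop API, and all function names are invented for this example.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: combine the values collected for one key."""
    return key, sum(values)


def map_reduce(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


# Made-up input lines, for illustration only.
lines = ["the quick brown fox", "the lazy dog", "The fox"]
counts = map_reduce(lines)
print(counts)
```

In a real cluster the mappers and reducers run on different machines and the framework handles the shuffle; the point here is only the shape of the computation.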
5. Expertise in Unix Tools: Unless you are very fortunate, you are going to need to modify the format of your data sets so they can be loaded into R, Hadoop, HBase [23], etc. You can use a scripting language like python (using re) to do this, but the best approach is probably just to master all of the awesome unix tools that were designed for this: cat [24], grep [25], find [26], awk [27], sed [28], sort [29], cut [30], tr [31], and many more. Since all of the processing will most likely be on linux-based machines (Hadoop doesn't run on Windows, I believe), you will have access to these tools. You should learn to love them and use them as much as possible. They certainly have made my life a lot easier. A great example can be found here [1].
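For the scripting route mentioned above (python with re), here is a small sketch that turns loosely formatted `key=value` records into CSV rows. The input format, field names, and helper names are assumptions made up for this example.

```python
import re

# One key=value field; values run up to the next semicolon.
FIELD = re.compile(r"(\w+)\s*=\s*([^;]+)")


def parse_record(line):
    """Extract key=value fields from one line into a dict."""
    return {key: value.strip() for key, value in FIELD.findall(line)}


def to_csv_row(record, columns):
    """Join the selected columns with commas; missing fields become empty."""
    return ",".join(record.get(col, "") for col in columns)


# Hypothetical raw records with inconsistent spacing.
raw = ["id=1; name=alice ; score=0.91", "id=2;name=bob;score=0.47"]
columns = ["id", "name", "score"]
rows = [to_csv_row(parse_record(line), columns) for line in raw]
print("\n".join(rows))
```

This sketch does no CSV quoting, so it assumes values contain no commas; for anything messier the standard `csv` module is the safer choice.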
6. Become familiar with the Hadoop sub-projects: HBase, Zookeeper [32], Hive [33], Mahout, etc. These projects can help you store/access your data, and they scale.
Machine learning is ultimately closely tied to big data, so keep an eye on the Hadoop sub-projects, such as HBase, Zookeeper, and Hive.
7. Learn about advanced signal processing techniques: feature extraction is one of the most important parts of machine-learning. If your features suck, no matter which algorithm you choose, you're going to see horrible performance. Depending on the type of problem you are trying to solve, you may be able to utilize really cool advanced signal processing algorithms like: wavelets [42], shearlets [43], curvelets [44], contourlets [45], bandlets [46]. Learn about time-frequency analysis [47], and try to apply it to your problems. If you have not read about Fourier Analysis [48] and Convolution [49], you will need to learn about this stuff too. The latter is signal processing 101 stuff though.
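Fourier analysis and convolution are linked by the convolution theorem: circular convolution in the time domain equals pointwise multiplication in the frequency domain. The sketch below checks this on a tiny example using a naive O(n^2) DFT written with the standard library. The signal and kernel values are made up, and a real project would use an FFT routine from a numerical library instead.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform, O(n^2)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_convolve(a, b):
    """Direct circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


# Hypothetical signal and a two-tap averaging kernel, zero-padded.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5, 0.0, 0.0]

direct = circular_convolve(signal, kernel)
product = [x * y for x, y in zip(dft(signal), dft(kernel))]
via_dft = [value.real for value in idft(product)]
print(direct)
print(via_dft)  # agrees with `direct` up to floating-point error
```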
Finally, practice and read as much as you can. In your free time, read papers like Google Map-Reduce [34], Google File System [35], Google Big Table [36], The Unreasonable Effectiveness of Data [37], etc. There are great free machine learning books online and you should read those also [38][39][40]. Here is an awesome course I found and re-posted on github [41]. Instead of using open source packages, code up your own, and compare the results. If you can code an SVM from scratch, you will understand the concept of support vectors, gamma, cost, hyperplanes, etc. It's easy to just load some data up and start training; the hard part is making sense of it all.
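In the spirit of coding things up yourself, here is a minimal sketch of a linear soft-margin SVM trained by batch sub-gradient descent on the regularized hinge loss. This is one simple way to train a linear SVM, not the quadratic-programming formulation and not what any particular package does; it has no kernel (so no gamma), and the toy data, hyperparameters, and function names are all assumptions for illustration.

```python
def train_linear_svm(points, labels, lam=0.01, learning_rate=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: hinge loss is active
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x):
    """Classify by which side of the hyperplane w.x + b = 0 a point is on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data with labels in {+1, -1}.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
print(w, b, predictions)
```

The points that end up with a margin at or below 1 are the ones that keep influencing the updates, which is the intuition behind support vectors; the regularization strength `lam` plays the role that the cost parameter plays (inversely) in the usual formulation.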
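In short, getting started in machine learning has two sides. One side is studying the algorithms, which requires an extremely strong mathematical foundation (really, extremely strong); start with the SVM and work through it bit by bit.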