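The following is a collection of libraries and tools organized by programming language and application area. It is updated continuously.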
C
General-Purpose Machine Learning
Computer Vision
- CCV - C-based/Cached/Core Computer Vision Library, a modern computer vision library.
- VLFeat - An open-source library of computer vision algorithms, with a MATLAB toolbox.
C++
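Computer Vision
- OpenCV - The most widely used computer vision library. It has C++, C, Python, and Java interfaces and supports Windows, Linux, Android, and Mac OS.
- DLib - DLib has C++ and Python interfaces for face recognition and object detection.
- EBLearn - EBLearn is an object-oriented C++ library that implements various machine learning models.
- VIGRA - VIGRA is a cross-platform computer vision and machine learning library that can handle data of arbitrary dimensionality, with Python bindings.
General-Purpose Machine Learning
- MLPack - A scalable C++ machine learning library.
- DLib - Designed to be easy to embed in other systems.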
- encog-cpp
- shark
- Vowpal Wabbit (VW) - A fast out-of-core learning system.
- sofia-ml - A suite of fast incremental algorithms.
- Shogun - The Shogun Machine Learning Toolbox
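- Caffe - A deep learning framework with a clean structure, readable code, and high speed.
- CXXNET - A lean framework whose core code is under 1,000 lines.
- XGBoost - A gradient boosting library optimized for parallel computation.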
- CUDA - This is a fast C++/CUDA implementation of convolutional [DEEP LEARNING]
- Stan - A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling
- BanditLib - A simple Multi-armed Bandit library.
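- Timbl - Implements several memory-based learning algorithms, among which IB1-IG (a k-nearest-neighbor classifier) and IGTree (a decision tree) are widely used in NLP.
Natural Language Processing
Machine Translation
Speech Recognition
- Kaldi - Kaldi is a C++ toolkit released under the Apache License v2.0. It is intended for speech recognition research.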
Sequence Analysis
- ToPS - This is an object-oriented framework that facilitates the integration of probabilistic models for sequences over a user-defined alphabet.
Java
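Natural Language Processing
General-Purpose Machine Learning
- aerosolve - A machine learning library designed from the ground up by Airbnb to be easy to use.
- Datumbox - A framework for rapid development of machine learning and statistical applications.
- ELKI - Data mining tools (unsupervised learning: clustering, outlier detection, etc.).
- Encog - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation and can also use a GPU to further speed up processing. A GUI-based workbench is also provided.
- H2O - A machine learning engine that supports distributed systems such as Hadoop and Spark as well as personal computers. Its API can be called from R, Python, Scala, and REST/JSON.
- htm.java - A general machine learning library using Numenta’s Cortical Learning Algorithm.
- java-deeplearning - A distributed deep learning platform for Java, Clojure, and Scala.
- JAVA-ML - A general machine learning library for Java with a common interface for all algorithms.
- JSAT - Provides many machine learning algorithms for classification, regression, and clustering.
- Mahout - Distributed machine learning tools.
- Meka - An open-source implementation of methods for multi-label classification and evaluation, built as an extension to Weka.
- MLlib in Apache Spark - Distributed machine learning library in Spark.
- Neuroph - A lightweight Java neural network framework.
- ORYX - A Lambda Architecture framework that uses Apache Spark and Apache Kafka for real-time, large-scale machine learning.
- RankLib - A library of learning-to-rank algorithms.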
- Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.
- SmileMiner - Statistical Machine Intelligence & Learning Engine
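- SystemML - A flexible, scalable machine learning language.
- WalnutiQ - An object-oriented model of the human brain.
- Weka - Weka is a collection of machine learning algorithms for data mining tasks.
Speech Recognition
Data Analysis / Data Visualization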
Deep Learning
Python
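Computer Vision
- Scikit-Image - A collection of image processing algorithms in Python.
- SimpleCV - An open-source computer vision framework that gives access to several high-performance computer vision libraries, such as OpenCV. Runs on Mac, Windows, and Ubuntu Linux.
- Vigranumpy - Python bindings for the VIGRA C++ computer vision library.
Natural Language Processing
- NLTK - A leading platform for building Python programs that work with human language data.
- Pattern - A web mining module for Python, with tools for natural language processing, machine learning, and more.
- Quepy - Transforms natural language questions into queries in a database query language.
- TextBlob - Provides a consistent API for common natural language processing (NLP) tasks. Built on NLTK and Pattern, and works well with both.
- YAlign - A sentence aligner that extracts parallel sentences from comparable corpora.
- jieba - A Chinese word segmentation tool.
- SnowNLP - A library for processing Chinese text.
- loso - A Chinese word segmentation tool.
- genius - A Chinese word segmentation tool based on conditional random fields.
- KoNLPy - Korean natural language processing.
- nut - A natural language understanding toolkit.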
- Rosetta - Text processing tools and wrappers (e.g. Vowpal Wabbit)
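- BLLIP Parser - Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser).
- PyNLPl - A Python library for natural language processing. It also contains tools for parsing common NLP formats such as FoLiA, as well as ARPA language models, Moses phrase tables, and GIZA++ alignments.
- python-ucto - Python bindings for ucto (a Unicode-aware, rule-based tokenizer).
- python-frog - Python bindings for Frog, which provides part-of-speech tagging, lemmatisation, dependency parsing, and NER for Dutch.
- python-zpar - Python bindings for ZPar (a statistical part-of-speech tagger, constituency parser, and dependency parser for English).
- colibri-core - Python bindings for a C++ library that extracts n-grams and skipgrams efficiently.
- spaCy - Industrial-strength NLP with Python and Cython.
- PyStanfordDependencies - A Python interface for converting Penn Treebank trees to Stanford dependency trees.
General-Purpose Machine Learning
Data Analysis / Data Visualization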
- SciPy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
- NumPy - A fundamental package for scientific computing with Python.
- Numba - Python JIT (just-in-time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
- NetworkX - A high-productivity software for complex networks.
- Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
- Open Mining - Business Intelligence (BI) in Python (Pandas web interface)
- PyMC - Markov Chain Monte Carlo sampling toolkit.
- zipline - A Pythonic algorithmic trading library.
- PyDy - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
- SymPy - A Python library for symbolic mathematics.
- statsmodels - Statistical modeling and econometrics in Python.
- astropy - A community Python library for Astronomy.
- matplotlib - A Python 2D plotting library.
- bokeh - Interactive Web Plotting for Python.
- plotly - Collaborative web plotting for Python and matplotlib.
- vincent - A Python to Vega translator.
- d3py - A plotting library for Python, based on D3.js.
- ggplot - Same API as ggplot2 for R.
- ggfortify - Unified interface to ggplot2 popular R packages.
- Kartograph.py - Rendering beautiful SVG maps in Python.
- pygal - A Python SVG Charts Creator.
- PyQtGraph - A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy.
- pycascading
- Petrel - Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
- Blaze - NumPy and Pandas interface to Big Data.
- emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
- windML - A Python Framework for Wind Energy Analysis and Prediction
- vispy - GPU-based high-performance interactive OpenGL 2D/3D data visualization library
- cerebro2 - A web-based visualization and debugging platform for NuPIC.
- NuPIC Studio - An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool!
- SparklingPandas - Pandas on PySpark (POPS)
- Seaborn - A python visualization library based on matplotlib
- bqplot - An API for plotting in Jupyter (IPython)
Common Lisp
General-Purpose Machine Learning
- mgl - Neural networks (boltzmann machines, feed-forward and recurrent nets), Gaussian Processes
- mgl-gpr - Evolutionary algorithms
- cl-libsvm - Wrapper for the libsvm support vector machine library
Clojure
Natural Language Processing
General-Purpose Machine Learning
- Touchstone - Clojure A/B testing library
- Clojush - The Push programming language and the PushGP genetic programming system implemented in Clojure
- Infer - Inference and machine learning in Clojure
- Clj-ML - A machine learning library for Clojure built on top of Weka and friends
- Encog - Clojure wrapper for Encog (v3) (Machine-Learning framework that specializes in neural-nets)
- Fungp - A genetic programming library for Clojure
- Statistiker - Basic Machine Learning algorithms in Clojure.
- clortex - General Machine Learning library using Numenta’s Cortical Learning Algorithm
- comportex - Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm
Data Analysis / Data Visualization
- Incanter - Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
- PigPen - Map-Reduce for Clojure.
- Envision - Clojure Data Visualisation library, based on Statistiker and D3
Matlab
Computer Vision
- Contourlets - MATLAB source code that implements the contourlet transform and its utility functions.
- Shearlets - MATLAB code for shearlet transform
- Curvelets - The Curvelet transform is a higher dimensional generalization of the Wavelet transform designed to represent images at different scales and different angles.
- Bandlets - MATLAB code for bandlet transform
- mexopencv - Collection and a development kit of MATLAB mex functions for OpenCV library
Natural Language Processing
- NLP - An NLP library for Matlab
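General-Purpose Machine Learning
- t-Distributed Stochastic Neighbor Embedding - t-SNE is a prize-winning technique for dimensionality reduction that is particularly well suited to visualizing high-dimensional data.
- Spider - The Spider is intended to be a complete object-oriented environment for machine learning in MATLAB.
- LibSVM - A well-known library for support vector machines.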
- LibLinear - A Library for Large Linear Classification
- Caffe - A deep learning framework with a clean structure, readable code, and high speed.
- Pattern Recognition Toolbox - A complete object-oriented environment for machine learning in MATLAB.
- Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly with MATLAB.
Data Analysis / Data Visualization
- matlab_gbl - MatlabBGL is a Matlab package for working with graphs.
- gamic - Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL's mex functions.
.NET
Computer Vision
- OpenCVDotNet - A wrapper for the OpenCV project to be used with .NET applications.
- Emgu CV - Cross-platform wrapper of OpenCV which can be compiled in Mono to run on Windows, Linux, Mac OS X, iOS, and Android.
- AForge.NET - Open source C# framework for developers and researchers in the fields of Computer Vision and Artificial Intelligence. Development has now shifted to GitHub.
- Accord.NET - Together with AForge.NET, this library can provide image processing and computer vision algorithms to Windows, Windows RT and Windows Phone. Some components are also available for Java and Android.
Natural Language Processing
- Stanford.NLP for .NET - A full port of Stanford NLP packages to .NET and also available precompiled as a NuGet package.
General-Purpose Machine Learning
- Accord-Framework - A complete framework for machine learning, computer vision, computer audition, signal processing, and statistical applications.
- Accord.MachineLearning - Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications. This package is part of the Accord.NET Framework.
- DiffSharp - An automatic differentiation (AD) library providing exact and efficient derivatives (gradients, Hessians, Jacobians, directional derivatives, and matrix-free Hessian- and Jacobian-vector products) for machine learning and optimization applications. Operations can be nested to any level, meaning that you can compute exact higher-order derivatives and differentiate functions that are internally making use of differentiation, for applications such as hyperparameter optimization.
- Vulpes - Deep belief and deep learning implementation written in F# and leverages CUDA GPU execution with Alea.cuBase.
- Encog - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
- Neural Network Designer - DBMS management system and designer for neural networks. The designer application is developed using WPF, and is a user interface which allows you to design your neural network, query the network, create and configure chat bots that are capable of asking questions and learning from your feed back. The chat bots can even scrape the internet for information to return in their output as well as to use for learning.
Data Analysis / Data Visualization
- numl - numl is a machine learning library intended to ease the use of standard modeling techniques for both prediction and clustering.
- Math.NET Numerics - Numerical foundation of the Math.NET project, aiming to provide methods and algorithms for numerical computations in science, engineering and every day use. Supports .Net 4.0, .Net 3.5 and Mono on Windows, Linux and Mac; Silverlight 5, WindowsPhone/SL 8, WindowsPhone 8.1 and Windows 8 with PCL Portable Profiles 47 and 344; Android/iOS with Xamarin.
- Sho - Sho is an interactive environment for data analysis and scientific computing that lets you seamlessly connect scripts (in IronPython) with compiled code (in .NET) to enable fast and flexible prototyping. The environment includes powerful and efficient libraries for linear algebra as well as data visualization that can be used from any .NET language, as well as a feature-rich interactive shell for rapid development.
Ruby
Natural Language Processing
- Treat - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby
- Ruby Linguistics - Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module which contains various English-language utilities.
- Stemmer - Expose libstemmer_c to Ruby
- Ruby Wordnet - This library is a Ruby interface to WordNet
- Raspell - raspell is an interface binding for Ruby
- UEA Stemmer - Ruby port of UEALite Stemmer - a conservative stemmer for search and indexing
- Twitter-text-rb - A library that does auto linking and extraction of usernames, lists and hashtags in tweets
General-Purpose Machine Learning
Data Analysis / Data Visualization
- rsruby - Ruby - R bridge
- data-visualization-ruby - Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby
- ruby-plot - gnuplot wrapper for ruby, especially for plotting roc curves into svg files
- plot-rb - A plotting library in Ruby built on top of Vega and D3.
- scruffy - A beautiful graphing toolkit for Ruby
- SciRuby
- Glean - A data management tool for humans
- Bioruby
- Arel
Misc
R
General-Purpose Machine Learning
- ahaz - ahaz: Regularization for semiparametric additive hazards regression
- arules - arules: Mining Association Rules and Frequent Itemsets
- bigrf - bigrf: Big Random Forests: Classification and Regression Forests for Large Data Sets
- bigRR - bigRR: Generalized Ridge Regression (with special advantage for p >> n cases)
- bmrm - bmrm: Bundle Methods for Regularized Risk Minimization Package
- Boruta - Boruta: A wrapper algorithm for all-relevant feature selection
- bst - bst: Gradient Boosting
- C50 - C50: C5.0 Decision Trees and Rule-Based Models
- caret - Classification and Regression Training: Unified interface to ~150 ML algorithms in R.
- caretEnsemble - caretEnsemble: Framework for fitting multiple caret models as well as creating ensembles of such models.
- Clever Algorithms For Machine Learning
- CORElearn - CORElearn: Classification, regression, feature evaluation and ordinal evaluation
- CoxBoost - CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks
- Cubist - Cubist: Rule- and Instance-Based Regression Modeling
- e1071 - e1071: Misc Functions of the Department of Statistics (e1071), TU Wien
- earth - earth: Multivariate Adaptive Regression Spline Models
- elasticnet - elasticnet: Elastic-Net for Sparse Estimation and Sparse PCA
- ElemStatLearn - ElemStatLearn: Data sets, functions and examples from the book: "The Elements of Statistical Learning, Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani and Jerome Friedman
- evtree - evtree: Evolutionary Learning of Globally Optimal Trees
- fpc - fpc: Flexible procedures for clustering
- frbs - frbs: Fuzzy Rule-based Systems for Classification and Regression Tasks
- GAMBoost - GAMBoost: Generalized linear and additive models by likelihood based boosting
- gamboostLSS - gamboostLSS: Boosting Methods for GAMLSS
- gbm - gbm: Generalized Boosted Regression Models
- glmnet - glmnet: Lasso and elastic-net regularized generalized linear models
- glmpath - glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model
- GMMBoost - GMMBoost: Likelihood-based Boosting for Generalized mixed models
- grplasso - grplasso: Fitting user specified models with Group Lasso penalty
- grpreg - grpreg: Regularization paths for regression models with grouped covariates
- h2o - A framework for fast, parallel, and distributed machine learning algorithms at scale -- Deeplearning, Random forests, GBM, KMeans, PCA, GLM
- hda - hda: Heteroscedastic Discriminant Analysis
- Introduction to Statistical Learning
- ipred - ipred: Improved Predictors
- kernlab - kernlab: Kernel-based Machine Learning Lab
- klaR - klaR: Classification and visualization
- lars - lars: Least Angle Regression, Lasso and Forward Stagewise
- lasso2 - lasso2: L1 constrained estimation aka ‘lasso’
- LiblineaR - LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library
- LogicReg - LogicReg: Logic Regression
- Machine Learning For Hackers
- maptree - maptree: Mapping, pruning, and graphing tree models
- mboost - mboost: Model-Based Boosting
- medley - medley: Blending regression models, using a greedy stepwise approach
- mlr - mlr: Machine Learning in R
- mvpart - mvpart: Multivariate partitioning
- ncvreg - ncvreg: Regularization paths for SCAD- and MCP-penalized regression models
- nnet - nnet: Feed-forward Neural Networks and Multinomial Log-Linear Models
- oblique.tree - oblique.tree: Oblique Trees for Classification Data
- pamr - pamr: Pam: prediction analysis for microarrays
- party - party: A Laboratory for Recursive Partytioning
- partykit - partykit: A Toolkit for Recursive Partytioning
- penalized - penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model
- penalizedLDA - penalizedLDA: Penalized classification using Fisher's linear discriminant
- penalizedSVM - penalizedSVM: Feature Selection SVM using penalty functions
- quantregForest - quantregForest: Quantile Regression Forests
- randomForest - randomForest: Breiman and Cutler's random forests for classification and regression
- randomForestSRC - randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC)
- rattle - rattle: Graphical user interface for data mining in R
- rda - rda: Shrunken Centroids Regularized Discriminant Analysis
- rdetools - rdetools: Relevant Dimension Estimation (RDE) in Feature Spaces
- REEMtree - REEMtree: Regression Trees with Random Effects for Longitudinal (Panel) Data
- relaxo - relaxo: Relaxed Lasso
- rgenoud - rgenoud: R version of GENetic Optimization Using Derivatives
- rgp - rgp: R genetic programming framework
- Rmalschains - Rmalschains: Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R
- rminer - rminer: Simpler use of data mining methods (e.g. NN and SVM) in classification and regression
- ROCR - ROCR: Visualizing the performance of scoring classifiers
- RoughSets - RoughSets: Data Analysis Using Rough Set and Fuzzy Rough Set Theories
- rpart - rpart: Recursive Partitioning and Regression Trees
- RPMM - RPMM: Recursively Partitioned Mixture Model
- RSNNS - RSNNS: Neural Networks in R using the Stuttgart Neural Network Simulator (SNNS)
- RWeka - RWeka: R/Weka interface
- RXshrink - RXshrink: Maximum Likelihood Shrinkage via Generalized Ridge or Least Angle Regression
- sda - sda: Shrinkage Discriminant Analysis and CAT Score Variable Selection
- SDDA - SDDA: Stepwise Diagonal Discriminant Analysis
- SuperLearner and subsemble - Multi-algorithm ensemble learning packages.
- svmpath - svmpath: the SVM Path algorithm
- tgp - tgp: Bayesian treed Gaussian process models
- tree - tree: Classification and regression trees
- varSelRF - varSelRF: Variable selection using random forests
- XGBoost.R - R binding for eXtreme Gradient Boosting (Tree) Library
- Optunity - A library dedicated to automated hyperparameter optimization with a simple, lightweight API to facilitate drop-in replacement of grid search. Optunity is written in Python but interfaces seamlessly to R.
Data Analysis / Data Visualization
- ggplot2 - A data visualization package based on the grammar of graphics.
Scala
Natural Language Processing
- ScalaNLP - ScalaNLP is a suite of machine learning and numerical computing libraries.
- Breeze - Breeze is a numerical processing library for Scala.
- Chalk - Chalk is a natural language processing library.
- FACTORIE - FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
Data Analysis / Data Visualization
- MLlib in Apache Spark - Distributed machine learning library in Spark
- Scalding - A Scala API for Cascading
- Summing Bird - Streaming MapReduce with Scalding and Storm
- Algebird - Abstract Algebra for Scala
- xerial - Data management utilities for Scala
- simmer - Reduce your data. A unix filter for algebird-powered aggregation.
- PredictionIO - PredictionIO, a machine learning server for software developers and data engineers.
- BIDMat - CPU and GPU-accelerated matrix library intended to support large-scale exploratory data analysis.
- Wolfe - Declarative Machine Learning
General-Purpose Machine Learning
- Conjecture - Scalable Machine Learning in Scalding
- brushfire - Distributed decision tree ensemble learning in Scala
- ganitha - scalding powered machine learning
- adam - A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
- bioscala - Bioinformatics for the Scala programming language
- BIDMach - CPU and GPU-accelerated Machine Learning Library.
- Figaro - a Scala library for constructing probabilistic models.
- H2O Sparkling Water - H2O and Spark interoperability.