[GPU] Machine Learning on C++

Date: 2019-11-18

Tags: gpu, machine learning, c++    Category: C&C++


1. What is MPI?

2. Rediscovering Spark

Link: https://www.zhihu.com/question/48743915/answer/115738668

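The summary of Matei Zaharia's PhD thesis says, roughly, that it is perfectly normal for an algorithm implemented purely in MPI to run five or six times faster than the same algorithm on Spark. But Spark is a general data-flow processing framework: at any point in the data's life cycle you can process it with one of the concrete implementations built on top of Spark, and ML is only one of them. That is one of Spark's biggest selling points.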

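So if you compare this Prophet platform with Spark on ML efficiency, of course you come out faster. Plenty of other ML-specific platforms are faster than Spark too; I won't list them here.
The reason is that Spark's MapReduce-based programming model is not a good fit for ML, especially for models with a very large number of parameters, such as LDA.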
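To make the point about parameter-heavy models concrete, here is a minimal single-process sketch of one map-reduce style training iteration for a linear model. It is not Spark code: the names `Example`, `Partition`, `partial_gradient`, `reduce_gradients`, and `train_step` are hypothetical, and the "network traffic" figure is only a rough count under the assumption that every worker receives the full weight vector and returns a dense gradient of the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One training example for a linear model y ~ w . x (illustrative type).
struct Example {
    std::vector<double> x;
    double y;
};

using Partition = std::vector<Example>;

// "Map" step: a worker computes the squared-error gradient over its own
// partition. It needs the FULL weight vector w as input.
std::vector<double> partial_gradient(const Partition& part,
                                     const std::vector<double>& w) {
    std::vector<double> g(w.size(), 0.0);
    for (const Example& e : part) {
        assert(e.x.size() == w.size());
        double pred = std::inner_product(w.begin(), w.end(), e.x.begin(), 0.0);
        double err = pred - e.y;
        for (std::size_t j = 0; j < w.size(); ++j) g[j] += err * e.x[j];
    }
    return g;
}

// "Reduce" step: element-wise sum of the per-partition gradients.
std::vector<double> reduce_gradients(
        const std::vector<std::vector<double>>& grads, std::size_t dim) {
    std::vector<double> total(dim, 0.0);
    for (const std::vector<double>& g : grads)
        for (std::size_t j = 0; j < dim; ++j) total[j] += g[j];
    return total;
}

// One synchronous iteration: broadcast w, map, reduce, update.
// Returns a rough count of doubles that would cross the network:
// w sent to every partition, plus one dense gradient back from each.
std::size_t train_step(const std::vector<Partition>& parts,
                       std::vector<double>& w, double lr) {
    if (parts.empty()) return 0;
    std::vector<std::vector<double>> grads;
    std::size_t n = 0;
    for (const Partition& p : parts) {
        grads.push_back(partial_gradient(p, w));  // w is "broadcast" here
        n += p.size();
    }
    if (n == 0) return 0;
    std::vector<double> total = reduce_gradients(grads, w.size());
    for (std::size_t j = 0; j < w.size(); ++j)
        w[j] -= lr * total[j] / static_cast<double>(n);
    return 2 * parts.size() * w.size();
}
```

Under these assumptions the per-iteration traffic grows with the number of partitions times the number of parameters, even when each partition only touches a few of them. That is cheap for a small weight vector and expensive for a topic model with a very large parameter table.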

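BTW: judged as a rigorous paper, it uses Spark as its baseline rather than running a broad experimental comparison across, say, different platforms, algorithms, and datasets.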

3. Microsoft Distributed Machine Learning Toolkit (DMTK)

Link source: https://indico.cern.ch/event/605622/contributions/2482399/attachments/1418253/2172239/TMVA_ROOTMpi.pdf

Goto: https://github.com/Microsoft/DMTK

Ref: A First Look at the Microsoft Distributed Machine Learning Toolkit (DMTK)

DMTK includes the following projects:

DMTK framework(Multiverso): The parameter server framework for distributed machine learning.
LightLDA: Scalable, fast and lightweight system for large-scale topic modeling.
LightGBM: LightGBM is a fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Distributed word embedding: Distributed algorithm for word embedding implemented on multiverso.
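
The list above describes Multiverso as a parameter server framework. As a rough illustration of that idea only, here is a minimal in-process sketch. It does not use or imitate the Multiverso API: `ParameterServer`, `pull`, `push`, and `worker` are hypothetical names, the "gradient" is a stand-in that moves each value toward a fixed target, and in a real system the workers would be separate threads or machines rather than sequential calls.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical shared parameter table, keyed by integer parameter id.
class ParameterServer {
public:
    // Return the current values for the requested keys.
    // Keys that were never written read as 0.0.
    std::vector<double> pull(const std::vector<int>& keys) {
        std::lock_guard<std::mutex> lock(mu_);
        std::vector<double> out;
        out.reserve(keys.size());
        for (int k : keys) {
            auto it = table_.find(k);
            out.push_back(it == table_.end() ? 0.0 : it->second);
        }
        return out;
    }

    // Add each delta to the stored value for the matching key.
    void push(const std::vector<int>& keys, const std::vector<double>& deltas) {
        assert(keys.size() == deltas.size());
        std::lock_guard<std::mutex> lock(mu_);
        for (std::size_t i = 0; i < keys.size(); ++i)
            table_[keys[i]] += deltas[i];
    }

private:
    std::mutex mu_;  // guards table_ so concurrent workers stay consistent
    std::unordered_map<int, double> table_;
};

// Hypothetical worker loop: it pulls and pushes ONLY the keys it touches,
// never the whole model. The update rule is a placeholder that moves each
// value a fraction `lr` of the way toward `target`.
void worker(ParameterServer& ps, const std::vector<int>& keys,
            int rounds, double target, double lr) {
    for (int r = 0; r < rounds; ++r) {
        std::vector<double> current = ps.pull(keys);
        std::vector<double> delta(keys.size());
        for (std::size_t i = 0; i < keys.size(); ++i)
            delta[i] = lr * (target - current[i]);
        ps.push(keys, delta);
    }
}
```

The design choice worth noticing is that traffic per round scales with the number of keys a worker actually uses, which is why this pattern tends to suit sparse, parameter-heavy models such as topic models and word embeddings.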