[DE] How to learn Big Data【Understanding big data】
[DE] Pipeline for Data Engineering【Workflow case demonstration】
[DE] ML on Big Data: MLlib【Machine learning solutions for big data】
[Spark] 00 - Install Hadoop & Spark【in progress】
[Spark] 01 - What is Spark【RDD principles and methods】
[Spark] 02 - Practice PySpark【Hands-on programming】
[Spark] 03 - Spark SQL【With the convenience of SQL operations】
[Spark] 04 - What is Spark Streaming
[Spark] 06 - Structured Streaming【Corresponds to DataFrame】
[Full-stack] Everything in the cloud - AWS【AWS basic services】
[AWS] 01 - What is Amazon EMR【Introduction to EMR】
[AWS] 02 - Pipeline on EMR【Basic understanding】
/* important */
[Code] Data engineering with Python【Syntax-driven】
[Code] Extreme mastery: at one with the keyboard【Requirement-driven】
[Pandas] 01 - A guy based on NumPy【How to get high performance】
[Pandas] 02 - Tutorial of NumPy【Common NumPy usage】
[Pandas] 03 - DataFrame【Reading and processing tables】
[Pandas] 04 - Efficient I/O【Loading from a database into arr, df, EArray】
[Feature] Preprocessing tutorial【Brother Wei's walkthrough of the feature engineering steps】
[Feature] Feature engineering【Feature engineering outline】
[Feature] Build pipeline【Showing the general idea of a Pipeline】
[Feature] Final pipeline: custom transformers【Chapter summary】
[AI] Deep mathematics - Bayes【Scikit-learn Cookbook】
[Distributed ML] Yi WANG's talk【The expert Yi WANG】
[Matplotlib] Data Representation
[Kaggle] Online Notebooks【Modular code】
[Kaggle] How to kaggle?【Introduction to methods】
[Kaggle] How to handle big data?【Advanced methods】
[ML] Pyspark ML tutorial for beginners【House price prediction: the "standard analysis routine"】
[ML] Load and preview large scale data【Ensuring feature integrity】
[Link] https://spark.apache.org/docs/2.4.4/ml-guide.html
[ML] Pipeline in Distributed ML Library【The Pipeline "routine"】
[ML] Online learning【Pipeline as the "data source" for "online learning"】
[Spark] Spark 3.0 Accelerator Aware Scheduling - GPU
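[ML] LIBSVM Data: Classification, Regression, and Multi-label【Timing comparison of three approaches】
[ML] Machine Learning in the Common Infrastructure ecosystem【Architecture overview】
The ultimate goal of this series: develop/optimize a distributed algorithm for big data.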
https://github.com/apache/spark/tree/master/examples/src/main/python/ml
https://spark.apache.org/mllib/
http://stanford.edu/~rezab/slides/
Distributed Computing with Spark, Reza Zadeh 20140623
Reza Zadeh, Scalable Machine Learning
Apache Spark™ ML and Distributed Learning (1/5) (Databricks)
Module 4: Creating Distributed Algorithms
stanford.edu: Chapter 12 Large-Scale Machine Learning
<Large Scale Machine Learning with Python>
Processing Big Data in Main Memory and on GPU, 2016 master's thesis
[Spark News] Spark + GPU are the next generation technology
/* implement */