Spark Learning Resources

(1) Spark installation, deployment, and development environment

1. Spark pseudo-distributed and fully distributed installation guide

http://my.oschina.net/leejun2005/blog/394928

2. Exploring Apache Spark: comparing three distributed deployment modes

http://dongxicheng.org/framework-on-yarn/apache-spark-comparing-three-deploying-ways/

3. Running Spark SQL with Hive locally in IDEA

http://dataknocker.github.io/2014/10/11/idea%E4%B8%8A%E8%BF%90%E8%A1%8Clocal%E7%9A%84spark-sql-hive/

4. Learning Apache Spark: developing Spark applications in Scala

http://dongxicheng.org/framework-on-yarn/spark-scala-writing-application/

5. How to run Spark applications on CDH 5 (Scala, Java, Python)

http://blog.javachen.com/2015/02/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/

6. Installing and using a Spark cluster

http://blog.javachen.com/2014/07/01/spark-install-and-usage/#

 

(2) Spark architecture, internals, and programming

1. Understanding RDD, the core of Spark

http://www.infoq.com/cn/articles/spark-core-rdd

2. How-to: Translate from MapReduce to Apache Spark

http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/

3. Spark SQL source code analysis: In-Memory Columnar Storage and cache table

http://blog.csdn.net/oopsoom/article/details/39525483

4. Databricks Spark Knowledge Base

http://aiyanbo.gitbooks.io/databricks-spark-knowledge-base-zh-cn/content/

5. Spark 1.0.0 programming model

http://blog.csdn.net/book_mmicky/article/details/32096871

6. Spark internals: source code analysis of Client, Master, and Worker communication

http://blog.csdn.net/anzhsoft/article/details/30802603

7. Spark Streaming programming guide

http://yangqijun.com/archives/200

8. Spark distributed computing execution model

http://www.flickering.cn/%E5%88%86%E5%B8%83%E5%BC%8F%E8%AE%A1%E7%AE%97/2014/07/spark%E5%88%86%E5%B8%83%E5%BC%8F%E8%AE%A1%E7%AE%97%E6%89%A7%E8%A1%8C%E6%A8%A1%E5%9E%8B/

9. Top 3 Troubleshooting Tips To Keep You Sparking

http://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/

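10. Apache Spark design and implementation (focuses on design ideas, execution principles, implementation architecture, and performance tuning, with a discussion of how it differs from MapReduce in design and implementation)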

https://github.com/JerryLead/SparkInternals/tree/master/markdown

11. Spark Examples

http://spark.apache.org/examples.html

12. RDD operations explained in detail

http://dataknocker.github.io/2014/07/20/RDD%E5%90%84%E6%93%8D%E4%BD%9C%E8%AF%A6%E8%A7%A3/

13. Notes on the Spark programming guide

http://blog.javachen.com/2015/02/03/spark-programming-guide/#

14. Spark Core runtime analysis: DAGScheduler, TaskScheduler, SchedulerBackend

http://blog.csdn.net/pelick/article/details/44495611

15. Getting Started with Spark (in Python)

https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python

16. DataFrame in Spark SQL

http://blog.javachen.com/2015/03/26/spark-sql-dataframe/#

17. Spark RDD API explained (part 1): Map and Reduce

https://www.zybuluo.com/jewes/note/35032

18. Article series on Spark operators

http://lxw1234.com/archives/2015/07/363.htm

19. Spark Streaming in practice and optimization

http://bit.ly/1QsQ2Ot

 

 

(3) Spark monitoring and management

1. Common Spark Troubleshooting

http://www.datastax.com/dev/blog/common-spark-troubleshooting

 

(4) YARN & Spark

1. Exploring Apache Spark: multi-process model or multi-thread model?

http://dongxicheng.org/framework-on-yarn/apache-spark-multi-threads-model/

 

(5) Spark data platform architecture

 

 

(6) Spark applications and practice

1. How-to: Do Near-Real Time Sessionization with Spark Streaming and Apache Hadoop

http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/

2. Integrating Kafka and Spark Streaming: Code Examples and State of the Game

http://www.michael-noll.com/blog/2014/10/01/kafka-spark-streaming-integration-example-tutorial/

3. Reading nginx website log messages from Kafka with Spark and writing them to HDFS

http://yangqijun.com/archives/227

4. Flafka: Apache Flume Meets Apache Kafka for Event Processing

http://blog.cloudera.com/blog/2014/11/flafka-apache-flume-meets-apache-kafka-for-event-processing/

5. Log Analysis with Spark

http://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/README.html

6. Writing Spark computation results to MySQL

http://www.iteblog.com/archives/1275

7. Improvements to Kafka integration in Spark Streaming 1.3, explained

http://www.iteblog.com/archives/1307

8. Data sources in Spark SQL

http://blog.javachen.com/2015/04/03/spark-sql-datasource/#

9. Integrating Kafka, Spark Streaming, and Redis for real-time computation in practice

http://shiyanjun.cn/archives/1097.html

 

(7) Spark machine learning in practice

1. ML Pipelines: A New High-Level API for MLlib

http://databricks.com/blog/2015/01/07/ml-pipelines-a-new-high-level-api-for-mllib.html

2. Introduction to the Spark 0.9.1 MLlib machine learning library

http://rdc.taobao.org/?p=2163

 

(8) Scala learning guide

1. Spark development guide (0.8.1, Chinese edition)

http://rdc.taobao.org/?p=2024

2. Syntactic similarities between Swift and Scala

http://segmentfault.com/a/1190000000575561

3. Awesome Scala

https://github.com/lauris/awesome-scala

4. Scala (on the JVM, Scala, and back-end architecture; the blog of an Alibaba engineer, quite good)

http://hongjiang.info/scala/

5. Scala quick start

http://my.oschina.net/mup/blog/363436?from=20150111

6. An Overview of the Scala Programming Language

https://github.com/wecite/papers/tree/master/An-Overview-of-the-Scala-Programming-Language

7. A concise Scala tutorial

http://colobu.com/2015/01/14/Scala-Quick-Start-for-Java-Programmers/

8. Scala School

http://twitter.github.io/scala_school/zh_cn/index.html

9. Basic Scala syntax and concepts

http://blog.javachen.com/2015/04/20/basic-of-scala.html

      Scala collections

http://blog.javachen.com/2015/04/22/scala-collections.html

10. Scala: from beginner to beginner+

http://segmentfault.com/a/1190000003068853

 

 

(9) Spark books

1. Spark Cookbook

http://www.infoobjects.com/spark-cookbook/

2. Fast Data Processing with Spark

http://it-ebooks.info/book/3185/

3. Overview of the Scala language

http://wecite.github.io/docs/ScalaOverview-20150226.pdf

4. Effective Scala

http://twitter.github.io/effectivescala/index-cn.html

5. Fun with Scala: Scala's concise syntax

http://www.ibm.com/developerworks/cn/java/j-lo-funinscala2/
