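【Interactive Q&A】Session 15: "Winning in the Era of Cloud Computing and Big Data", Spark Asia-Pacific Research Institute Public Lecture Series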

"Winning in the Era of Cloud Computing and Big Data"

Spark Asia-Pacific Research Institute 100-Session Public Lecture Series 【Session 15 Interactive Q&A】

 

Q1: What is the relationship between AppClient, Worker, and Master?

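  • AppClient is the application's representative on the client machine in Standalone mode, created when SparkContext.runJob is called; it is responsible for tasks such as registering the application with the Master (registerApplication);

  • Once the application has been registered, the Master sends a message to the client via Akka to start the Driver;

  • The Driver manages the Tasks and controls the Executors on the Workers so that they work together.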
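To make this division of labor concrete, here is a minimal in-process sketch in Python. The classes `Master`, `Worker`, `AppClient` and the function `run_tasks` are hypothetical stand-ins, not Spark's API: plain method calls replace the Akka messages, the Master asking each Worker to launch one executor is an assumed simplification, and round-robin assignment stands in for the Driver's real task scheduler.

```python
from dataclasses import dataclass, field

# Hypothetical in-process model of the Standalone control flow described above.
# Plain method calls stand in for the Akka messages Spark exchanges over the network.


@dataclass
class Worker:
    name: str
    executors: list = field(default_factory=list)  # (app_id, executor_id) pairs

    def launch_executor(self, app_id):
        executor_id = f"{self.name}-exec-{len(self.executors)}"
        self.executors.append((app_id, executor_id))
        return executor_id


class Master:
    def __init__(self, workers):
        self.workers = workers
        self.apps = {}

    def register_application(self, client, app_name):
        app_id = f"app-{len(self.apps)}"
        self.apps[app_id] = app_name
        client.on_registered(app_id)  # acknowledge the registration to the client
        # Assumed simplification: one executor per worker for every application.
        executors = [w.launch_executor(app_id) for w in self.workers]
        client.on_executors_added(executors)
        return app_id


class AppClient:
    """Represents the application on the client machine."""

    def __init__(self, app_name):
        self.app_name = app_name
        self.app_id = None
        self.executors = []

    def register(self, master):
        return master.register_application(self, self.app_name)

    def on_registered(self, app_id):
        self.app_id = app_id

    def on_executors_added(self, executor_ids):
        self.executors.extend(executor_ids)


def run_tasks(executor_ids, tasks):
    """Driver side: assign tasks to executors round-robin (a simplification)."""
    assignment = {e: [] for e in executor_ids}
    for i, task in enumerate(tasks):
        assignment[executor_ids[i % len(executor_ids)]].append(task)
    return assignment


if __name__ == "__main__":
    client = AppClient("wordcount")
    client.register(Master([Worker("worker-1"), Worker("worker-2")]))
    print(client.app_id)     # app-0
    print(client.executors)  # ['worker-1-exec-0', 'worker-2-exec-0']
    print(run_tasks(client.executors, ["t0", "t1", "t2"]))
```

Keeping the executor list on the client side mirrors the point of the third bullet: once registration is done, the Driver, not the Master, decides which executor runs which task.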

 

Q2: Is there a big difference between Spark's shuffle and Hadoop's shuffle?

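  • Spark's shuffle is a shuffle in the fairly strict sense: in Spark, a shuffle occurs when, along the lineage of dependencies between RDD operations, the contents of each partition of a parent RDD are distributed to multiple partitions of the child RDD;

  • In Hadoop, shuffle is a looser concept: once the Mapper phase ends, handing its output to the Reducers constitutes the shuffle, and Shuffle is the first of the Reducer's three phases (shuffle, sort, reduce);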
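The contrast can be sketched in Python under simplifying assumptions: records are (key, count) pairs, and `partition_for` is a hypothetical CRC32-based partitioner standing in for a real hash partitioner. `spark_style_shuffle` shows a wide dependency, where one parent partition may feed several child partitions; `hadoop_style_reduce` walks through one Reducer's shuffle, sort, and reduce phases. Neither function is Spark or Hadoop code.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition_for(key, num_partitions):
    """Deterministic toy hash partitioner (CRC32 of the key)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spark_style_shuffle(parent_partitions, num_child_partitions):
    """Wide dependency: records of every parent partition are scattered
    across child partitions by key. Returns the child partitions and, for
    each parent partition, the set of child partitions it sent records to."""
    children = [[] for _ in range(num_child_partitions)]
    fan_out = []
    for records in parent_partitions:
        targets = set()
        for key, value in records:
            p = partition_for(key, num_child_partitions)
            children[p].append((key, value))
            targets.add(p)
        fan_out.append(targets)
    return children, fan_out


def hadoop_style_reduce(map_outputs, reducer_index, num_reducers):
    """The three Reducer phases for one reducer: shuffle, sort, reduce."""
    # 1. Shuffle: fetch this reducer's partition from every mapper's output.
    fetched = [kv for output in map_outputs for kv in output
               if partition_for(kv[0], num_reducers) == reducer_index]
    # 2. Sort: order the fetched records by key so equal keys are adjacent.
    fetched.sort(key=itemgetter(0))
    # 3. Reduce: aggregate the values of each key (here, summing counts).
    return {key: sum(v for _, v in group)
            for key, group in groupby(fetched, key=itemgetter(0))}


if __name__ == "__main__":
    parents = [[("spark", 1), ("hadoop", 1), ("yarn", 1)],
               [("spark", 1), ("mesos", 1)]]
    children, fan_out = spark_style_shuffle(parents, 2)
    print(fan_out)  # which child partitions each parent partition fed
    totals = {}
    for r in range(2):
        totals.update(hadoop_style_reduce(parents, r, 2))
    print(totals)
```

In both cases the same key always lands in the same downstream partition; what differs is framing. Spark describes it as a dependency between RDD partitions, while Hadoop treats it as the first step of each Reducer.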

     

Q3: How does Spark handle HA?

  • In Standalone mode, Worker nodes are HA automatically; for Master HA, ZooKeeper is generally used. As the Spark Standalone documentation puts it:

  • Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected "leader" and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master's state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling new applications – applications that were already running during Master failover are unaffected.

  • In YARN and Mesos modes, HA is likewise usually built on ZooKeeper; for example, the YARN ResourceManager typically uses ZooKeeper for its HA.
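In a real Standalone cluster, ZooKeeper-based Master recovery is switched on with `spark.deploy.recoveryMode=ZOOKEEPER` and `spark.deploy.zookeeper.url`, set through `SPARK_DAEMON_JAVA_OPTS`. The sketch below only models the idea in memory: `FakeZooKeeper` and `StandaloneMaster` are hypothetical classes, and the election follows the classic ZooKeeper recipe in which the candidate holding the lowest ephemeral sequential node is the leader. It is not how Spark's own classes are structured.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper (hypothetical, no network).

    `election` holds ephemeral sequential nodes: sequence number -> owner.
    `state` is persistent storage used for recovery.
    """

    def __init__(self):
        self._next_seq = 0
        self.election = {}
        self.state = {}

    def create_ephemeral_sequential(self, owner):
        seq = self._next_seq
        self._next_seq += 1
        self.election[seq] = owner
        return seq

    def session_expired(self, owner):
        # Ephemeral nodes disappear when the owner's session ends.
        self.election = {s: o for s, o in self.election.items() if o != owner}

    def leader(self):
        # The candidate holding the lowest sequence number is the leader.
        return self.election[min(self.election)] if self.election else None


class StandaloneMaster:
    def __init__(self, name, zk):
        self.name = name
        self.zk = zk
        self.apps = []
        zk.create_ephemeral_sequential(name)  # join the election

    def is_leader(self):
        return self.zk.leader() == self.name

    def register_app(self, app_id):
        if not self.is_leader():
            raise RuntimeError(f"{self.name} is in standby mode")
        self.apps.append(app_id)
        self.zk.state["apps"] = list(self.apps)  # persist for a future leader

    def recover(self):
        # A newly elected leader reloads the previous leader's state.
        self.apps = list(self.zk.state.get("apps", []))


if __name__ == "__main__":
    zk = FakeZooKeeper()
    m1, m2 = StandaloneMaster("master-1", zk), StandaloneMaster("master-2", zk)
    m1.register_app("app-0")
    zk.session_expired("master-1")  # the current leader dies
    print(m2.is_leader())  # True
    m2.recover()
    print(m2.apps)         # ['app-0']
```

Only the standby's ability to accept new registrations changes during failover here; nothing about already registered applications is lost, because their state lives in the shared store rather than in the failed Master.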
