A Swarm of Sparks

Translated from: https://medium.com/@daviws/a-swarm-of-sparks-8f5a4afc72cc

Web scale computing has never been so simple

I work at WorldSense, where we build predictors for the best links you could add to your content by creating large language models from the World Wide Web. In the open source world, no tool is better suited for that kind of mass (hyper)text analysis than Apache Spark, and I wanted to share how we set it up and run it on the cloud, so you can give it a try.

Spark is a distributed system, and like any similar system, it has a somewhat demanding configuration. There is a plethora of ways of running Spark, and in this post I will try to describe what I think is a great setup nowadays: a standalone cluster running (mostly) on bare-bones Amazon EC2 spot instances, configured using the newest Docker orchestration tools.

Intermission: today I had the pleasure of playing with the amazing Databricks spark notebook during the Spark East Summit, and I highly recommend it.

Back to work. Before we start, let us double check what we need for our spark setup:

  • The hardware, in the form of some machines in the cloud.
  • The software, Apache Spark, installed in each of them.
  • An abstraction layer to create a cluster from those machines.
  • Some coordination point through which all of this comes to life.

We will move backwards through this list, as it makes it easier to present the different systems involved. We allocate our machines with Docker Machine, using the very latest docker engine version (v.1.10 is out already, no need to explicitly ask for it any longer), which contains all the functionality we need. Let us start with a very small machine:

CLUSTER_PREFIX=c${USER}
DRIVER_OPTIONS="--driver amazonec2 --amazonec2-security-group=default" 
# no longer needed: --engine-install-url https://test.docker.com
docker-machine create $DRIVER_OPTIONS --amazonec2-instance-type=t2.nano ${CLUSTER_PREFIX}ks
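
Allocation takes a couple of minutes; once the command returns, the new instance should show up as Running. A quick sanity check, using docker-machine's name filter on the machine we just created:

# Confirm the keystore machine exists and is in the Running state.
docker-machine ls --filter name=${CLUSTER_PREFIX}ks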

We will use that machine for Consul, an atomic distributed key-value store, inspired by Google's Chubby. Consul will be responsible for keeping track of who is part of our cluster, among other things. Installing it is trivial, since someone on the internet already packed it as a Docker container for us:

docker $(docker-machine config ${CLUSTER_PREFIX}ks) \
  run -d -p "8500:8500" -h "consul" progrium/consul -server -bootstrap

This takes a few minutes to start, but you should only really need to do that once per cluster¹. Every time you bring the cluster up you can point to that same Consul instance, and keeping a t2.nano running will cost you less than five bucks a year.
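
If you want to make sure the key-value store is really up before building on it, Consul answers a simple HTTP API on the port we just published. A minimal check, assuming your default security group lets you reach port 8500 from your workstation:

# Ask Consul for its current leader; a non-empty reply means the server bootstrapped correctly.
curl http://$(docker-machine ip ${CLUSTER_PREFIX}ks):8500/v1/status/leader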

Now we can instantiate the cluster's master machine. The core responsibility of this machine is coordinating the workers. It will be both the Spark master machine and the manager for our Docker Swarm, the system responsible for presenting the machines and containers as a cluster.

NET_ETH=eth0
KEYSTORE_IP=$(aws ec2 describe-instances | jq -r ".Reservations[].Instances[] | select(.KeyName==\"${CLUSTER_PREFIX}ks\" and .State.Name==\"running\") | .PrivateIpAddress")
SWARM_OPTIONS="--swarm --swarm-discovery=consul://$KEYSTORE_IP:8500 --engine-opt=cluster-store=consul://$KEYSTORE_IP:8500 --engine-opt=cluster-advertise=$NET_ETH:2376"
MASTER_OPTIONS="$DRIVER_OPTIONS $SWARM_OPTIONS --swarm-master --engine-label role=master --amazonec2-instance-type=m4.large"
MASTER=${CLUSTER_PREFIX}n0
docker-machine create $MASTER_OPTIONS $MASTER

There are a few interesting things going on here. First, we used some shell-fu to find the IP address of our Consul machine inside the Amazon network. Then we fed that to the swarm-discovery and cluster-store options so Docker can keep track of the nodes in our cluster and the network layout of the containers running in each of them. With the configs in place, we proceeded to create an m4.large machine, and labeled it as our master. We now have a fully functional 1-machine cluster, and can run jobs on it. Just point to the Docker Swarm manager and treat it as a regular Docker daemon.

docker $(docker-machine config --swarm $MASTER) run hello-world
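
Another quick way to see the swarm from the outside is to ask the manager for its view of the cluster; with the classic Swarm setup used here, docker info lists every engine it knows about, along with the labels we attached:

# Inspect the cluster through the Swarm manager; the output should include the node and its role=master label.
docker $(docker-machine config --swarm $MASTER) info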

To install Spark on our cluster, we will use Docker Compose, another tool from the Docker family. With Compose we can describe how to install and configure a set of containers. Starting from scratch is easy, but we will take a shortcut by using an existing image, gettyimages/spark, and only focus on the configuration part. Here is the result, which you should save in a docker-compose.yml file in the local directory.

version: "2"
services:
  master:
    container_name: master
    image: gettyimages/spark:1.6.0-hadoop-2.6
    command: /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h master
    hostname: master
    environment:
      - constraint:role==master
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8080:8080
    expose:
      - "8081-8095"
  worker:
    image: gettyimages/spark:1.6.0-hadoop-2.6
    command: /usr/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    environment:
      - constraint:role!=master
    ports:
      - 8081:8081
    expose:
      - "8081-8095"
networks:
  default:
    driver: overlay
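
Compose is picky about indentation (each service, like worker above, has to be nested under services:), so it is worth letting it parse the file before deploying anything; this check runs locally and does not need the swarm at all:

# Validate docker-compose.yml and print the resolved configuration, or fail with a parse error.
docker-compose config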

There are a lot of knobs in Spark, and they can all be controlled through that file. You can even customize the spark distribution itself using a Dockerfile and custom base images, as we do at WorldSense to get Scala 2.11 and a lot of heavy libraries². In this example, we are doing the bare minimal, which is just opening the operational ports to the world, plus the spark internal ports to the rest of the cluster (the expose directive).

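As an illustration of that kind of customization, here is a hedged sketch of how a derived image could be built and published; the registry name, tag and jar directory below are made up for illustration, not the actual WorldSense setup:

# Hypothetical derived image: start from the public Spark image and layer extra jars on top of it.
cat > Dockerfile <<'EOF'
FROM gettyimages/spark:1.6.0-hadoop-2.6
COPY extra-libs/*.jar /usr/spark/lib/
EOF
docker build -t myregistry.example.com/spark-custom:1.6.0 .
docker push myregistry.example.com/spark-custom:1.6.0
# Then point the image: fields in docker-compose.yml at the new tag.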

Also note the parts of the config referring to the overlay network. The default network is where all services defined in the config file will run, which means they can communicate with each other using the container name as the target hostname. The swarm scheduler will decide for us on which machine each container goes, respecting the constraints⁷ we have put in place. In our config file, we have one that pins the master service to the master machine (which is not very powerful) and another which keeps the workers outside that machine. Let us try bringing up the master:

eval $(docker-machine env --swarm $MASTER)
docker-compose up -d master
lynx http://$(aws ec2 describe-instances | jq -r ".Reservations[].Instances[] | select(.KeyName==\"$MASTER\" and .State.Name==\"running\") | .PublicDnsName"):8080
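
Once the master service is up, you can also confirm that Compose created the overlay network on the swarm; the network name is derived from the Compose project (the local directory by default), so the exact name will vary:

# List overlay networks known to the swarm; expect one ending in _default.
docker network ls --filter driver=overlay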

So far we have bootstrapped our architecture with Consul, defined our cluster with Docker Swarm and delineated our spark installation with Docker Compose. The last remaining step is to add the bulk of the machines which will do the heavy work.

The worker machines should be more powerful, and you don't have to care too much about the stability of the individual instances. These properties make workers a perfect candidate for Amazon EC2 spot instances. They often cost less than one fourth of the price of a reserved machine, a bargain you can't get elsewhere. Let us bring a few of them up, using docker-machine³ and the very helpful GNU Parallel⁴ script.

WORKER_OPTIONS="$DRIVER_OPTIONS $SWARM_OPTIONS \
--amazonec2-request-spot-instance \
--amazonec2-spot-price=0.074 \
--amazonec2-instance-type=m4.2xlarge"

CLUSTER_NUM_NODES=11
parallel -j0 --no-run-if-empty --line-buffer docker-machine create \
$WORKER_OPTIONS < <(for n in $(seq 1 $CLUSTER_NUM_NODES); \
do echo "${CLUSTER_PREFIX}n$n"; done)
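
Spot requests can take a while to be fulfilled and, as footnote 3 mentions, the odd machine may fail to come up, so it is worth checking how many workers actually joined before scaling the services. Two rough checks:

# Machines created for this cluster (the same pattern the teardown command at the end uses)...
docker-machine ls | grep "^${CLUSTER_PREFIX}"
# ...and the number of engines the Swarm manager actually sees.
docker $(docker-machine config --swarm $MASTER) info | grep Nodes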

You now have over 300 cores available in your cluster, for less than a dollar an hour. Last month at WorldSense we used a similar cluster to process over 2 billion web pages from the Common Crawl repository over a few days. For now, let us bring up everything and compute the value of pi:

eval $(docker-machine env --swarm $MASTER)
docker-compose scale master=1 worker=10
docker run --net=container:master --entrypoint spark-submit \
  gettyimages/spark:1.6.0-hadoop-2.6 --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  /usr/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar
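
If you would rather poke at the cluster interactively than submit the example jar, the same trick of sharing the master container's network namespace works for the Spark shell, which ships in the same image:

# Start an interactive Spark shell wired to the standalone master.
docker run -it --net=container:master --entrypoint spark-shell \
  gettyimages/spark:1.6.0-hadoop-2.6 --master spark://master:7077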

In a more realistic scenario one would use something like rsync to push locally developed jars to the master machine, and then use docker volume support to expose those to the driver. That is how we do it at WorldSense⁵.

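A hedged sketch of that workflow, with made-up paths and class names, and docker-machine's scp subcommand standing in for rsync to keep it short:

# Push a locally built jar to the master machine...
docker-machine ssh $MASTER mkdir -p /tmp/jobs
docker-machine scp target/my-job.jar $MASTER:/tmp/jobs/
# ...then mount that directory into the driver container at submit time.
docker run --net=container:master -v /tmp/jobs:/jobs --entrypoint spark-submit \
  gettyimages/spark:1.6.0-hadoop-2.6 --master spark://master:7077 \
  --class com.example.MyJob /jobs/my-job.jar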

I think this is a powerful setup, with the great advantage that it is also easy to debug and replicate locally. I can simply change a few flags⁶ in these scripts to get virtually the same environment on my laptop. This flexibility has been helpful countless times.

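Footnote 6 spells the changes out; as a sketch, the variables from the beginning of the post become something like this on a laptop, assuming the VirtualBox driver and the same keystore machine name used above:

# Local variant: VirtualBox machines instead of EC2, advertising on the host-only interface.
DRIVER_OPTIONS="--driver virtualbox"
NET_ETH=eth1
KEYSTORE_IP=$(docker-machine ip ${CLUSTER_PREFIX}ks)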

Many companies offer hosted solutions for running code in Spark, and I highly recommend giving them a try. In our case, we had both budget restrictions and flexibility requirements that forced us into a custom deployment. It hasn't come without its costs, but we are sure having some fun.

Ah, talking about costs, do not forget to bring your cluster down!

(Translator's note: if you don't shut the cluster down, Amazon will keep charging you!)

docker-machine ls | grep "^${CLUSTER_PREFIX}" | cut -d\  -f1 | xargs docker-machine rm -y

This text was cross-posted from WorldSense’s blog at worldsense.com/blog.

Footnotes

  1. The need for serialized creation of the cluster-store should improve at some point.
  2. Spark runs jobs in its workers' JVMs, and sometimes it is really hard to avoid jar-hell when your code needs some library version and the spark workers already ship a different one. For some cases, the only solution is to modify the pom.xml that generates the worker jar itself, and we have done that to fix incompatibilities with logback, dropwizard, and jackson, among others. If you find yourself in the same position, don't be afraid to try that. It works.
  3. Machine allocation with docker-machine is very simple, but not super reliable. I often have some slaves that do not install correctly, and I simply kill them in a shell loop checking for the success of docker-machine env.
  4. GNU Parallel requires a citation, and I have to say that I do it happily. Before the advent of docker swarm, most of the setup we used was powered by GNU Parallel alone :-).
    O. Tange (2011): GNU Parallel — The Command-Line Power Tool,
     ;login: The USENIX Magazine, February 2011:42–47.
  5. By splitting our jars in rarely-changed dependencies and our own code, most of the time running fresh code in the cluster is just a matter of uploading a couple of megabytes.
  6. On my laptop, I need the following changes: DRIVER_OPTIONS="--driver virtualbox", NET_ETH=eth1 and KEYSTORE_IP=$(docker-machine ip keystore).
  7. I have had trouble recently with constraints in more complex scenarios, although they work fine with the simple examples in this page. Unfortunately this has prevented a more aggressive migration of our infrastructure to swarm.