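Setting up a Twitter Storm cluster on Arch Linux

This article describes in detail how to set up a Twitter Storm cluster on Arch Linux. Please credit the source when reposting. Thank you.

For the base Arch Linux installation, refer to the article "A Concise Arch Linux Installation Guide" (archlinux簡明安裝指南). Building on that base system, the sections below walk through installing a Twitter Storm cluster step by step.

The main installation steps are:

  1. Install the oracle jdk
  2. Install the required build tools: gcc, g++, make
  3. Install python2.7 and unzip
  4. Build and install zeromq
  5. Build and install jzmq
  6. Download lein
  7. Download storm-starter
  8. Download the storm release
  9. Install zookeeper
  10. Install supervisord so that the storm cluster starts automatically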

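Install the oracle jdk

The stock Java on Linux is openjdk. To install oracle's jdk you would normally have to download the package from the official site. One of the pleasures of Arch Linux is yaourt, which makes all of this very simple :).

#yaourt -S jdk

Note the path where Java ends up after installation. It should be /opt/java, and it is needed later.

Edit /etc/profile to add the JAVA_HOME environment variable and to append /opt/java/bin to PATH

PATH="/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/java/bin"
export PATH
export JAVA_HOME="/opt/java"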
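
As a quick sanity check after editing /etc/profile, a small helper along the following lines can confirm that /opt/java/bin really ended up on PATH. This is only a sketch: the function name path_contains is made up for illustration and is not part of any tool used in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $1 is a component of the
# colon-separated list $2 (defaults to the current $PATH).
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use after re-reading the profile:
#   . /etc/profile
#   path_contains /opt/java/bin && echo "JDK bin directory is on PATH"
```

Wrapping the list in extra colons means /opt/java does not falsely match /opt/java/bin.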

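Install the build tools

Twitter Storm uses zeromq. Because zeromq is written in C and C++, the corresponding build tools have to be installed. Do not use the zeromq package that ships with Arch Linux: the version currently in pacman or the AUR is 3.x, while Twitter Storm requires zeromq 2.1.7.

#pacman -S gcc libtool pkg-config make autoconf git util-linux

(On Arch Linux g++ is part of the gcc package, so it is not listed separately.)

Install python2.7 and unzip

#pacman -S python2 unzip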

Build and install zeromq and jzmq

Download zeromq 2.1.7 from http://download.zeromq.org/zeromq-2.1.7.tar.gz

#tar zvxf zeromq-2.1.7.tar.gz
#cd zeromq-2.1.7
#./configure
#make
#make install

The libraries are installed into /usr/local/lib

Build and install jzmq

#git clone https://github.com/nathanmarz/jzmq.git
#cd jzmq
#./autogen.sh
#./configure --with-zeromq=/usr/local
#make
(Note: make may fail at this point. If it does, edit jzmq/src/Makefile.am, change classdist_noinst.stamp to classnoinst.stamp, and run make again.)
#make install
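
If make fails on the stamp file, the Makefile.am change can also be applied with sed instead of by hand. The sketch below assumes the only change needed is the rename described in this section; the function name patch_jzmq_makefile is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: replace every occurrence of classdist_noinst.stamp
# with classnoinst.stamp in the file given as $1, keeping a .bak copy.
patch_jzmq_makefile() {
    sed -i.bak 's/classdist_noinst\.stamp/classnoinst.stamp/g' "$1"
}

# Example use from inside the jzmq checkout:
#   patch_jzmq_makefile src/Makefile.am
#   make
```

The .bak copy makes it easy to restore the original file if the change turns out not to be needed for the jzmq revision you cloned.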

After zeromq and jzmq are installed, edit /etc/ld.so.conf and add the following line to it

/usr/local/lib

Then run

#ldconfig
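
When this setup is scripted, it helps to add the line only if it is not already there, so that repeated runs do not pile up duplicates in /etc/ld.so.conf. A minimal sketch, with the hypothetical helper name append_line_once:

```shell
#!/bin/sh
# Hypothetical helper: append line $2 to file $1 unless an identical
# line is already present.
append_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example use (as root):
#   append_line_once /etc/ld.so.conf /usr/local/lib
#   ldconfig
```

grep -x matches whole lines only and -F treats the path as a literal string, so a longer path such as /usr/local/lib64 does not count as a match.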

To verify that libjzmq really links against the self-compiled zeromq, check it with the following command.

#ldd /usr/local/lib/libjzmq.so
    linux-gate.so.1 (0xb779e000)
    libzmq.so.1 => /usr/local/lib/libzmq.so.1 (0xb7749000)
    libuuid.so.1 => /usr/lib/libuuid.so.1 (0xb7743000)
    librt.so.1 => /usr/lib/librt.so.1 (0xb773a000)
    libpthread.so.0 => /usr/lib/libpthread.so.0 (0xb771e000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7635000)
    libm.so.6 => /usr/lib/libm.so.6 (0xb75ee000)
    libc.so.6 => /usr/lib/libc.so.6 (0xb743e000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0xb7422000)
    /usr/lib/ld-linux.so.2 (0xb779f000)

If libzmq.so.1 does point to the copy in /usr/local/lib, the correct version is being used.
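
The same check can be made without reading the ldd output by eye. The sketch below reads ldd output on standard input and prints the path that libzmq resolves to; the function name zmq_path_from_ldd is an invention for this example.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print the path
# that the first libzmq.so entry resolves to.
zmq_path_from_ldd() {
    awk '$1 ~ /^libzmq\.so/ && $2 == "=>" { print $3; exit }'
}

# Example use:
#   ldd /usr/local/lib/libjzmq.so | zmq_path_from_ldd
```

With the ldd output shown above, the printed path would be /usr/local/lib/libzmq.so.1.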

Install storm-starter

storm-starter is a GitHub project created by the author of storm to help beginners get started quickly.

#git clone https://github.com/nathanmarz/storm-starter.git

Build and run it. Note that this runs in local mode, not in cluster mode.

#cd storm-starter
#lein deps
#lein compile
#java -cp $(lein classpath) storm.starter.ExclamationTopology

Note:

    Download the lein script directly from http://leiningen.org/ instead of installing it with pacman or yaourt.

#chmod +x ./lein
#cp ./lein /usr/local/bin
#export LEIN_ROOT=1
(Set LEIN_ROOT only if you want to run lein as root.)

Install zookeeper

#yaourt -S zookeeper

Apply a simple configuration: edit /etc/zookeeper/zoo.cfg so that its contents are as follows

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
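
initLimit and syncLimit are counted in ticks, so their real durations depend on tickTime: with the values above, initLimit is 10 x 2000 ms = 20000 ms and syncLimit is 5 x 2000 ms = 10000 ms. The sketch below does the same arithmetic from the file itself; the helper names cfg_value and limit_ms are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of key $2 from the key=value
# file $1 (first match; comment lines never match).
cfg_value() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Hypothetical helper: convert the tick-based limit named $2 in config
# file $1 into milliseconds.
limit_ms() {
    tick=$(cfg_value "$1" tickTime)
    ticks=$(cfg_value "$1" "$2")
    echo $((tick * ticks))
}

# Example use:
#   limit_ms /etc/zookeeper/zoo.cfg initLimit
#   limit_ms /etc/zookeeper/zoo.cfg syncLimit
```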

In this setup zookeeper listens only on an IPv6 address. To force it to listen on IPv4 only, edit /opt/zookeeper-3.4.5/bin/zkServer.sh and add "-Djava.net.preferIPv4Stack=true" in the start) section. The result looks like this

case $1 in
start)
    echo  -n "Starting zookeeper ... "
    if [ -f $ZOOPIDFILE ]; then
      if kill -0 `cat $ZOOPIDFILE` > /dev/null 2>&1; then
         echo $command already running as process `cat $ZOOPIDFILE`. 
         exit 0
      fi  
    fi  
    nohup $JAVA "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
      "-Djava.net.preferIPv4Stack=true" \
    -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &
    if [ $? -eq 0 ] 
    then
      if /bin/echo -n $! > "$ZOOPIDFILE"
      then
        sleep 1
        echo STARTED
      else
        echo FAILED TO WRITE PID
        exit 1
      fi  
    else
      echo SERVER DID NOT START
      exit 1
    fi  
    ;;  

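Note the added line containing "-Djava.net.preferIPv4Stack=true".

Start zookeeper

#/opt/zookeeper-3.4.5/bin/zkServer.sh start

Download and install storm

Download storm-0.8.2 from storm-project.net and unzip it under /opt

#unzip storm-0.8.2.zip

Edit /opt/storm-0.8.2/conf/storm.yaml so that its contents are as follows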

########### These MUST be filled in for a storm configuration
 storm.zookeeper.servers:
     - "localhost"
#     - "server2"
# 
 nimbus.host: "localhost"
# 
# 
# ##### These may optionally be filled in:
#    
## List of custom serializations
# topology.kryo.register:
#     - org.mycompany.MyType
#     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
#
## List of custom kryo decorators
# topology.kryo.decorators:
#     - org.mycompany.MyDecorator
#
## Locations of the drpc servers
# drpc.servers:
#     - "server1"
#     - "server2"

## Metrics Consumers
# topology.metrics.consumer.register:
#   - class: "backtype.storm.metrics.LoggingMetricsConsumer"
#     parallelism.hint: 1
#   - class: "org.mycompany.MyMetricsConsumer"
#     parallelism.hint: 1
#     argument:
#       - endpoint: "metrics-collector.mycompany.org"
 java.library.path: "/usr/local/lib:/usr/local/share/java"
 supervisor.slots.ports:
   - 6700
   - 6701

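Note:

  The active entries in the file above all begin with a single space. YAML is sensitive to indentation, so keep the leading whitespace of these entries consistent.

Edit the storm script (/opt/storm-0.8.2/bin/storm) and change #!/usr/bin/python to #!/usr/bin/python2. On Arch Linux /usr/bin/python points to python3, so python2 has to be named explicitly.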
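
The first-line change to the storm script can also be made with sed. This is a sketch that assumes the first line of the script is exactly #!/usr/bin/python, as described in this article; the function name use_python2_shebang is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the first line of script $1 from
# #!/usr/bin/python to #!/usr/bin/python2, keeping a .bak copy.
# Any other first line is left untouched.
use_python2_shebang() {
    sed -i.bak '1s|^#!/usr/bin/python$|#!/usr/bin/python2|' "$1"
}

# Example use (as root):
#   use_python2_shebang /opt/storm-0.8.2/bin/storm
#   head -n 1 /opt/storm-0.8.2/bin/storm
```

Because the pattern is anchored at the end of the line, running the helper a second time does not turn python2 into python22.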

Everything is now ready to run in cluster mode

#/opt/storm-0.8.2/bin/storm nimbus
#/opt/storm-0.8.2/bin/storm supervisor
#/opt/storm-0.8.2/bin/storm ui

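Each of the commands above has to run in its own terminal. If the ui starts successfully, you can open localhost:8080 in a browser to see the state of the whole cluster.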

Deploy a Topology to the cluster

#./storm jar $HOME/working/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-standalone.jar storm.starter.ExclamationTopology exclamationTopology

If everything goes well, you should see output similar to the following

0    [main] INFO  backtype.storm.StormSubmitter  - Jar not uploaded to master yet. Submitting jar...
91   [main] INFO  backtype.storm.StormSubmitter  - Uploading topology jar /root/working/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-standalone.jar to assigned location: storm-local/nimbus/inbox/stormjar-c73d28f0-68fc-4e6e-98b5-c4d1355aa94f.jar
667  [main] INFO  backtype.storm.StormSubmitter  - Successfully uploaded topology jar to assigned location: storm-local/nimbus/inbox/stormjar-c73d28f0-68fc-4e6e-98b5-c4d1355aa94f.jar
670  [main] INFO  backtype.storm.StormSubmitter  - Submitting topology exclamationTopology in distributed mode with conf {"topology.workers":3,"topology.debug":true}
2449 [main] INFO  backtype.storm.StormSubmitter  - Finished submitting topology: exclamationTopology

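Running the storm cluster automatically

Starting the storm cluster by hand every time is not much fun, so it is better to have it start automatically. One way to do this is supervisor, a process control tool written in Python.

#pacman -S supervisor
#mkdir -p /var/log/storm

Edit the supervisor configuration file and append the following at the end of the file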

[program:storm-nimbus]
environment=JAVA_HOME=/opt/java, PATH="/usr/sbin:/usr/bin:/usr/local/bin:/opt/java/bin"
command=/opt/storm-0.8.2/bin/storm nimbus
;;user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/nimbus.out
logfile_maxbytes=20MB
logfile_backups=10

[program:storm-supervisor]
environment=JAVA_HOME=/opt/java, PATH="/usr/sbin:/usr/bin:/usr/local/bin:/opt/java/bin"
command=/opt/storm-0.8.2/bin/storm supervisor
;;user=storm
autostart=true
autorestart=true
startsecs=10
startretries=999
log_stdout=true
log_stderr=true
logfile=/var/log/storm/supervisor.out
logfile_maxbytes=20MB
logfile_backups=10

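Note:

    The environment line is added explicitly in the configuration above, mainly to solve the problem of the executable search path. Without it, an error is reported that the java executable cannot be found, because it is not in one of the standard paths /usr/bin, /usr/sbin, /usr/local/bin, /usr/local/sbin.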

Start supervisord

#systemctl start supervisord

To have supervisord start at boot, run the following command

#systemctl enable supervisord

References

  1. Running a Multi-Node Storm Cluster http://www.michael-noll.com/tutorials/running-multi-node-storm-cluster/
