Few words — straight to the practical stuff!
The Core Tarball
This tarball contains the core SDC (StreamSets Data Collector) software with only a minimal set of stage libraries; additional stages can of course be installed manually later.
① Install additional stages through the StreamSets UI: the entry point is on the right side of the interface (the icon looks like a gift box...).
② Or install from the command line, as follows:
Step 1: download the core tarball, e.g. version streamsets-datacollector-core-3.3.0.tgz
Step 2: extract the tarball
[hadoop@master app]$ tar -zxvf streamsets-datacollector-core-3.3.0.tgz
[hadoop@master streamsets-datacollector-3.3.0]$ ./bin/streamsets dc
Java 1.8 detected; adding $SDC_JAVA8_OPTS of "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Djdk.nio.maxCachedBufferSize=262144" to $SDC_JAVA_OPTS
Configuration of maximum open file limit is too low: 1024 (expected at least 32768). Please consult https://goo.gl/LgvGFl
[hadoop@master streamsets-datacollector-3.3.0]$
Note: startup can fail with the error shown above: the maximum open-file limit is 1024, while StreamSets needs at least 32768, so the environment has to be adjusted first.
There are two ways to set it:
(1) Edit the configuration file and restart; on CentOS the change is then permanent.
(2) Apply it immediately with a single command:
ulimit -n 65535

Then browse to http://<system-ip>:18630/. The default username and password are both "admin".
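Before raising the limit, it is worth checking what the current shell actually allows; both values come from the standard `ulimit` shell builtin (nothing here is StreamSets-specific):

```shell
# Soft limit: the value currently enforced for new processes in this shell
ulimit -Sn
# Hard limit: the ceiling; a non-root user cannot raise the soft limit above it
ulimit -Hn
```

If the hard limit is already below 65535, `ulimit -n 65535` will fail until root raises the hard limit (for example via /etc/security/limits.conf, as in the post-install steps).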
[hadoop@master streamsets-datacollector-3.3.0]$ pwd
/home/hadoop/app/streamsets-datacollector-3.3.0
[hadoop@master streamsets-datacollector-3.3.0]$ ./bin/streamsets dc
Java 1.8 detected; adding $SDC_JAVA8_OPTS of "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Djdk.nio.maxCachedBufferSize=262144" to $SDC_JAVA_OPTS
Logging initialized @6514ms to org.eclipse.jetty.util.log.Slf4jLog
Running on URI : 'http://master:18630'
At this point you can finally see what StreamSets really looks like; later posts will dig into the details. This tool is a great help for moving and cleansing data.
Alternatively, run it in the background:
[hadoop@master streamsets-datacollector-3.3.0]$ pwd
/home/hadoop/app/streamsets-datacollector-3.3.0
[hadoop@master streamsets-datacollector-3.3.0]$ nohup /home/hadoop/app/streamsets-datacollector-3.3.0/bin/streamsets dc &
[1] 2881
You may hit the same open-file-limit error during this startup as well.
Follow-up steps after a successful installation (recommended):
1. Raise the open-file-descriptor limit for the sdc user's processes
[root@master streamsets-datacollector-3.3.0]# vim /etc/security/limits.conf
sdc soft nofile 32768
sdc hard nofile 32768
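limits.conf is applied by PAM at login, so the new values only show up in a fresh login session. A quick way to verify (substitute the account you actually run SDC as — `hadoop` in this walkthrough — if you are not using an `sdc` user):

```shell
# Start a new login shell and print the soft/hard nofile limits it inherited;
# after the limits.conf change both should report 32768 for the sdc user.
bash -lc 'echo "soft: $(ulimit -Sn)  hard: $(ulimit -Hn)"'
```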
2. Edit /etc/profile and reload it
[root@master streamsets-datacollector-3.3.0]# vim /etc/profile
[root@master streamsets-datacollector-3.3.0]# source /etc/profile
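The post does not show what was actually added to /etc/profile; presumably it is a `ulimit` line so that every login shell gets the higher limit. A sketch of such a line (an assumption, not taken from the original):

```shell
# Hypothetical /etc/profile addition (the original post omits the actual line):
# raise the soft open-file limit for every login shell, ignoring the error
# that occurs when the hard limit is lower than the requested value.
ulimit -n 32768 2>/dev/null || true
```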
3. Create directories for the data collector's configuration, data, logs, and resources
[root@master data]# su hadoop
[hadoop@master data]$ cd /data
[hadoop@master data]$ ll
total 4
drwxr-xr-x 3 hadoop hadoop 4096 Jul 27 2017 kafka-log
[hadoop@master data]$ mkdir -p /data/streamsets/sdc-stand-alone
[hadoop@master data]$ mkdir -p /data/streamsets/sdc-stand-alone-dirs/configuration
[hadoop@master data]$ mkdir -p /data/streamsets/sdc-stand-alone-dirs/data
[hadoop@master data]$ mkdir -p /data/streamsets/sdc-stand-alone-dirs/log
[hadoop@master data]$ mkdir -p /data/streamsets/sdc-stand-alone-dirs/resource
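The repeated `mkdir` calls above can be collapsed into one command with bash brace expansion (same layout, purely a convenience):

```shell
# Create the whole standalone directory tree in one shot (bash brace expansion).
BASE=/data/streamsets/sdc-stand-alone-dirs
mkdir -p /data/streamsets/sdc-stand-alone "$BASE"/{configuration,data,log,resource}
ls "$BASE"
```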
Edit the configuration file
# directory where the data collector will store pipelines and their runtime information
#
#export SDC_DATA=/var/lib/sdc

# directory where the data collector write its logs
#
#export SDC_LOG=/var/log/sdc

# directory where the data collector will read its configuration
#
#export SDC_CONF=/etc/sdc

# directory where the data collector will read pipeline resource files from
#
#export SDC_RESOURCES=/var/lib/sdc-resources
改成
# directory where the data collector will store pipelines and their runtime information
export SDC_DATA=/data/streamsets/sdc-stand-alone-dirs/data

# directory where the data collector write its logs
export SDC_LOG=/data/streamsets/sdc-stand-alone-dirs/log

# directory where the data collector will read its configuration
export SDC_CONF=/data/streamsets/sdc-stand-alone-dirs/configuration

# directory where the data collector will read pipeline resource files from
export SDC_RESOURCES=/data/streamsets/sdc-stand-alone-dirs/resource
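Editing the file by hand works; the same change can also be scripted. A sketch (the `ENV_FILE` path is an assumption — it presumes the commented exports live in `libexec/sdc-env.sh` of the unpacked tarball; verify against your own tree before running):

```shell
# Uncomment and repoint the four SDC_* exports in the env file.
# ENV_FILE is an assumed location based on a standard tarball layout.
ENV_FILE=$HOME/app/streamsets-datacollector-3.3.0/libexec/sdc-env.sh
BASE=/data/streamsets/sdc-stand-alone-dirs
if [ -f "$ENV_FILE" ]; then
  sed -i \
    -e "s|^#export SDC_DATA=.*|export SDC_DATA=$BASE/data|" \
    -e "s|^#export SDC_LOG=.*|export SDC_LOG=$BASE/log|" \
    -e "s|^#export SDC_CONF=.*|export SDC_CONF=$BASE/configuration|" \
    -e "s|^#export SDC_RESOURCES=.*|export SDC_RESOURCES=$BASE/resource|" \
    "$ENV_FILE"
else
  echo "env file not found: $ENV_FILE (adjust ENV_FILE for your install)"
fi
```

Restart the data collector afterwards so it picks up the new directories.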
You are also welcome to follow my personal blogs:
http://www.cnblogs.com/zlslch/ and http://www.cnblogs.com/lchzls/ and http://www.cnblogs.com/sunnyDream/
For details, see: http://www.cnblogs.com/zlslch/p/7473861.html