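Drill is an open-source SQL query engine from Apache for exploring big data. It was designed for high-performance analysis of large datasets while supporting ANSI SQL, the industry-standard query language.
Before Drill 1.13, Drill supported only standalone cluster deployment, in which a daemon called Drillbit runs on every node once deployment succeeds. Starting with 1.13, Drill can integrate with YARN for resource management. Under YARN, Drill becomes a long-running YARN process. When you start Drill, YARN automatically distributes the Drill software to every node, which spares you the chore of installing Drill on each node by hand. Resource management also becomes simpler, because YARN is aware of the resources Drill uses.
Every YARN distribution currently provides settings for memory and CPU (which YARN calls "vcores"), and some distributions also provide disk settings. For memory, when you deploy Drill on YARN you configure how much memory Drill should use, and that figure is then reported to YARN. Beyond that, Drill uses all available disk and CPU, although you can enable Linux cgroups to limit Drill's CPU usage so that it matches the YARN vcores allocation.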
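To make the Drill-on-YARN deployment easier to follow, here is a short introduction to the core YARN concepts.
YARN stands for Yet Another Resource Negotiator. It is Hadoop's general-purpose resource management system, and it provides unified resource management and scheduling for the applications that run on top of it.
When a user submits an application to YARN, YARN runs it in two phases. In the first phase it starts the ApplicationMaster (AM). In the second phase the AM creates the tasks, requests resources for them, and monitors them until they finish. The details are as follows: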
// Initialize and start a YarnClient
Configuration yarnConfig = new YarnConfiguration(getConf());
YarnClient client = YarnClient.createYarnClient();
client.init(yarnConfig);
client.start();
...

// Create an application
YarnClientApplication app = client.createApplication();
GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
...

// Set up the application submission context
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
appContext.setApplicationId(appResponse.getApplicationId());
appContext.setApplicationName(config.getProperty("app.name"));
appContext.setApplicationType(config.getProperty("app.type"));
...

// Set up the launch context for the AM container
ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
amContainer.setLocalResources(amLocalResources);
amContainer.setEnvironment(amEnvironment);
amContainer.setCommands(Collections.singletonList(amCommand.toString()));
...

// Submit the application
client.submitApplication(appContext);
// Initialize AMRMClientAsync (talks to the ResourceManager, heartbeat every 5000 ms)
YarnConfiguration yarnConfig = new YarnConfiguration();
AMRMClientAsync<ContainerRequest> amrmClientAsync =
    AMRMClientAsync.createAMRMClientAsync(5000, new AMRMCallbackHandler());
amrmClientAsync.init(yarnConfig);
amrmClientAsync.start();

// Initialize NMClientAsync (talks to the NodeManagers); reuses yarnConfig from above
NMClientAsync nmClientAsync = NMClientAsync.createNMClientAsync(new NMCallbackHandler());
nmClientAsync.init(yarnConfig);
nmClientAsync.start();

// Register the ApplicationMaster (AM)
amrmClientAsync.registerApplicationMaster(thisHostName, 0, "");
...

// Add a ContainerRequest
amrmClientAsync.addContainerRequest(containerRequest);
...

// Start a container
nmClientAsync.startContainerAsync(container, containerContext);
...

// Unregister the AM
amrmClientAsync.unregisterApplicationMaster(appStatus, appMessage, null);
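This has been only a brief introduction to YARN concepts and to writing a YARN application. For details, see the Apache Hadoop YARN documentation.
YARN launches applications through a client. For Drill, that client is the Drill-on-YARN client. The client can run on any machine that has both the Drill and the Hadoop software installed. When you deploy Drill with YARN, you only need to install Drill on the client machine, and Drill-on-YARN distributes it to the other nodes automatically. Note that when you deploy Drill without YARN, you usually keep its configuration files and custom code inside the Drill directory. When running under YARN, however, it is recommended to put all configuration and custom code in a separate directory called site and to leave the contents of the Drill directory untouched.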
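The deployment steps are described in detail below.
Deployment environment
I will not go into deploying the JDK, ZooKeeper, and Hadoop here. Just remember to set JAVA_HOME and HADOOP_HOME.
Create a directory to hold the downloaded Drill release package.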
export DRILL_DIR=/path/to/drill
mkdir -p "$DRILL_DIR"
cd "$DRILL_DIR"
Note: after running the commands above, the current directory is /path/to/drill.
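Download the Drill release package (apache-drill-1.14.0.tar.gz is used here) and extract it. To stress the point again, the current directory is still /path/to/drill.
export DRILL_NAME=apache-drill-1.14.0
tar -xzf "$DRILL_NAME.tar.gz"
export DRILL_HOME="$DRILL_DIR/$DRILL_NAME"
Note: DRILL_NAME matters, because the startup step later depends on this name.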
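Since DRILL_NAME has to stay in step with the archive, one option is to derive it from the archive file name rather than typing it twice. The sketch below assumes that the archive's top-level directory equals the file name without its .tar.gz suffix, which is what the extraction step above relies on for apache-drill-1.14.0. The helper name `drill_name_from_archive` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive DRILL_NAME from the release archive's file name.
# ASSUMPTION: the archive unpacks into a directory named after the file,
# minus the .tar.gz suffix.
drill_name_from_archive() {
  basename "$1" .tar.gz
}

archive=/path/to/drill/apache-drill-1.14.0.tar.gz
DRILL_NAME="$(drill_name_from_archive "$archive")"
echo "$DRILL_NAME"   # prints apache-drill-1.14.0
```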
Create the site directory and put the configuration files and custom code in it.
export DRILL_SITE="$DRILL_DIR/site"
mkdir -p "$DRILL_SITE"
cp "$DRILL_HOME/conf/drill-override-example.conf" "$DRILL_SITE/drill-override.conf"
cp "$DRILL_HOME/conf/drill-on-yarn-example.conf" "$DRILL_SITE/drill-on-yarn.conf"
cp "$DRILL_HOME/conf/drill-env.sh" "$DRILL_SITE"
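Before moving on, it can help to confirm that the site directory really contains the three files copied above. This is a minimal sketch, not part of Drill itself: the function name `check_site` is hypothetical, and the file list simply mirrors the copy commands in this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a site directory holds the expected files.
# Returns 0 when all files exist, 1 otherwise, listing what is missing.
check_site() {
  local site="$1" missing=0 f
  for f in drill-override.conf drill-on-yarn.conf drill-env.sh; do
    if [ ! -f "$site/$f" ]; then
      echo "missing: $site/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Falls back to the placeholder path used in this walkthrough.
check_site "${DRILL_SITE:-/path/to/drill/site}" \
  || echo "site directory is incomplete; fix it before starting Drill"
```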
Edit $DRILL_SITE/drill-override.conf. The settings you will typically need to change are cluster-id, zk, http, and rpc. Here I change only cluster-id and zk:
drill.exec: {
  cluster-id: "drillbits1"
  zk: {
    connect: "11.167.47.76:2181,11.167.57.229:2181,11.167.67.151:2181",
    root: "drill",
    refresh: 500,
    timeout: 5000,
    retry: {
      count: 7200,
      delay: 500
    }
  }
}
Edit $DRILL_SITE/drill-on-yarn.conf:
# Drillbit resource settings
drillbit: {
  heap: "4G"                # Java heap size
  max-direct-memory: "8G"
  # Container memory in MB. Normally heap + max-direct-memory,
  # although a somewhat larger value is recommended.
  memory-mb: 12288
  vcores: 4                 # Number of CPU cores
}

# Drillbit cluster group settings
cluster: [
  {
    name: "mypool"
    # Either basic or labeled. basic starts Drillbits in any available
    # container in the YARN cluster; labeled starts them only in
    # containers that carry a specific label.
    type: "basic"
    count: 1                # Number of YARN containers to launch
  }
]

# Location of the Drill release archive
drill-install: {
  client-path: "/path/to/drill/apache-drill-1.14.0.tar.gz"
  # dir-name: "drill"
}

# Distributed file system location
dfs: {
  connection: "hdfs://ip:port/"
  app-dir: "/user/drill"
}

# Drill-on-YARN web UI settings
drill.yarn: {
  http: {
    port: 8048
  }
}

# Drill-on-YARN web UI security settings
drill.yarn.http: {
  auth-type: "simple"
  // Note: drill-on-yarn-example.conf uses user_name by default,
  // which is wrong; change it to user-name.
  user-name: "drill"
  password: "drill"
}
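The relationship between heap, max-direct-memory, and memory-mb is easy to get wrong when the values change, so a quick arithmetic check can be worthwhile. The sketch below is only an illustration under the rule stated in the comments above (memory-mb should be at least heap + max-direct-memory). The helpers `to_mb` and `min_container_mb` are hypothetical, and they handle only the G and M suffixes used in this configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size such as "4G" or "512M" to megabytes.
to_mb() {
  local value="$1"
  local number="${value%[GgMm]}"
  case "$value" in
    *[Gg]) echo $(( number * 1024 )) ;;
    *[Mm]) echo "$number" ;;
    *) echo "unsupported size: $value" >&2; return 1 ;;
  esac
}

# Hypothetical helper: sum several sizes to get the minimum container size in MB.
min_container_mb() {
  local total=0 size mb
  for size in "$@"; do
    mb="$(to_mb "$size")" || return 1
    total=$(( total + mb ))
  done
  echo "$total"
}

# Values taken from the drillbit block above.
required="$(min_container_mb 4G 8G)"
configured=12288
if [ "$configured" -lt "$required" ]; then
  echo "memory-mb $configured is below the required $required MB"
else
  echo "memory-mb $configured covers the required $required MB"
fi
```

With the values above the sum is exactly 12288 MB, so the configured memory-mb leaves no headroom. Raising it somewhat follows the recommendation in the configuration comment.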
For reference, here is the complete configuration:
drill.yarn: {
  app-name: "Drill-on-YARN"

  dfs: {
    connection: "hdfs://11.162.91.196:9000/"
    app-dir: "/users/drill"
  }

  yarn: {
    queue: "default"
  }

  drill-install: {
    client-path: "/home/admin/drill/apache-drill-1.14.0.tar.gz"
    # dir-name: "drill"
    # library-path: "/opt/libs"
  }

  am: {
    heap: "450M"
    memory-mb: 512
    # node-label-expr: "drill-am"
  }

  http: {
    port: 8048
    # ssl-enabled: true
    auth-type: "simple"
    user-name: "drill"
    password: "drill"
    rest-key: ""
  }

  drillbit: {
    heap: "3G"
    max-direct-memory: "1G"
    code-cache: "1G"
    memory-mb: 4096
    vcores: 2
    # disks: 3
    classpath: ""
  }

  cluster: [
    {
      name: "drill-group1"
      type: "basic"
      count: 3
    }
  ]
}
Start
$DRILL_HOME/bin/drill-on-yarn.sh --site $DRILL_SITE start
The startup log then appears:
Connecting to DFS... Connected.
Using existing Drill archive in DFS: /users/drill/apache-drill-1.14.0.tar.gz
Uploading site directory /home/admin/drill/apache-drill-1.14.0/bin/../../site to /users/drill/site.tar.gz ... Uploaded.
Loading YARN Config... Loaded.
Application ID: application_1533475543014_0005
Launching Drill-on-YARN.......................
Tracking URL: http://dtshow011162091196.zth:8088/proxy/application_1533475543014_0005/
Application Master URL: http://11.163.210.105:8048/
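As the log above shows, the client first makes sure apache-drill-1.14.0.tar.gz is in HDFS (in this run an existing copy was reused) and uploads the site directory packed as site.tar.gz. It then loads the YARN configuration and finally launches Drill.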
Besides start, drill-on-yarn.sh also provides the status, stop, resize, and clean commands. For example, status prints:
Application ID: application_1533475543014_0005
Application State: RUNNING
Host: dtshow011163210105.zth/11.163.210.105
Queue: default
User: admin
Start Time: 2018-08-19 20:51:55
Application Name: Drill-on-YARN
Tracking URL: http://dtshow011162091196.zth:8088/proxy/application_1533475543014_0005/
AM State: LIVE
Target Drillbit Count: 3
Live Drillbit Count: 3
Unmanaged Drillbit Count: 0
Blacklisted Node Count: 0
Free Node Count: 0
For more information, visit: http://11.163.210.105:8048/
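Because every subcommand takes the same --site argument, a small wrapper can cut down on typing. The sketch below is a convenience under stated assumptions, not part of Drill: the function `drill_yarn` and the `DRY_RUN` switch are hypothetical, the default paths are the placeholders used earlier in this walkthrough, and the numeric argument to resize is assumed to be the target Drillbit count.

```shell
#!/usr/bin/env bash
# Defaults mirror the placeholder paths used earlier in this walkthrough.
DRILL_HOME="${DRILL_HOME:-/path/to/drill/apache-drill-1.14.0}"
DRILL_SITE="${DRILL_SITE:-/path/to/drill/site}"

# Hypothetical helper: run a drill-on-yarn.sh subcommand against the site
# directory, or only print the command line when DRY_RUN=1.
drill_yarn() {
  local script="$DRILL_HOME/bin/drill-on-yarn.sh"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$script --site $DRILL_SITE $*"
  else
    "$script" --site "$DRILL_SITE" "$@"
  fi
}

# Dry runs: these only print the commands that would be executed.
DRY_RUN=1 drill_yarn status     # report application and Drillbit state
DRY_RUN=1 drill_yarn resize 4   # ASSUMPTION: argument is the target Drillbit count
DRY_RUN=1 drill_yarn stop       # shut the Drill cluster down
```

Dropping DRY_RUN=1 runs the real script, so it is worth checking the printed command lines first.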
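After a successful start, you can open http://11.163.210.105:8048/, as shown in the figure below:
The user name and password are the drill and drill configured earlier. Beyond that, this page provides the following functions:
At this point you have successfully deployed Drill on YARN. You can also open the Drill Web UI to run a test query, as shown in the figure below: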
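Compared with standalone cluster deployment, Drill-on-YARN simplifies deploying Drill and also makes upgrades and testing of new features easier. In addition, with YARN acting as the resource negotiator, Drill's resource management becomes simpler as well. Because YARN already knows at startup which resources Drill may use, it takes them into account when other jobs are later submitted to the cluster and avoids over-allocating those resources to other jobs.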