Introduction
YARN multi-tenant resource pool configuration
When multiple users run jobs on the same Hadoop cluster, resources need to be limited effectively, for example by separating test resources from production resources.
1. Check the default resource pool
# Visit http://192.168.1.25:8088/cluster/scheduler (192.168.1.25 is master.hadoop)
# You can see the default resource pool, default, referred to here as a queue; when a user submits a job, it consumes resources from the default pool
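The same information is available from the command line. A minimal check, assuming the Hadoop client commands are on the PATH of the hadoop user:

hadoop shell > mapred queue -list             # lists every queue with its capacity and state
hadoop shell > yarn queue -status default     # capacity, state and ACLs of a single queue

Before any changes this should show only the default queue, at 100% capacity and state RUNNING.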
2. Configure the resource pools
hadoop shell > vim etc/hadoop/yarn-site.xml    # YARN configuration file

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master.hadoop</value>
    </property>
    <property>
        <name>yarn.acl.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${yarn.log.dir}/userlogs</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
# yarn.acl.enable  enables ACL-based authorization
# The capacity scheduler (CapacityScheduler) is used here
hadoop shell > vim etc/hadoop/capacity-scheduler.xml    # scheduler sub-configuration file; holds the queue-related parameters

<configuration>
    <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>10000</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,prod</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>30</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
        <value>1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.state</name>
        <value>RUNNING</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
        <value>*</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prod.capacity</name>
        <value>70</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prod.user-limit-factor</name>
        <value>1</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prod.maximum-capacity</name>
        <value>100</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prod.state</name>
        <value>RUNNING</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
        <value>wang</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.prod.acl_administer_queue</name>
        <value>wang</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.node-locality-delay</name>
        <value>40</value>
    </property>
    <property>
        <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
        <value>false</value>
    </property>
</configuration>
# yarn.scheduler.capacity.maximum-applications  maximum number of applications that can be running or pending in the cluster at the same time
# yarn.scheduler.capacity.maximum-am-resource-percent  upper limit on the share of cluster resources that may run ApplicationMasters; typically used to cap the number of concurrently running applications; default 10%
# yarn.scheduler.capacity.resource-calculator  how resources are measured; the default DefaultResourceCalculator only counts memory, while DominantResourceCalculator counts both memory and CPU
# yarn.scheduler.capacity.root.queues  defines the resource pools: default and prod
# yarn.scheduler.capacity.root.<queue>.capacity  each pool's share of total cluster resources; the capacities of sibling pools must add up to 100% (see the worked example after this list)
# yarn.scheduler.capacity.root.<queue>.user-limit-factor  the multiple of the queue capacity a single user may consume; default 1 (100% of the queue)
# yarn.scheduler.capacity.root.default.maximum-capacity  upper bound on each pool's resource usage; because resources are shared, a pool may use more than its configured capacity, up to this limit
# yarn.scheduler.capacity.root.default.state  pool state, STOPPED or RUNNING; when STOPPED, users cannot submit jobs to the queue or to its child queues
# yarn.scheduler.capacity.root.default.acl_submit_applications  which users/groups may submit jobs to the queue; default * (everyone); this property is inherited, so a child queue picks up its parent queue's permissions
# yarn.scheduler.capacity.root.default.acl_administer_queue  which users/groups may administer the queue, e.g. kill any of its jobs
# yarn.scheduler.capacity.node-locality-delay  number of missed scheduling opportunities after which the scheduler relaxes node locality and schedules rack-local containers; -1 disables it; default 40
# yarn.scheduler.capacity.queue-mappings-override.enable  whether queue mappings may override the queue specified by the user; default false
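A quick worked example of how capacity and maximum-capacity interact (the 100 GB cluster size is hypothetical, chosen only to make the arithmetic easy):

# Suppose the cluster has 100 GB of memory in total:
#   default: capacity 30 -> guaranteed 30 GB; maximum-capacity 100 -> it may borrow
#            idle resources and grow to 100 GB while prod is quiet
#   prod:    capacity 70 -> guaranteed 70 GB; it may likewise burst to 100 GB
#            while default is idle
# The guarantees 30 + 70 = 100, satisfying the rule that sibling capacities sum to 100%.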
hadoop shell > vim etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.cluster.acls.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/tmp/hadoop-yarn/staging</value>
    </property>
</configuration>
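# mapreduce.framework.name  run MapReduce jobs on YARN rather than the local runner
# mapreduce.cluster.acls.enabled  enable ACL checks for MapReduce jobs (job view/modify permissions)
# yarn.app.mapreduce.am.staging-dir  HDFS staging directory where the MapReduce ApplicationMaster keeps job submission files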
3. Apply the configuration
hadoop shell > yarn rmadmin -refreshQueues    # adding queues or changing queue properties takes effect with this command; deleting a queue requires restarting YARN
# Refresh the web page and you will see a new prod queue (resource pool)
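The scheduler state is also exposed through the ResourceManager REST API, which is convenient for scripting (piping through python -m json.tool is optional, purely for pretty-printing):

hadoop shell > curl -s http://192.168.1.25:8088/ws/v1/cluster/scheduler | python -m json.tool
# the response lists both queues with their configured capacities (30.0 and 70.0)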
4. Verify the resource pools
hadoop shell > hadoop jar /usr/local/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar grep shakespeare.txt outfile what
# The job was submitted as user hadoop and went into the default queue
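The queue assignment can also be confirmed without the web UI; the application list printed by the yarn CLI includes a Queue column:

hadoop shell > yarn application -list
# the job submitted above should appear with Queue = default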
hadoop shell > hdfs dfs -mkdir /user/wang
hadoop shell > hdfs dfs -chown -R wang /user/wang
hadoop shell > hdfs dfs -chmod -R 777 /tmp
wang shell > hdfs dfs -put shakespeare.txt
wang shell > hadoop jar /usr/local/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar grep -Dmapreduce.job.queuename=prod shakespeare.txt outfile what
# So: when no pool is specified, default is used, and user wang can target the configured prod pool with -Dmapreduce.job.queuename; visiting http://192.168.1.25:8088 also shows the job running normally
# Embarrassingly, other users can also submit to the prod pool, and their jobs succeed! That points to a problem with the ACLs, which has not been resolved yet. Super awkward!
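A likely cause (an assumption about this setup, not verified here): CapacityScheduler queue ACLs are inherited downward, and a user who passes the ACL check on any parent queue is allowed on all of its children. Since yarn.scheduler.capacity.root.acl_submit_applications defaults to *, every user passes at root and the prod ACL never gets a chance to reject anyone. A sketch of the extra properties that would lock the root queue down:

hadoop shell > vim etc/hadoop/capacity-scheduler.xml

    <!-- restrict the root queue; its default value * is otherwise inherited by every child -->
    <property>
        <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
        <value> </value>    <!-- a single space means "nobody" -->
    </property>
    <property>
        <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
        <value> </value>
    </property>

hadoop shell > yarn rmadmin -refreshQueues

Note that yarn.admin.acl in yarn-site.xml also defaults to *, and YARN administrators bypass queue ACLs, so narrowing it may be necessary as well.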