YARN Multi-Tenant Resource Pool Configuration


Introduction

When multiple users run jobs on the same Hadoop cluster, resources have to be limited effectively, for example to separate test resources from production resources.

1. View the default resource pool

# Visit http://192.168.1.25:8088/cluster/scheduler, i.e. master.hadoop

# You can see the default resource pool, default (referred to here as a queue); when a user submits a job, it draws on the default pool's resources
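
The queues can also be listed from the command line. A quick check, assuming the stock Hadoop 2.x MapReduce CLI is on the PATH:

hadoop shell > mapred queue -list  # prints each queue's name, state, capacity, and current utilization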

2. Configure the resource pools

hadoop shell > vim etc/hadoop/yarn-site.xml  # YARN configuration file

<configuration>

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master.hadoop</value>
    </property>

    <property>
        <name>yarn.acl.enable</name>
        <value>true</value>
    </property>

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${yarn.log.dir}/userlogs</value>
    </property>

    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

</configuration>

# yarn.acl.enable turns on ACL checks
# The scheduler chosen here is the CapacityScheduler
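# yarn.nodemanager.log-dirs local directories on each NodeManager where container logs are written
# yarn.nodemanager.remote-app-log-dir HDFS directory that aggregated application logs are shipped to (used when log aggregation is enabled)
# yarn.nodemanager.aux-services mapreduce_shuffle enables the auxiliary shuffle service that MapReduce jobs require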

hadoop shell > vim etc/hadoop/capacity-scheduler.xml  # scheduler sub-configuration file; this is where the queue parameters live

<configuration>

    <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>10000</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.1</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,prod</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>30</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
        <value>1</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>100</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.default.state</name>
        <value>RUNNING</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
        <value>*</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
        <value>*</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.prod.capacity</name>
        <value>70</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.prod.user-limit-factor</name>
        <value>1</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.prod.maximum-capacity</name>
        <value>100</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.prod.state</name>
        <value>RUNNING</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
        <value>wang</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.prod.acl_administer_queue</name>
        <value>wang</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.node-locality-delay</name>
        <value>40</value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
        <value>false</value>
    </property>

</configuration>

# yarn.scheduler.capacity.maximum-applications maximum number of applications that can be running or pending in the cluster at once
# yarn.scheduler.capacity.maximum-am-resource-percent upper bound on the fraction of cluster resources that ApplicationMasters may use, which effectively limits the number of concurrently running applications; default 10%
# yarn.scheduler.capacity.resource-calculator how resources are calculated; the default only counts memory, DominantResourceCalculator counts both memory and CPU
# yarn.scheduler.capacity.root.queues defines the resource pools: default and prod
# yarn.scheduler.capacity.root.<queue>.capacity each queue's share of total resources as a percentage; sibling queues' capacities must sum to 100%
# yarn.scheduler.capacity.root.<queue>.user-limit-factor the multiple of the queue's capacity a single user may consume; the default of 1 caps one user at the queue's configured capacity
# yarn.scheduler.capacity.root.default.maximum-capacity hard upper limit per queue; because idle resources are shared, a queue can temporarily use more than its configured capacity, up to this limit
# yarn.scheduler.capacity.root.default.state queue state, STOPPED or RUNNING; when STOPPED, users cannot submit applications to the queue or its child queues
# yarn.scheduler.capacity.root.default.acl_submit_applications restricts which users/groups may submit to the queue; default * (everyone); the property is inherited, so a child queue inherits its parent queue's ACL
# yarn.scheduler.capacity.root.default.acl_administer_queue sets the users/groups that may administer the queue, e.g. kill any of its applications
# yarn.scheduler.capacity.node-locality-delay number of scheduling opportunities the scheduler will pass up while waiting for node-local containers; -1 disables it, default 40
# yarn.scheduler.capacity.queue-mappings-override.enable whether a queue explicitly named by the user can be overridden by a queue mapping; default false (a mapping sketch follows below)
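
For reference, the mappings themselves are configured through yarn.scheduler.capacity.queue-mappings. A minimal sketch, assuming we wanted user wang routed to prod automatically (the u:user:queue syntax is the standard CapacityScheduler form; this mapping is illustrative and not part of the setup above):

    <property>
        <name>yarn.scheduler.capacity.queue-mappings</name>
        <value>u:wang:prod</value>
    </property>

With queue-mappings-override.enable left at false, a queue the user names explicitly via -Dmapreduce.job.queuename still takes precedence over the mapping.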

hadoop shell > vim etc/hadoop/mapred-site.xml

<configuration>

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.cluster.acls.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/tmp/hadoop-yarn/staging</value>
    </property>

</configuration>
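
# mapreduce.cluster.acls.enabled turns on ACL checks for MapReduce jobs (who may view or modify a job)
# yarn.app.mapreduce.am.staging-dir HDFS staging directory where the MapReduce ApplicationMaster keeps job submission files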

3. Apply the configuration

hadoop shell > yarn rmadmin -refreshQueues  # run this after adding queues or changing their properties; removing a queue requires a YARN restart

# Refresh the web page and you will see an additional queue (resource pool) named prod
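
The new queue can be confirmed from the CLI as well. A quick check, assuming a Hadoop release recent enough (2.6+) to have the yarn queue subcommand:

hadoop shell > yarn queue -status prod  # prints the queue's state, configured capacity, and maximum capacity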

4. Verify the resource pools

hadoop shell > hadoop jar /usr/local/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar grep shakespeare.txt outfile what

# A job submitted by the hadoop user lands in the default queue

hadoop shell > hdfs dfs -mkdir /user/wang
hadoop shell > hdfs dfs -chown -R wang /user/wang
hadoop shell > hdfs dfs -chmod -R 777 /tmp

wang shell > hdfs dfs -put shakespeare.txt
wang shell > hadoop jar /usr/local/hadoop-2.8.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar grep -Dmapreduce.job.queuename=prod shakespeare.txt outfile what

# Without specifying a pool the job goes to default; user wang can explicitly target the configured prod pool, and http://192.168.1.25:8088 shows the job running there with normal status
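
Which queues a given user may actually submit to can be inspected per user. A quick check, assuming the stock MapReduce CLI, run as the user being checked:

wang shell > mapred queue -showacls  # lists, queue by queue, the operations (SUBMIT_APPLICATIONS, ADMINISTER_QUEUE) this user is allowed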

# The awkward part: other users can also target the prod pool, and their jobs succeed! So the ACL is not taking effect, and at the time of writing this was still unsolved. Very awkward!
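
A likely cause, going by how the CapacityScheduler evaluates ACLs: queue ACLs are inherited downward, so a user allowed on a parent queue is allowed on all of its children, and the root queue's acl_submit_applications defaults to * (everyone). Every user therefore passes the check at root and can submit to prod regardless of the prod-level ACL. A sketch of a fix, untested here: lock down the root queue in capacity-scheduler.xml (a single-space value means "nobody") so that access is granted only by the per-queue ACLs, then run yarn rmadmin -refreshQueues again.

    <property>
        <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
        <value> </value>
    </property>

    <property>
        <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
        <value> </value>
    </property>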
