Apache Hive throws an error when executing an HQL statement ( 10G )


# Problem description

hive > select substring(request_body["uuid"], -1, 1) as uuid, count(distinct(request_body["uuid"])) as count 
from log_bftv_api 
where year=2017 and month=11 and day=1 and request_body["method"] = "bv.lau.urecommend" and length(request_body["uuid"]) = 25 
group by 1 
order by uuid;

# Hive reports the following error when executing this HQL statement ( there is no problem when the data volume is small )

# Error message

MapReduce Total cumulative CPU time: 1 minutes 46 seconds 70 msec
Ended Job = job_1510050683827_0137 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1510050683827_0137_m_000002 (and more) from job job_1510050683827_0137

Task with the most failures(4): 
-----
Task ID:
  task_1510050683827_0137_m_000000

URL:
  http://namenode:8088/taskdetails.jsp?jobid=job_1510050683827_0137&tipid=task_1510050683827_0137_m_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 3  Reduce: 5   Cumulative CPU: 106.07 sec   HDFS Read: 223719539 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 minutes 46 seconds 70 msec

# Cause analysis

The log shows Error: Java heap space and return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask.

From what I found, this is a memory problem: the HQL statement is actually compiled into MapReduce Java tasks, and their JVM heap was too small, so I made the following changes.
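Before editing cluster-wide files, the memory hypothesis can also be checked by raising the limits for a single Hive session. This is a sketch using the same standard MapReduce properties tuned below, overridden with Hive's `set` command so they only apply to the current session:

```sql
-- Session-level overrides: affect only this Hive session, no daemon restart needed
set mapreduce.map.memory.mb=1536;
set mapreduce.map.java.opts=-Xmx1024m;
set mapreduce.reduce.memory.mb=3072;
set mapreduce.reduce.java.opts=-Xmx2560m;
```

If the query succeeds with these overrides, the cluster-wide changes below make the fix permanent.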

# Solution

hadoop shell > vim etc/hadoop/hadoop-env.sh

# default: 1000
export HADOOP_HEAPSIZE=4096

hadoop shell > vim etc/hadoop/yarn-env.sh

# default: 1000
YARN_HEAPSIZE=4096

# Adjust as needed for your actual environment!
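The env-file changes above only take effect after the daemons are restarted. Assuming a standard Hadoop `sbin/` layout (paths may differ in your installation), something like:

```shell
# Restart YARN and HDFS daemons so the new heap settings take effect
# ( run from the Hadoop installation directory; adjust paths as needed )
sbin/stop-yarn.sh && sbin/start-yarn.sh
sbin/stop-dfs.sh && sbin/start-dfs.sh
```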

hadoop shell > vim etc/hadoop/mapred-site.xml

    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>

    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>

    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
    </property>

    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
    </property>

    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
    </property>

    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>

    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
    </property>

# Add the properties above ( scale them up proportionally based on your actual machines )
# My test environment is four 8-core / 8 GB KVM virtual machines: one NameNode and three DataNodes!
# After this tuning, a 600 GB dataset has run without problems, and historical and new data are still being written to HDFS continuously.
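A common rule of thumb (my addition, not from the original post) is to keep the JVM `-Xmx` at roughly 75–80% of the YARN container size, leaving headroom for non-heap memory such as thread stacks and native buffers. A small sketch to sanity-check the values in the mapred-site.xml above:

```python
def heap_ratio(container_mb: int, xmx_mb: int) -> float:
    """Fraction of the YARN container reserved for the JVM heap."""
    return xmx_mb / container_mb

# Values from mapred-site.xml above:
# (mapreduce.*.memory.mb, -Xmx from mapreduce.*.java.opts)
settings = {
    "map":    (1536, 1024),
    "reduce": (3072, 2560),
}

for task, (container_mb, xmx_mb) in settings.items():
    ratio = heap_ratio(container_mb, xmx_mb)
    print(f"{task}: -Xmx is {ratio:.0%} of the container")
# map comes out at 67%, reduce at 83% -- both in a reasonable range
```

If you scale `mapreduce.map.memory.mb` or `mapreduce.reduce.memory.mb` up, scale the matching `-Xmx` with it so the heap does not exceed the container and get the task killed by YARN.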