Recently a SQL job was taking more than two hours to run, so I set out to optimize it.

First, I looked at the counter data of the jobs generated by the Hive SQL and found
that the total CPU time spent was far too high: about 100.4319973 hours.
Broken down per map task,
the top map alone consumed 2.0540889 hours.
I suggest the following tuning:
1. mapreduce.input.fileinputformat.split.maxsize is currently 256000000; lower it to increase the number of maps (this paid off immediately: I set it to 32000000, which produced 500+ maps, and the job dropped from the original 2 hours to 47 minutes).
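The effect of lowering the split size can be sketched with the simplified split rule FileInputFormat uses (the real computation also considers per-file boundaries; the input size and block size below are hypothetical numbers chosen for illustration, not taken from the job above):

```java
public class SplitMath {
    // Simplified FileInputFormat rule: splitSize = max(minSize, min(maxSize, blockSize)),
    // and an input of totalBytes yields roughly ceil(totalBytes / splitSize) map tasks.
    static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static long numSplits(long totalBytes, long splitSize) {
        return (totalBytes + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long total = 16_000_000_000L;  // hypothetical 16 GB input
        long block = 128_000_000L;     // assumed 128 MB HDFS block size
        // With maxsize = 256 MB the block size caps the split at 128 MB:
        System.out.println(numSplits(total, splitSize(1, 256_000_000L, block))); // prints 125
        // With maxsize = 32 MB each split shrinks, so the map count multiplies:
        System.out.println(numSplits(total, splitSize(1, 32_000_000L, block)));  // prints 500
    }
}
```

More maps means each one does less regex-heavy work, which is why the wall-clock time fell even though total CPU time stayed roughly the same.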
2. Optimize the UDFs getPageID, getSiteId, and getPageValue (these methods do a lot of regular-expression text matching).
2.1 For regular-expression optimization, see:
http://www.fasterj.com/articles/regex1.shtml
http://www.fasterj.com/articles/regex2.shtml
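The single biggest regex win inside a per-row UDF is hoisting Pattern.compile out of the evaluate path: compile once into a class-level constant and reuse it for every row. A minimal sketch of the pattern (the regex and method name here are hypothetical, not the actual getPageID implementation):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageIdExtractor {
    // Compiled once per JVM instead of once per row; the pattern is illustrative only.
    private static final Pattern PAGE_ID = Pattern.compile("pageid=(\\d+)");

    public static String getPageId(String url) {
        if (url == null) return null;
        Matcher m = PAGE_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(getPageId("http://example.com/view?pageid=42&x=1")); // prints 42
    }
}
```

Calling Pattern.compile inside evaluate() would recompile the regex for every input row, which is exactly the kind of hidden cost that shows up as inflated CPU time in the job counters.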
2.2 For UDF optimization, see the advice below:
1. Use class-level private members to save on object instantiation and garbage collection. 2. You also get benefits by matching the args with what you would normally expect from upstream. Hive converts text to string when needed, but if the data normally coming into the method is Text, you could try matching the argument type and see if it is any faster. Example:

Before optimization:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import java.net.URLDecoder;

public final class urldecode extends UDF {

    public String evaluate(final String s) {
        if (s == null) { return null; }
        return getString(s);
    }

    public static String getString(String s) {
        String a;
        try {
            a = URLDecoder.decode(s);
        } catch (Exception e) {
            a = "";
        }
        return a;
    }

    public static void main(String[] args) {
        String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        System.out.println(getString(t));
    }
}
```
After optimization:
```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import java.net.URLDecoder;

public final class urldecode extends UDF {

    // Reused across rows to avoid per-call allocation and GC pressure.
    private Text t = new Text();

    public Text evaluate(Text s) {
        if (s == null) { return null; }
        try {
            t.set(URLDecoder.decode(s.toString(), "UTF-8"));
            return t;
        } catch (Exception e) {
            return null;
        }
    }

    //public static void main(String[] args) {
    //    String t = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
    //    System.out.println(getString(t));
    //}
}
```
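Beyond object reuse, the rewrite also fixes a latent bug: the deprecated single-argument URLDecoder.decode uses the platform default charset, while the two-argument form pins UTF-8. The decoding itself can be checked standalone with the same test string the original main method used:

```java
import java.net.URLDecoder;

public class DecodeCheck {
    public static void main(String[] args) throws Exception {
        String encoded = "%E5%A4%AA%E5%8E%9F-%E4%B8%89%E4%BA%9A";
        // Explicit charset: identical result on every JVM, regardless of file.encoding.
        System.out.println(URLDecoder.decode(encoded, "UTF-8")); // prints 太原-三亚
    }
}
```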
2.3 Implement the UDF by extending GenericUDF, which avoids the per-call reflection overhead of the plain UDF interface.
3. If you are on Hive 0.14+, you can turn on hive.cache.expr.evaluation, the UDF result cache.