The new Impala release supports UDFs, so I deployed a 1.2.3 cluster in the test environment.
When running a test UDF, I hit the following error:
java.lang.IllegalArgumentException (thrown to indicate that a method has been passed an illegal or inappropriate argument)
Some digging confirmed this is a known bug:
https://issues.cloudera.org/browse/IMPALA-791
The currently impala 1.2.3 doesn't support String as the input and return types. You'll instead have to use Text or BytesWritable.
In Impala 1.2.3, UDF input parameters and return values do not yet support String; use the org.apache.hadoop.io.Text class instead of String.
Text API documentation:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Text.html
Key points:
Constructor:
Text(String string) Construct from a string.
Method:
String toString() Convert text back to string
void set(String string) Set to contain the contents of a string.
void set(Text other) copy a text.
void clear() clear the string to empty
Testing the usage of the Text class in Eclipse:
package com.hive.myudf;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.Text;

public class TextTest {
    private static Text schemal = new Text("http://");
    private static Text t = new Text("GET /vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92&app_version=1.8.7 HTTP/1.0");

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(.+?) +(.+?) (.+)");
        Matcher m = p.matcher(t.toString());
        if (m.matches()) {
            // Concatenating a Text with a String implicitly calls Text.toString()
            String tt = schemal + "test.test.com" + m.group(2);
            System.out.println(tt);
        } else {
            System.out.println("not match");
        }
        schemal.clear();
        t.clear();
    }
}
Testing the UDF:
package com.hive.myudf;

import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

public class UDFNginxParseUrl extends UDF {
    private static final Logger LOG = Logger.getLogger(UDFNginxParseUrl.class);
    // Request line: method, URI, protocol
    private static final Pattern REQUEST_LINE = Pattern.compile("(.+?) +(.+?) (.+)");
    private final Text schemal = new Text("http://");

    public Text evaluate(Text host1, Text urlStr, Text partToExtract) {
        LOG.debug("args1:" + host1 + ",args2:" + urlStr + ",args3:" + partToExtract);
        if (host1 == null || urlStr == null || partToExtract == null) {
            return null;
        }
        Matcher m1 = REQUEST_LINE.matcher(urlStr.toString());
        if (!m1.matches()) {
            return null;
        }
        String realUrl = schemal.toString() + host1.toString() + m1.group(2);
        LOG.debug("realurl:" + realUrl);
        URL url;
        try {
            url = new URL(realUrl);
        } catch (Exception e) {
            LOG.debug("malformed url: " + realUrl, e);
            return null;
        }
        // Compare the String value: Text.equals(String) is always false,
        // so partToExtract.equals("HOST") would never match.
        if ("HOST".equals(partToExtract.toString())) {
            String host = url.getHost();
            LOG.debug("get host " + host);
            return new Text(host);
        }
        return null;
    }
}
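The UDF above hinges on the regex "(.+?) +(.+?) (.+)" splitting an nginx request line into method, URI, and protocol. That behavior can be checked in isolation with plain java.util.regex, without any Hadoop dependency (the sample request line is the one from the Eclipse test; the class name here is just for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestLineTest {
    // Same pattern as in UDFNginxParseUrl: the reluctant groups stop at the
    // first spaces, so group 2 captures exactly the URI.
    private static final Pattern REQUEST_LINE = Pattern.compile("(.+?) +(.+?) (.+)");

    public static void main(String[] args) {
        String line = "GET /vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92&app_version=1.8.7 HTTP/1.0";
        Matcher m = REQUEST_LINE.matcher(line);
        if (m.matches()) {
            System.out.println("method:   " + m.group(1));   // GET
            System.out.println("uri:      " + m.group(2));   // /vips-mobile/router.do?...
            System.out.println("protocol: " + m.group(3));   // HTTP/1.0
        } else {
            System.out.println("not match");
        }
    }
}
```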
A few things to note:
1. Functions are associated with a database.
2. The jar file is stored in HDFS.
3. Functions are cached by the catalog.
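Putting those notes together, deploying the Java UDF might look like the following sketch (the HDFS path, database name, and function name are hypothetical; the LOCATION/SYMBOL syntax is the Impala 1.2 form for Java UDFs):

```sql
-- Upload the jar to HDFS first, e.g.:
--   hdfs dfs -put myudf.jar /user/impala/udfs/myudf.jar

-- Functions are tied to a database, so pick one before creating it:
USE mydb;

CREATE FUNCTION parse_nginx_url(STRING, STRING, STRING) RETURNS STRING
  LOCATION '/user/impala/udfs/myudf.jar'
  SYMBOL='com.hive.myudf.UDFNginxParseUrl';
```

Because the catalog caches function metadata, other impalad nodes may not see the new function until the metadata is refreshed.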