在GitHub上有豐富的Hive UDF資源,可針對項目須要進行拉取使用。java
如下記錄hive-third-functions的部署和使用node
安裝依賴的jdo2-api-2.3-ec.jargit
[root@server01 hive_third_jars]# wget http://www.datanucleus.org/downloads/maven2/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.jar -O ~/jdo2-api-2.3-ec.jar --2018-12-28 11:19:48-- http://www.datanucleus.org/downloads/maven2/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.jar Resolving www.datanucleus.org (www.datanucleus.org)... 80.86.85.8 Connecting to www.datanucleus.org (www.datanucleus.org)|80.86.85.8|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 198552 (194K) [application/java-archive] Saving to: ‘/root/jdo2-api-2.3-ec.jar’ 100%[==============================================================================================>] 198,552 133KB/s in 1.5s 2018-12-28 11:19:52 (133 KB/s) - ‘/root/jdo2-api-2.3-ec.jar’ saved [198552/198552] [root@server01 hive_third_jars]# mvn install:install-file -DgroupId=javax.jdo -DartifactId=jdo2-api -Dversion=2.3-ec -Dpackaging=jar -Dfile=jdo2-api-2.3-ec.jar [INFO] Scanning for projects... [INFO] [INFO] ------------------< org.apache.maven:standalone-pom >------------------- [INFO] Building Maven Stub Project (No POM) 1 [INFO] --------------------------------[ pom ]--------------------------------- [INFO] [INFO] --- maven-install-plugin:2.4:install-file (default-cli) @ standalone-pom --- [INFO] Installing /root/hive_third_jars/jdo2-api-2.3-ec.jar to /root/.m2/repository/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.jar [INFO] Installing /tmp/mvninstall1123492988314226896.pom to /root/.m2/repository/javax/jdo/jdo2-api/2.3-ec/jdo2-api-2.3-ec.pom [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 0.394 s [INFO] Finished at: 2018-12-28T11:23:25+08:00 [INFO] ------------------------------------------------------------------------ [root@server01 hive_third_jars]# ls jdo2-api-2.3-ec.jar
MVN Packagegithub
[root@server01 hive_third_jars]# cd hive-third-functions/ [root@server01 hive-third-functions]# ls pom.xml README-geo.md README.md README-zh.md src [root@server01 hive-third-functions]# mvn clean package -DskipTests [INFO] Scanning for projects... [INFO] [INFO] ----------------< cc.shanruifeng:hive-third-functions >----------------- ...... [INFO] Attaching shaded artifact. [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 02:28 min [INFO] Finished at: 2018-12-28T11:28:41+08:00 [INFO] ------------------------------------------------------------------------ [root@server01 hive-third-functions]# ls pom.xml README-geo.md README.md README-zh.md src target [root@server01 hive-third-functions]# ls target/ classes generated-test-sources hive-third-functions-2.1.3-shaded.jar maven-status generated-sources hive-third-functions-2.1.3.jar maven-archiver test-classes
添加jar包至hivesql
0: jdbc:hive2://node222:10000/demo> add jar /usr/local/hive-2.1.1/datas/hive-third-functions-2.1.3-shaded.jar ; No rows affected (0.031 seconds) 0: jdbc:hive2://node222:10000/demo> list jar; +--------------------------------------------------------------------+--+ | resource | +--------------------------------------------------------------------+--+ | /usr/local/hive-2.1.1/datas/hive-third-functions-2.1.3-shaded.jar | +--------------------------------------------------------------------+--+ 1 row selected (0.014 seconds)
建立臨時函數關聯函數classapache
0: jdbc:hive2://node222:10000/demo> create temporary function array_contains as 'cc.shanruifeng.functions.array.UDFArrayContains'; No rows affected (0.04 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_equals as 'cc.shanruifeng.functions.array.UDFArrayEquals'; No rows affected (0.032 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_intersect as 'cc.shanruifeng.functions.array.UDFArrayIntersect'; No rows affected (0.039 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_max as 'cc.shanruifeng.functions.array.UDFArrayMax'; No rows affected (0.02 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_min as 'cc.shanruifeng.functions.array.UDFArrayMin'; No rows affected (0.019 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_join as 'cc.shanruifeng.functions.array.UDFArrayJoin'; No rows affected (0.022 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_distinct as 'cc.shanruifeng.functions.array.UDFArrayDistinct'; No rows affected (0.022 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_position as 'cc.shanruifeng.functions.array.UDFArrayPosition'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_remove as 'cc.shanruifeng.functions.array.UDFArrayRemove'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_reverse as 'cc.shanruifeng.functions.array.UDFArrayReverse'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_sort as 'cc.shanruifeng.functions.array.UDFArraySort'; No rows affected (0.019 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_concat as 'cc.shanruifeng.functions.array.UDFArrayConcat'; No rows affected (0.022 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_value_count as 'cc.shanruifeng.functions.array.UDFArrayValueCount'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_slice as 'cc.shanruifeng.functions.array.UDFArraySlice'; No rows affected (0.02 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function array_element_at as 'cc.shanruifeng.functions.array.UDFArrayElementAt'; No rows affected (0.019 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function bit_count as 'cc.shanruifeng.functions.bitwise.UDFBitCount'; No rows affected (0.02 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function bitwise_and as 'cc.shanruifeng.functions.bitwise.UDFBitwiseAnd'; No rows affected (0.02 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function bitwise_not as 'cc.shanruifeng.functions.bitwise.UDFBitwiseNot'; No rows affected (0.023 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function bitwise_or as 'cc.shanruifeng.functions.bitwise.UDFBitwiseOr'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function bitwise_xor as 'cc.shanruifeng.functions.bitwise.UDFBitwiseXor'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function map_build as 'cc.shanruifeng.functions.map.UDFMapBuild'; No rows affected (0.022 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function map_concat as 'cc.shanruifeng.functions.map.UDFMapConcat'; No rows affected (0.024 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function map_element_at as 'cc.shanruifeng.functions.map.UDFMapElementAt'; No rows affected (0.019 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function map_equals as 'cc.shanruifeng.functions.map.UDFMapEquals'; No rows affected (0.019 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function day_of_week as 'cc.shanruifeng.functions.date.UDFDayOfWeek'; No rows affected (0.034 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function day_of_year as 'cc.shanruifeng.functions.date.UDFDayOfYear'; No rows affected (0.017 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function type_of_day as 'cc.shanruifeng.functions.date.UDFTypeOfDay'; No rows affected (0.028 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function zodiac_cn as 'cc.shanruifeng.functions.date.UDFZodiacSignCn'; No rows affected (0.016 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function zodiac_en as 'cc.shanruifeng.functions.date.UDFZodiacSignEn'; No rows affected (0.018 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function pinyin as 'cc.shanruifeng.functions.string.UDFChineseToPinYin'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function md5 as 'cc.shanruifeng.functions.string.UDFMd5'; No rows affected (0.019 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function sha256 as 'cc.shanruifeng.functions.string.UDFSha256'; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1) 0: jdbc:hive2://node222:10000/demo> create temporary function json_array_get as 'cc.shanruifeng.functions.json.UDFJsonArrayGet'; No rows affected (0.019 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function json_array_length as 'cc.shanruifeng.functions.json.UDFJsonArrayLength'; No rows affected (0.019 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function json_array_extract as 'cc.shanruifeng.functions.json.UDFJsonArrayExtract'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function json_array_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonArrayExtractScalar'; No rows affected (0.022 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function json_extract as 'cc.shanruifeng.functions.json.UDFJsonExtract'; No rows affected (0.021 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function json_extract_scalar as 'cc.shanruifeng.functions.json.UDFJsonExtractScalar'; No rows affected (0.016 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function json_size as 'cc.shanruifeng.functions.json.UDFJsonSize'; No rows affected (0.017 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function id_card_province as 'cc.shanruifeng.functions.card.UDFChinaIdCardProvince'; No rows affected (0.017 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function id_card_city as 'cc.shanruifeng.functions.card.UDFChinaIdCardCity'; No rows affected (0.016 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function id_card_area as 'cc.shanruifeng.functions.card.UDFChinaIdCardArea'; No rows affected (0.016 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function id_card_birthday as 'cc.shanruifeng.functions.card.UDFChinaIdCardBirthday'; No rows affected (0.016 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function id_card_gender as 'cc.shanruifeng.functions.card.UDFChinaIdCardGender'; No rows affected (0.015 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function is_valid_id_card as 'cc.shanruifeng.functions.card.UDFChinaIdCardValid'; No rows affected (0.013 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function id_card_info as 'cc.shanruifeng.functions.card.UDFChinaIdCardInfo'; No rows affected (0.014 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function wgs_distance as 'cc.shanruifeng.functions.geo.UDFGeoWgsDistance'; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1) 0: jdbc:hive2://node222:10000/demo> create temporary function gcj_to_bd as 'cc.shanruifeng.functions.geo.UDFGeoGcjToBd'; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1) 0: jdbc:hive2://node222:10000/demo> create temporary function bd_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoBdToGcj'; No rows affected (0.014 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function wgs_to_gcj as 'cc.shanruifeng.functions.geo.UDFGeoWgsToGcj'; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1) 0: jdbc:hive2://node222:10000/demo> create temporary function gcj_to_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjToWgs'; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1) 0: jdbc:hive2://node222:10000/demo> create temporary function gcj_extract_wgs as 'cc.shanruifeng.functions.geo.UDFGeoGcjExtractWgs'; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1) 0: jdbc:hive2://node222:10000/demo> create temporary function url_encode as 'cc.shanruifeng.functions.url.UDFUrlEncode'; No rows affected (0.015 seconds) 0: jdbc:hive2://node222:10000/demo> create temporary function url_decode as 'cc.shanruifeng.functions.url.UDFUrlDecode'; No rows affected (0.041 seconds)
查看所建立的函數json
0: jdbc:hive2://node222:10000/demo> describe function zodiac_cn; +--------------------------------------------------------------------------------------------------------------------+--+ | tab_name | +--------------------------------------------------------------------------------------------------------------------+--+ | zodiac_cn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac. | +--------------------------------------------------------------------------------------------------------------------+--+ 1 row selected (0.041 seconds) 0: jdbc:hive2://node222:10000/demo> describe function extended zodiac_cn; +--------------------------------------------------------------------------------------------------------------------+--+ | tab_name | +--------------------------------------------------------------------------------------------------------------------+--+ | zodiac_cn(date) - from the input date string or separate month and day arguments, returns the sing of the Zodiac. | | Example: | | > select zodiac_cn(date_string) from src; | | > select zodiac_cn(month, day) from src; | +--------------------------------------------------------------------------------------------------------------------+--+ 4 rows selected (0.029 seconds)
測試所定義的函數api
0: jdbc:hive2://node222:10000/demo> select pinyin('中國'); +-----------+--+ | _c0 | +-----------+--+ | zhongguo | +-----------+--+ 1 row selected (0.577 seconds) 0: jdbc:hive2://node222:10000/demo> select md5('aaronshan'); +-----------------------------------+--+ | _c0 | +-----------------------------------+--+ | 95686bc0483262afe170b550dd4544d1 | +-----------------------------------+--+ 1 row selected (0.117 seconds) 0: jdbc:hive2://node222:10000/demo> select day_of_week('2016-07-12'); +------+--+ | _c0 | +------+--+ | 2 | +------+--+ 1 row selected (0.126 seconds) 0: jdbc:hive2://node222:10000/demo> select day_of_week('2018-12-28'); +------+--+ | _c0 | +------+--+ | 5 | +------+--+ 1 row selected (0.108 seconds) 0: jdbc:hive2://node222:10000/demo> select zodiac_cn('1989-01-08'); +------+--+ | _c0 | +------+--+ | 魔羯座 | +------+--+ 1 row selected (0.129 seconds)
function | description |
---|---|
pinyin(string) -> string | convert chinese to pinyin |
md5(string) -> string | md5 hash |
sha256(string) -> string | sha256 hashbash |
function | description |
---|---|
array_contains(array<E>, E) -> boolean | whether array contains value or not. |
array_equals(array<E>, array<E>) -> boolean | whether two array equals or not. |
array_intersect(array, array) -> array | returns the two array's intersection, without duplicates. |
array_max(array<E>) -> E | returns the maximum value of input array. |
array_min(array<E>) -> E | returns the minimum value of input array. |
array_join(array, delimiter, null_replacement) -> string | concatenates the elements of the given array using the delimiter and an optional null_replacement to replace nulls. |
array_distinct(array) -> array | remove duplicate values from the array. |
array_position(array<E>, E) -> long | returns the position of the first occurrence of the element in array (or 0 if not found). |
array_remove(array<E>, E) -> array | remove all elements that equal element from array. |
array_reverse(array) -> array | reverse the array element. |
array_sort(array) -> array | sorts and returns the array. The elements of array must be orderable. |
array_concat(array, array) -> array | concatenates two arrays. |
array_value_count(array<E>, E) -> long | count array's element number that element value equals given value. |
array_slice(array, start, length) -> array | subsets array starting from index start (or starting from the end if start is negative) with a length of length. |
array_element_at(array<E>, index) -> E | returns element of array at given index. If index < 0, element_at accesses elements from the last to the first. |
function | description |
---|---|
map_build(x<K>, y<V>) -> map<K, V> | returns a map created using the given key/value arrays. |
map_concat(x<K, V>, y<K, V>) -> map<K,V> | returns the union of two maps. If a key is found in both x and y , that key’s value in the resulting map comes from y . |
map_element_at(map<K, V>, key) -> V | returns value for given key , or NULL if the key is not contained in the map. |
map_equals(x<K, V>, y<K, V>) -> boolean | whether map x equals with map y or not. |
function | description |
---|---|
day_of_week(date_string | date) -> int | day of week,if monday,return 1, sunday return 7, error return null. |
day_of_year(date_string | date) -> int | day of year. The value ranges from 1 to 366. |
zodiac_en(date_string | date) -> string | convert date to zodiac |
zodiac_cn(date_string | date) -> string | convert date to zodiac chinese |
type_of_day(date_string | date) -> string | for chinese. 獲取日期的類型(1: 法定節假日, 2: 正常週末, 3: 正常工做日 4:攢假的工做日),錯誤返回-1. |
function | description |
---|---|
json_array_get(json, jsonPath) -> array(varchar) | returns the element at the specified index into the json_array . The index is zero-based. |
json_array_length(json, jsonPath) -> array(varchar) | returns the array length of json (a string containing a JSON array). |
json_array_extract(json, jsonPath) -> array(varchar) | extract json array by given jsonPath. |
json_array_extract_scalar(json, jsonPath) -> array(varchar) | like json_array_extract , but returns the result value as a string (as opposed to being encoded as JSON). |
json_extract(json, jsonPath) -> array(varchar) | extract json by given jsonPath. |
json_extract_scalar(json, jsonPath) -> array(varchar) | like json_extract , but returns the result value as a string (as opposed to being encoded as JSON). |
json_size(json, jsonPath) -> array(varchar) | like json_extract , but returns the size of the value. For objects or arrays, the size is the number of members, and the size of a scalar value is zero. |
function | description |
---|---|
bit_count(x, bits) -> bigint | count the number of bits set in x (treated as bits-bit signed integer) in 2’s complement representation |
bitwise_and(x, y) -> bigint | returns the bitwise AND of x and y in 2’s complement arithmetic. |
bitwise_not(x) -> bigint | returns the bitwise NOT of x in 2’s complement arithmetic. |
bitwise_or(x, y) -> bigint | returns the bitwise OR of x and y in 2’s complement arithmetic. |
bitwise_xor(x, y) -> bigint | returns the bitwise XOR of x and y in 2’s complement arithmetic. |
function | description |
---|---|
id_card_province(string) -> string | get user's province |
id_card_city(string) -> string | get user's city |
id_card_area(string) -> string | get user's area |
id_card_birthday(string) -> string | get user's birthday |
id_card_gender(string) -> string | get user's gender |
is_valid_id_card(string) -> boolean | determine is valid china id card No. |
id_card_info(string) -> json | get china id card info. include province, city, area etc. |
function | description |
---|---|
wgs_distance(double lat1, double lng1, double lat2, double lng2) -> double | calculate WGS84 coordinate distance, in meters. |
gcj_to_bd(double,double) -> json | GCJ-02(火星座標系) convert to BD-09(百度座標系), 谷歌、高德——>百度 |
bd_to_gcj(double,double) -> json | BD-09(百度座標系) convert to GCJ-02(火星座標系), 百度——>谷歌、高德 |
wgs_to_gcj(double,double) -> json | WGS84(地球座標系) convert to GCJ02(火星座標系) |
gcj_to_wgs(double,double) -> json | GCJ02(火星座標系) convert to GPS84(地球座標系), output coordinate WGS-84 accuracy within 1 to 2 meters. |
gcj_extract_wgs(double,double) -> json | GCJ02(火星座標系) convert to GPS84, output coordinate WGS-84 accuracy within 0.5 meters. but compute cost more time than gcj_to_wgs . |
關於互聯網地圖座標系的說明見: 當前互聯網地圖的座標系現狀app
function | description |
---|---|
url_encode(value) -> string | escapes value by encoding it so that it can be safely included in URL query parameter names and values |
url_decode(value) -> string | unescape the URL encoded value. This function is the inverse of url_encode . |
參考資料:
https://github.com/aaronshan/hive-third-functions