There are roughly three ways to execute a Hive script:

1. from the Hive console;
2. with hive -e "SQL";
3. with hive -f <SQL file>.

For reference, Hive's usage output is as follows:
usage: hive
 -d,--define <key=value>          Variable subsitution to apply to hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
 -h <hostname>                    connecting to Hive Server on remote host
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable subsitution to apply to hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -p <port>                        connecting to Hive Server on port number
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
Running from the Hive console

As the name suggests, you enter the Hive console and then run the SQL there, for example:
hive> set mapred.job.queue.name=pms;
hive> select page_name, tpa_name from pms.pms_exps_prepro limit 2;

Total MapReduce jobs = 1
Launching Job 1 out of 1
...
Job running in-process (local Hadoop)
2015-10-23 10:06:47,756 null map = 100%,  reduce = 0%
2015-10-23 10:06:48,863 null map = 23%,  reduce = 0%
2015-10-23 10:06:49,946 null map = 38%,  reduce = 0%
2015-10-23 10:06:51,051 null map = 72%,  reduce = 0%
2015-10-23 10:06:52,129 null map = 100%,  reduce = 0%
Ended Job = job_local1109193547_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
APP首頁    APP首頁_價格比京東低
APP首頁    APP首頁_價格比京東低
Time taken: 14.279 seconds
hive>
Running with hive -e "SQL"

With hive -e "SQL", Hive starts up and executes the given SQL script directly, for example:
hive -e "
set mapred.job.queue.name=pms;
set mapred.job.name=[HQL]exps_prepro_query;

select page_name, tpa_name from pms.pms_exps_prepro limit 2;"
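A small variant I find handy (my own sketch, not from the original post): the -S flag from the usage above silences the interactive chatter, so the query result can be redirected cleanly into a file.

hive -S -e "
set mapred.job.queue.name=pms;
select page_name, tpa_name from pms.pms_exps_prepro limit 2;" > prepro_sample.txt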
Running with hive -f <SQL file>

This executes the SQL script contained in a file. For example, the file pms_exps_prepro.sql contains the following:
set mapred.job.queue.name=pms;
set hive.exec.reducers.max=48;
set mapred.reduce.tasks=48;
set mapred.job.name=[HQL]pms_exps_prepro;

drop table if exists pms.pms_exps_prepro;
create table pms.pms_exps_prepro as
select
    a.provinceid,
    a.cityid,
    a.ieversion,
    a.platform,
    '${date}' as ds
from track_exps a;
The SQL in this file takes a date as a parameter, referenced as ${date}, and is executed like this:
date=2015-10-22

hive -f pms_exps_prepro.sql --hivevar date=$date
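A quick way to confirm the substitution before running the real job (a hypothetical check of mine, not from the post): wrap the query in bash single quotes so that ${date} reaches Hive untouched, and only ask for the plan. The printed plan should show the literal 2015-10-22 where ${date} was written.

hive -e 'explain select "${date}" as ds from track_exps limit 1;' --hivevar date=2015-10-22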
The following business scenario illustrates the issue of escape characters in Hive.

track_exps stores exposure records. Xiao A now wants to extract the valid exposure records for 2015-10-20, where a valid exposure record is one in which:

* the relatedinfo field matches the format number.number.number.number.number, e.g. 4.4.5.1080100.1;
* the extfield1 field matches the format request-string,section-number, e.g. request-b470805b620900ac492bb892ad7e955e,section-4 (a quick local check of both patterns follows below).
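Before touching Hive at all, the two intended patterns can be sanity-checked locally. This is my own approximation using grep -E (POSIX ERE, hence [0-9] instead of the \d that Hive's Java regex accepts), not part of the original workflow:

echo '4.4.5.1080100.1' | grep -E '^[0-9]+(\.[0-9]+){4}$'
echo 'request-b470805b620900ac492bb892ad7e955e,section-4' | grep -E '^request-.+,section-[0-9]+$'

Both lines should be echoed back, i.e. both sample values match their patterns.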
To solve this, Xiao A wrote the following SQL script:
select *
from track_exps
where ds = '2015-10-20'
    and relatedinfo rlike '^4.\d+.\d+.\d+.\d+$'
    and extfield1 rlike '^request.+section-\d+$';
However, because the regular expressions are embedded inside the SQL, the special characters in them have to be escaped.

Running with hive -e "SQL", the script is modified as follows:
hive -e "
set mapred.job.queue.name=pms;

explain select
    cityid
from track_exps
where ds = '2015-10-20'
    and relatedinfo rlike '\\^4\\.\\\d\\+\\.\\\d\\+\\.\\\d\\+\\.\\\d\\+\\$'
    and extfield1 rlike '\\^request\\.\\+section\\-\\\d\\+\\$';"
Inspecting the execution plan confirms that the regular expressions are now parsed correctly:
...
predicate:
    expr: ((relatedinfo rlike '^4.\d+.\d+.\d+.\d+$') and (extfield1 rlike '^request.+section-\d+$'))
    type: boolean
...
Analysis:

With hive -e "SQL" the pattern ends up as "'regex'": the regular expression is wrapped first in single quotes and then in the outer double quotes, which the shell processes. So in a sequence like \\^, the first \ is consumed when the shell parses the second \ (the double-quote layer collapses \\ into \), and only the surviving \ reaches Hive and actually does the escaping.
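To see exactly what the shell's double quotes hand over to Hive, here is a small sketch of my own: print the pattern instead of executing it.

printf '%s\n' "relatedinfo rlike '\\^4\\.\\\d\\+\\.\\\d\\+\\.\\\d\\+\\.\\\d\\+\\$'"

This should print relatedinfo rlike '\^4\.\\d\+\.\\d\+\.\\d\+\.\\d\+\$', which is precisely the single-backslash form used in the SQL file below.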
Running with hive -f <SQL file>, the script is modified as follows; the pms_exps_prepro.sql file contains:
select *
from track_exps
where ds = '2015-10-20'
    and relatedinfo rlike '\^4\.\\d\+\.\\d\+\.\\d\+\.\\d\+\$'
    and extfield1 rlike '\^request\.\+section\-\\d\+\$';
Analysis:

Unlike the hive -e "SQL" case, the SQL here lives in a file, so the regular expression is only wrapped in single quotes and never passes through the shell's double quotes; a single \ is enough to do the escaping.
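The same explain trick can be used to verify the file-based variant. This is a hypothetical check of mine (the file path and name are made up); the quoted 'EOF' keeps the shell from touching the backslashes:

cat > /tmp/check_regex.sql <<'EOF'
set mapred.job.queue.name=pms;
explain
select cityid
from track_exps
where ds = '2015-10-20'
    and relatedinfo rlike '\^4\.\\d\+\.\\d\+\.\\d\+\.\\d\+\$'
    and extfield1 rlike '\^request\.\+section\-\\d\+\$';
EOF
hive -f /tmp/check_regex.sql

The plan should show the same clean predicates as in the hive -e case above.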