一、Configuring Hive
1) Unpack to /opt/moduels
2) Configure HIVE_HOME
3) Configure HADOOP_HOME and HIVE_CONF_DIR in hive-env.sh
4) Create the Hive metadata storage directory on HDFS and grant it write permission
5) bin/hive --> run SQL statements against the database
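The steps above can be sketched as a few lines of configuration and shell; the install paths below are hypothetical examples, not taken from these notes:

```shell
# hive-env.sh (sketch -- adjust paths to your install)
export HADOOP_HOME=/opt/moduels/hadoop        # hypothetical Hadoop path
export HIVE_CONF_DIR=/opt/moduels/hive/conf

# step 4): create the warehouse directory on HDFS and grant group write
# permission (the standard directories from the Hive Getting Started guide)
$HADOOP_HOME/bin/hdfs dfs -mkdir -p /tmp /user/hive/warehouse
$HADOOP_HOME/bin/hdfs dfs -chmod g+w /tmp /user/hive/warehouse
```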
二、Installing and configuring MySQL
1) unzip mysql
2) rpm -e --nodeps mysql --> remove the preinstalled MySQL
3) rpm -ivh mysql-server
4)cat /root/.mysql_secret
5)rpm -ivh mysql-client
6)mysql -uroot -p[password]
7)set password=password('123456');
8)update user set Host='%' where User='root' and Host = 'localhost';
9)flush privileges;
10)tar -zxvf mysql-connector
11) cp mysql-connector-java /opt/moduels/hive/lib
12)Configure the URL, DriverName, UserName and Password in hive-site.xml --> port 3306, database name: metastore
-->https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
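Step 12 corresponds to four standard properties in hive-site.xml; a minimal sketch, assuming the MySQL server runs on a host named hadoop-senior (a made-up hostname) and the metastore database is called metastore:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop-senior:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
```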
三、Basic Hive operations
1) Column delimiter
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
2) Load local data
load data local inpath '/opt/datas/student.txt' [overwrite] into table student;
3) desc formatted(extended) student;
4) show functions; --> desc function(extended) substring;
5) Clear a table's data: truncate table table_name [partition partition_spec];
四、Assorted configuration
1) Set hive.cli.print.header and hive.cli.print.current.db to display column headers and the current database
2) Log file configuration
3) set; --> view all configuration settings --> set hive.root.logger=INFO,console; prints log output to the console
4) Common interactive options: bin/hive -help (-i, -f, -e)
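A sketch of the hive-site.xml entries behind item 1) above (these are the standard Hive property names for the abbreviated ones in the notes):

```xml
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
</property>
```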
五、Three ways to create a table
1)create table test01(ip string comment '...',user string)
comment 'access log'
row format delimited fields terminated by ' '
stored as textfile
location '/user/hive/warehouse/logs'
2)create table test02
as select ip,user from test01;
3)create table test03
like test01;
六、Hive data types (the ETL pattern)
1) table, load --> E (extract)
2) select --> T (transform)
3) sub table --> L (load)
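The E/T/L mapping above can be sketched in HiveQL (table and column names here are illustrative, not from the notes):

```sql
-- E: extract -- load raw data into a Hive table
load data local inpath '/opt/datas/emp.txt' into table emp_raw;

-- T + L: transform with a select, load the result into a sub table
create table emp_clean as
select empno, ename, deptno from emp_raw where empno is not null;
```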
七、Table types in Hive
1) Managed tables --> dropping the table deletes both the table data and the metadata
2) External tables (external) --> dropping the table deletes only the metadata, not the table data
3) Partitioned tables --> queries can restrict to a partition via the where clause
create table dept_partition(deptno int,dname string,loc string)
partitioned by(event_month string[,event_day string]) --> two-level partitioning
row format delimited fields terminated by '\t';
Loading data:
load data local inpath '/opt/datas/emp.txt' into table emp_partition partition (month='201509');
Querying:
where month = '201509';
Notes:
a. If you create the partition directory by hand and put data into it, the partition metadata is not written to the metastore, so the data cannot be read; repair it manually:
msck repair table dept_partition;
or alter table dept_part add partition(day='20150913');
b. Show a table's partitions: show partitions dept_partition;
八、Ways to export a table
1) insert overwrite local directory '/opt/datas/hive_exp_emp' --> drop "local" to export to the HDFS file system
row format delimited fields terminated by '\t'
collection items terminated by '\n'
select * from db_hive.emp;
2) bin/hive -e "select * from db_hive.emp" > /opt/datas/hive_exp_exp.txt --> runs no MapReduce job
3) Sqoop: hdfs/hive -> rdbms, or rdbms -> hdfs/hive/hbase
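For item 3), a hedged sketch of a Sqoop export from Hive's warehouse directory into MySQL; the host, database, table, and directory are made-up examples:

```shell
# push a Hive table's HDFS files into a MySQL table (all names hypothetical)
sqoop export \
  --connect jdbc:mysql://hadoop-senior:3306/testdb \
  --username root --password 123456 \
  --table emp \
  --export-dir /user/hive/warehouse/db_hive.db/emp \
  --input-fields-terminated-by '\t'
```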
九、Common Hive queries
[WITH CommonTableExpression (, CommonTableExpression)*] (Note: Only available starting with Hive 0.13.0)
SELECT [ALL | DISTINCT] select_expr, select_expr, ... --> all rows vs. de-duplicated rows
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
1)select t.empno,t.ename,t.deptno from emp t; --> t is a table alias
2)between -->
select t.empno,t.ename,t.deptno from emp t where t.sal between 800 and 1500;
3)is null/is not null
select t.empno,t.ename,t.deptno from emp t where comm is null;
4)group by/having
Average salary per department:
select avg(sal) avg_sal from emp
group by deptno;
Highest salary for each job in each department:
select t.deptno,t.job,max(t.sal) max_sal from emp t group by t.deptno,t.job; --> grouping by two columns
Difference between having and where:
where filters individual records
having filters the grouped results
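The where/having distinction can be shown side by side in a HiveQL sketch (the 2000 threshold is an arbitrary example value):

```sql
-- where filters rows before grouping:
select deptno, avg(sal) avg_sal from emp
where sal > 800           -- drops individual low-salary rows first
group by deptno;

-- having filters groups after aggregation:
select deptno, avg(sal) avg_sal from emp
group by deptno
having avg(sal) > 2000;   -- keeps only departments whose average exceeds 2000
```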
十、Export/Import
1)Export --> export a Hive table's data
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
  TO 'export_target_path' [ FOR replication('eventid') ] --> the path is an HDFS path
2)Import --> import a previously exported table
IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
  FROM 'source_path'
  [LOCATION 'import_target_path']
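A concrete round trip using the syntax above (the table name and HDFS path are illustrative):

```sql
-- export the table data plus its metadata to an HDFS directory
export table emp to '/user/hive/export/emp';

-- re-import it under a new name, e.g. on another cluster or database
import table emp_imported from '/user/hive/export/emp';
```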