Hadoop-Hive

Date: 2019-11-12

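I. Configuring Hive

    1) Extract the Hive archive to /opt/moduels

    2) Set HIVE_HOME

    3) Set HADOOP_HOME and HIVE_CONF_DIR in hive-env.sh

    4) Create the Hive metadata storage directory on the HDFS file system and grant permissions on it

    5) Run bin/hive    --> SQL statements can now be issued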
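Steps 2) to 4) can be sketched as a short shell session. This is a minimal sketch, not the exact commands behind these notes: the Hadoop location /opt/moduels/hadoop is a hypothetical path, and the HDFS directories /tmp and /user/hive/warehouse are assumed defaults. The hive-env.sh file is written to a scratch directory so the sketch can be tried without touching a real installation, and the HDFS commands run only if an hdfs client is on the PATH.

```shell
#!/bin/sh
# Assumed install locations; /opt/moduels/hadoop is a hypothetical path.
HIVE_HOME=/opt/moduels/hive
HADOOP_HOME=/opt/moduels/hadoop
export HIVE_HOME HADOOP_HOME

# Write the two hive-env.sh settings to a scratch copy of the conf directory.
conf_dir=$(mktemp -d)
cat > "$conf_dir/hive-env.sh" <<EOF
HADOOP_HOME=$HADOOP_HOME
export HIVE_CONF_DIR=$HIVE_HOME/conf
EOF

# Create the HDFS directories (assumed defaults) and make them group-writable.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /tmp /user/hive/warehouse
    hdfs dfs -chmod g+w /tmp /user/hive/warehouse
else
    echo "hdfs client not found; skipping HDFS directory setup"
fi
```

On a real machine the two settings would go into the hive-env.sh under Hive's conf directory instead of the scratch copy.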

II. Installing and configuring MySQL

    1) unzip mysql    --> unpack the MySQL bundle

    2) rpm -e --nodeps mysql    --> remove any preinstalled MySQL packages

    3) rpm -ivh mysql-server

    4) cat /root/.mysql_secret    --> shows the generated initial root password

    5) rpm -ivh mysql-client

    6) mysql -uroot -p[password]

    7) set password=password('123456');

    8) update user set Host='%' where User='root' and Host='localhost';    --> run inside the mysql database (use mysql;); lets root connect from any host

    9) flush privileges;

    10) tar -zxvf mysql-connector

    11) cp mysql-connector-java /opt/moduels/hive/lib

    12) Configure the URL, DriverName, UserName and Password in hive-site.xml    --> port 3306, database = metastore

    --> https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
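The four settings in step 12) correspond to the javax.jdo.option.* properties described on the MetastoreAdmin page linked above. A sketch of the resulting file, generated from the shell: the host name mysql-host is a placeholder, createDatabaseIfNotExist=true is an optional addition so that the metastore database is created on first use, and the driver class com.mysql.jdbc.Driver assumes a 5.x MySQL connector jar. The password is the one set in step 7). The file is written to a scratch directory here; on a real installation it belongs in Hive's conf directory.

```shell
#!/bin/sh
conf_dir=$(mktemp -d)   # scratch directory standing in for Hive's conf directory
mysql_host=mysql-host   # placeholder host name, replace with the MySQL server

cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://$mysql_host:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
EOF
echo "wrote $conf_dir/hive-site.xml"
```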

III. Basic Hive operations

    1) Column delimiter

        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    2) Loading local data

        load data local inpath '/opt/datas/student.txt' [overwrite] into table student;

    3) desc formatted student;    --> or: desc extended student;

    4) show functions;    --> desc function [extended] substring;

    5) Clearing data    truncate table table_name [partition partition_spec];
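The statements above fit together as follows. This is a minimal sketch: the two-column layout of student (id, name) and the sample rows are invented for illustration, and the files are written to a scratch directory rather than /opt/datas. The HiveQL script is executed only if a hive client is on the PATH.

```shell
#!/bin/sh
work_dir=$(mktemp -d)

# Tab-separated sample rows (hypothetical data).
printf '1001\tzhangsan\n1002\tlisi\n1003\twangwu\n' > "$work_dir/student.txt"

# HiveQL: create the table with a tab delimiter, load the file, inspect it,
# then clear the data again with truncate.
cat > "$work_dir/student.hql" <<EOF
create table if not exists student(id int, name string)
row format delimited fields terminated by '\t';
load data local inpath '$work_dir/student.txt' overwrite into table student;
desc formatted student;
select * from student;
truncate table student;
EOF

if command -v hive >/dev/null 2>&1; then
    hive -f "$work_dir/student.hql"
else
    echo "hive client not found; script left in $work_dir/student.hql"
fi
```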

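IV. Some configuration settings

    1) Set hive.cli.print.header and hive.cli.print.current.db to show column headers and the current database in the CLI

    2) Log file configuration

    3) set;    --> lists the current configuration. To print log output to the console, start the CLI with bin/hive --hiveconf hive.root.logger=INFO,console (logging is initialized at start-up, so issuing set hive.root.logger=INFO,console; inside a session does not change it)

    4) Commonly used command-line options: bin/hive -help (-i, -f, -e)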
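The CLI settings and the -i, -f and -e options combine naturally: settings can be kept in an init file passed with -i, while -e runs a quoted statement and -f runs a script file. A sketch, assuming the property names hive.cli.print.header and hive.cli.print.current.db; the file names init.hql and query.hql are hypothetical, and hive is invoked only if it is on the PATH.

```shell
#!/bin/sh
work_dir=$(mktemp -d)

# Init file: its statements run first when the file is passed with -i.
cat > "$work_dir/init.hql" <<'EOF'
set hive.cli.print.header=true;
set hive.cli.print.current.db=true;
EOF

# A one-statement script for -f.
echo 'show databases;' > "$work_dir/query.hql"

if command -v hive >/dev/null 2>&1; then
    # -e runs a quoted statement, -f runs a script file.
    hive -i "$work_dir/init.hql" -e 'show tables;'
    hive -i "$work_dir/init.hql" -f "$work_dir/query.hql"
    # The log level is passed at start-up with --hiveconf.
    hive --hiveconf hive.root.logger=INFO,console -e 'show databases;'
else
    echo "hive client not found; files left in $work_dir"
fi
```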

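V. Three ways to create a table

    1) create table test01(ip string comment '...', user string)
        comment 'access log'
        row format delimited fields terminated by ' '
        stored as textfile
        location '/user/hive/warehouse/logs';

        --> on Hive versions where user is a reserved word, quote the column name with backticks

    2) create table test02
        as select ip,user from test01;    --> copies both structure and data

    3) create table test03
        like test01;    --> copies the structure only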

VI. How Hive operations map to ETL

    1) table, load    --> E (extract)

    2) select, python    --> T (transform)

    3) sub table    --> L (load)

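VII. Table types in Hive

    1) Managed table    --> dropping the table deletes both the table data and the metadata

    2) External table (external)    --> dropping the table deletes only the metadata, not the table data

    3) Partitioned table (partitioned tables)    --> a query can select partitions through the where clause

            create table dept_partition(deptno int,dname string,loc string)
                partitioned by(event_month string[,event_day string])    --> the optional second column gives two-level partitioning
                row format delimited fields terminated by '\t';

        Loading data:

            load data local inpath '/opt/datas/emp.txt' into table dept_partition partition (event_month='201509');

        Querying:

            where event_month = '201509';

        Notes:

            a. If you create a partition directory by hand and put data into it, the partition metadata is not written to the metastore database, so the data cannot be read. It can be repaired manually:

                msck repair table dept_partition;

                or    alter table dept_partition add partition(event_month='201509');

            b. List the partitions of a table: show partitions dept_partition;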
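The manual route described in note a. can be sketched as below. Hive keeps each partition in a subdirectory named column=value under the table directory, so files put there by hand stay invisible until the partition is registered in the metastore. The warehouse root /user/hive/warehouse and the use of the default database are assumptions; the local file is the one from the load example. Nothing is changed unless both the hdfs and hive clients are on the PATH.

```shell
#!/bin/sh
warehouse=/user/hive/warehouse   # assumed default warehouse root
table=dept_partition
part_col=event_month
part_val=201509

# Partition directories follow the <column>=<value> naming convention.
part_dir="$warehouse/$table/$part_col=$part_val"
echo "partition directory: $part_dir"

if command -v hdfs >/dev/null 2>&1 && command -v hive >/dev/null 2>&1; then
    hdfs dfs -mkdir -p "$part_dir"
    hdfs dfs -put /opt/datas/emp.txt "$part_dir"
    # Register the hand-made partition in the metastore, then list partitions.
    hive -e "msck repair table $table; show partitions $table;"
else
    echo "hdfs/hive clients not found; nothing was changed"
fi
```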

VIII. Ways to export a table

    1) insert overwrite local directory '/opt/datas/hive_exp_emp'    --> drop local to export to the HDFS file system instead
            row format delimited fields terminated by '\t'
            collection items terminated by '\n'
            select * from db_hive.emp;

    2) bin/hive -e "select * from db_hive.emp" > /opt/datas/hive_exp_exp.txt    --> no MapReduce job is run

    3) Sqoop    hdfs/hive -> rdbms    or    rdbms -> hdfs/hive/hbase

IX. Common queries in Hive

[WITH CommonTableExpression (, CommonTableExpression)*] (Note: Only available starting with Hive 0.13.0)

SELECT [ALL | DISTINCT] select_expr, select_expr, ...    --> ALL returns every row, DISTINCT removes duplicates

FROM table_reference

[WHERE where_condition]

[GROUP BY col_list]

[ORDER BY col_list]

[CLUSTER BY col_list

| [DISTRIBUTE BY col_list] [SORT BY col_list]

]

[LIMIT number]

    1) select t.empno,t.ename,t.deptno from emp t;    --> t is a table alias

    2) between

        select t.empno,t.ename,t.deptno from emp t where t.sal between 800 and 1500;

    3) is null/is not null

        select t.empno,t.ename,t.deptno from emp t where t.comm is null;

    4) group by/having

        Average salary of each department:

        select deptno, avg(sal) avg_sal from emp
        group by deptno;

        Highest salary for each job within each department:

        select t.deptno,t.job,max(t.sal) max_sal from emp t group by t.deptno,t.job;    --> grouping by two columns

        Difference between having and where:

            where filters individual rows, before grouping

            having filters the grouped results
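The difference shows up when both clauses appear in one query. The sketch below mimics, with awk on a tiny invented data set (deptno, sal), what a query such as select deptno, avg(sal) from emp where sal > 1000 group by deptno having avg(sal) > 2000; would compute: the where condition discards rows before they are aggregated, and the having condition discards whole groups afterwards. The rows and thresholds are hypothetical, and awk is only a stand-in for Hive.

```shell
#!/bin/sh
work_dir=$(mktemp -d)

# Hypothetical rows: deptno <TAB> sal
printf '10\t800\n10\t3000\n20\t1200\n20\t1600\n30\t5000\n30\t900\n' > "$work_dir/emp.txt"

result=$(awk -F'\t' '
    # where sal > 1000: row-level filter, applied before aggregation
    $2 > 1000 { sum[$1] += $2; cnt[$1]++ }
    END {
        for (d in sum) {
            avg = sum[d] / cnt[d]
            # having avg(sal) > 2000: group-level filter, applied after aggregation
            if (avg > 2000) print d "\t" avg
        }
    }' "$work_dir/emp.txt" | sort)

echo "$result"
```

Department 20 passes the row filter with both of its rows but is dropped by the group filter, because its average of 1400 is below the threshold.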

X. Export/Import

    1) Export    --> exports the data of a Hive table

        EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])]
            TO 'export_target_path' [ FOR replication('eventid') ]    --> the path is a path on HDFS

    2) Import

        IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]]
            FROM 'source_path'
            [LOCATION 'import_target_path']
  
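XI. Sorting

    1) order by    --> global sort, performed by a single reducer

    2) sort by    --> sorts within each reducer; the overall output is not globally sorted

    3) distribute by    --> similar to the partitioner in MapReduce: it decides which reducer each row goes to; used together with sort by

        insert overwrite local directory '/opt/datas/dist_emp'
        select * from emp
        distribute by deptno
        sort by empno asc;

    4) cluster by    --> use this when the distribute by and sort by columns are the same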
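The effect of distribute by deptno sort by empno can be imitated with plain shell tools, which makes the "sorted per reducer, not globally" behaviour visible without a cluster. In this sketch the rows (empno, deptno), the reducer count of 3 and the modulo routing are all assumptions for illustration; the modulo is a simplified stand-in for Hive's hash partitioning, and awk and sort stand in for the map and reduce phases.

```shell
#!/bin/sh
out_dir=$(mktemp -d)
reducers=3   # hypothetical number of reducers

# Hypothetical rows: empno <TAB> deptno
printf '7369\t20\n7499\t30\n7782\t10\n7566\t20\n7934\t40\n7839\t10\n7521\t30\n' \
    > "$out_dir/emp.txt"

# distribute by deptno: every row with the same deptno lands in the same bucket.
awk -F'\t' -v n="$reducers" -v dir="$out_dir" '
    { f = dir "/part-" ($2 % n); print > f }' "$out_dir/emp.txt"

# sort by empno: each bucket is sorted on its own, independently of the others.
for f in "$out_dir"/part-*; do
    sort -n -o "$f" "$f"
    echo "== $f"
    cat "$f"
done
```

Each part file comes out ordered by empno, but concatenating the files does not give a globally ordered list; a global order would correspond to order by, that is, a single sort over emp.txt.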
  
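XII. Built-in Hive functions and UDF programming    --> UDF = user-defined function

    1) https://cwiki.apache.org/confluence/display/Hive/HivePlugins

    2) Programming steps:

        a. Extend org.apache.hadoop.hive.ql.exec.UDF

        b. Implement an evaluate method

        Note: evaluate must have a return type. Hadoop types such as Text/LongWritable are the usual choice; plain Java types are not recommended

    3) Add the hive-jdbc and hive-exec dependencies to pom.xml

    4) Usage:    add jar /opt/datas/udf-tolower.jar;

                create temporary function my_lower as "com.cnblog.hive.udf.LowerUDF";    --> the fully qualified class name

            or    create function myfunc as 'myclass' using jar 'hdfs://hostname/path/to/jar';    --> the jar must be on the HDFS file system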
