Hive pitfalls

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : Access denied for user 'root'@'centos35' (using password: YES)

The error above is the problem; the fix is below. Clearly MySQL access is failing:

grant all privileges on *.* to 'root'@'centos35' identified by '123456';

After Hive's distributed processes had been running for a long time, starting jps failed with:

Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread

The machine was out of memory.

echo 1 > /proc/sys/vm/drop_caches
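Before and after dropping caches it helps to see what the kernel is actually holding, and to rule out the per-user process limit, which is another common cause of "unable to create new native thread" (a quick check, assuming a Linux host):

```shell
# How much memory is free/available and how much the page cache holds
grep -E '^(MemFree|MemAvailable|Cached):' /proc/meminfo
# Max processes/threads this user may create; too low a limit also
# produces "unable to create new native thread"
ulimit -u
```

Note that `drop_caches` only releases the page cache; it does not reclaim memory held by running JVMs.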

Hive operations:

Mainly importing data from a CSV file into Hive; the code is as follows.

Create the table:

create external table if not exists batch_task(
    task_name STRING, inst_num INT, job_name STRING, task_type STRING,
    status STRING, start_time INT, end_time INT, plan_cpu INT, plan_mem INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Load the data (without the LOCAL keyword, the path refers to a file stored on HDFS):

load data inpath '/clusterdata/batch_task.csv' into table batch_task;
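Before running LOAD DATA it is worth sanity-checking that every CSV row has the 9 fields the table declares, since a mismatch silently loads NULLs. A sketch against a made-up sample file (`/tmp/batch_task_sample.csv` is hypothetical):

```shell
# Build a one-row sample with the 9 comma-separated fields batch_task expects
printf 'taskA,1,job1,typeA,Terminated,100,200,50,30\n' > /tmp/batch_task_sample.csv
# Print the field count of each line; every line should print 9
awk -F',' '{print NF}' /tmp/batch_task_sample.csv
# → 9
```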

Query the data:

select * from batch_task limit 10;

 

Some process or other kept hogging resources, to the point that even a normal reboot would not go through:

exec sudo reboot

File cleanup:

du -h --max-depth=1 /home/zc/
disk usage of each subdirectory under the given path
df -hl
disk usage of the whole filesystem
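To find the biggest offenders quickly, the per-directory usage can be sorted (a sketch; `/tmp` stands in for whatever path is being cleaned):

```shell
# Subdirectories sorted by size, largest first, top 5
du -h --max-depth=1 /tmp 2>/dev/null | sort -hr | head -n 5
```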

When a Hive query gets large, it is submitted as a MapReduce job, but those jobs kept failing.

None of the fixes found online solved it, until I noticed the Hadoop cluster contained a large number of unhealthy nodes....

Reference for the solution:
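For the record, one common cause of unhealthy nodes is local disk usage crossing YARN's disk-health-checker threshold (90% by default). Freeing disk space, or raising the threshold in yarn-site.xml, is a possible mitigation; this is a general suggestion, not necessarily the exact fix referenced above:

```xml
<!-- yarn-site.xml: raise the per-disk utilization cutoff (default 90.0) -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
```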

Upload the data to HDFS:

hdfs dfs -put ak.csv /data

Starting Hive in distributed mode:

On the master node:

hive --service metastore &

On the worker nodes:

hive

Hive GROUP BY

For an explanation, refer to these two articles; they are vivid and detailed.

A problem that came up:

"Expression Not In Group By Key" — this article explains it clearly.

Since the non-grouped columns of each GROUP BY group form a collection of values, use collect_set() in Hive:

select task_name, job_name, collect_set(machine_id)[0], collect_set(cpu_avg)[0], collect_set(cpu_max)[0], collect_set(mem_avg)[0], collect_set(mem_max)[0] from batch_instance group by task_name, job_name;

Ways to save Hive query results — this article explains it well.

Note: collect_set() returns the deduplicated set of collected values, while collect_list() keeps every matching value (a plain column-to-list aggregation)!
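Outside Hive, the difference is easy to see with coreutils (an analogy only, not Hive itself):

```shell
# Duplicate values, one per line — like the rows feeding collect_list()
printf 'a\nb\na\nc\n' > /tmp/vals.txt
cat /tmp/vals.txt        # collect_list-like: keeps duplicates (a b a c)
sort -u /tmp/vals.txt    # collect_set-like: deduplicated (a b c)
```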

Save to a file:

hive> insert overwrite directory "/home/zc/dzx"  
      > row format delimited fields terminated by ","   
      > select user, login_time from user_login;  

Save into a table:

hive> create table query_result   
      > as  
      > select user, login_time from user_login;   

Query rows in a Hive table where a column does not contain a given string (note the `%` wildcards; without them, LIKE matches only the exact string):

select * from task where task_name not like '%task%';

Note: after a GROUP BY, aggregated columns can end up with auto-generated names such as `_c2`.

Rename these columns afterwards (or avoid the issue by aliasing with AS in the SELECT):

alter table batch_ins change `_c2` machine_id String;

Export Hive data to CSV:

hive -e "set hive.cli.print.header=true; select * from data_table" | sed 's/[\t]/,/g'  > a.csv  
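The `sed` part of that pipeline just rewrites Hive's tab-separated output as commas; in isolation (GNU sed interprets `\t` inside the bracket expression):

```shell
# Replace every tab with a comma, as the export one-liner does
printf '1\talice\n' | sed 's/[\t]/,/g'
# → 1,alice
```

Note this naive substitution breaks if field values themselves contain tabs or commas.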

Handling NULL values in Hive:

select * from dag_small where cpu_avg is not null and cpu_max is not null and mem_avg is not null and mem_max is not null;

Joining two tables in Hive:

create table user
as 
select pv.pageid, u.age 
FROM page_view pv 
JOIN user u 
ON (pv.userid = u.userid);

Left outer join:

select * from A left outer join B on xxx;