org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version. Underlying cause: java.sql.SQLException : Access denied for user 'root'@'centos35' (using password: YES)
The above is the error; the fix is below. It is clearly a MySQL access problem:
grant all privileges on *.* to 'root'@'centos35' identified by '123456';
After the distributed Hive processes had been running for a long time, running jps produced the following error:
Error occurred during initialization of VM java.lang.OutOfMemoryError: unable to create new native thread
Not enough memory. Dropping the OS page cache can free some:
echo 1 > /proc/sys/vm/drop_caches
Hive operations:
Mainly importing data from a CSV file into the database; the code is as follows.
Create the table:
create external table if not exists batch_task (
  task_name STRING,
  inst_num INT,
  job_name STRING,
  task_type STRING,
  status STRING,
  start_time INT,
  end_time INT,
  plan_cpu INT,
  plan_mem INT
)
row format delimited fields terminated by ','
stored as textfile;
Load the data (without LOCAL, the path refers to a file already stored on HDFS):
load data inpath '/clusterdata/batch_task.csv' into table batch_task;
Query the data:
select * from batch_task limit 10;
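The three steps above assume batch_task.csv holds nine comma-separated fields in the table's column order, with no header row. A minimal Python sketch that writes a file in that shape (the sample values are invented for illustration, not taken from the real dataset):

```python
import csv

# Columns in the same order as the CREATE TABLE statement above.
columns = ["task_name", "inst_num", "job_name", "task_type", "status",
           "start_time", "end_time", "plan_cpu", "plan_mem"]

# Hypothetical sample rows matching the schema.
rows = [
    ["task_1", 1, "job_1", "batch", "Terminated", 100, 200, 50, 30],
    ["task_2", 2, "job_1", "batch", "Failed", 150, 180, 100, 60],
]

# Hive's TEXTFILE + ',' delimiter format is plain CSV with no header line.
with open("batch_task.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

A file written this way can be pushed to HDFS and loaded with the LOAD DATA statement above.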
Some process kept holding on to resources, to the point that even a reboot would not go through:
exec sudo reboot
Disk-usage checks:
du -h --max-depth=1 /home/zc/    # disk usage of each entry under the path
df -hl    # disk usage of the whole filesystem
When a Hive query gets large, it submits a MapReduce job, but the job kept failing. None of the fixes found online solved it either, though I did notice the Hadoop cluster contained a large number of unhealthy nodes...
Upload data to HDFS:
hdfs dfs -put ak.csv /data
Starting Hive in distributed mode:
On the master node:
hive --service metastore &
On the worker nodes:
hive
Hive GROUP BY
A problem came up: for each group, every non-grouped column yields a collection of values, so in Hive collect_set() can be used to pick one element:
select task_name,job_name,collect_set(machine_id)[0],collect_set(cpu_avg)[0],collect_set(cpu_max)[0],collect_set(mem_avg)[0],collect_set(mem_max)[0] from batch_instance group by task_name,job_name;
Ways to save Hive query results (the write-up I followed explains this well):
Note: collect_set() returns the deduplicated set of values, while collect_list() simply pivots the column into a collection, keeping duplicates!
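The collect_set()/collect_list() distinction can be emulated in plain Python (a sketch, not Hive itself; note that in real Hive the element order inside collect_set() is not guaranteed, so indexing [0] picks an arbitrary element):

```python
from collections import defaultdict

# Rows of (task_name, job_name, machine_id), mimicking batch_instance.
rows = [
    ("t1", "j1", "m1"),
    ("t1", "j1", "m1"),   # duplicate machine_id within the group
    ("t1", "j1", "m2"),
]

collect_list = defaultdict(list)
collect_set = defaultdict(list)
for task, job, machine in rows:
    key = (task, job)
    collect_list[key].append(machine)        # keeps duplicates
    if machine not in collect_set[key]:      # dedupes, first-seen order
        collect_set[key].append(machine)

print(collect_list[("t1", "j1")])  # ['m1', 'm1', 'm2']
print(collect_set[("t1", "j1")])   # ['m1', 'm2']
```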
Save to a file:
hive> insert overwrite directory "/home/zc/dzx"
    > row format delimited fields terminated by ","
    > select user, login_time from user_login;
Save to a table:
hive> create table query_result
    > as
    > select user, login_time from user_login;
Query rows of a Hive table where a column does not contain a given string:
select * from task where task_name not like '%task%';
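Note the % wildcards: without them, NOT LIKE 'task' only excludes exact matches. With them, the filter is a plain "substring not present" test, as this small Python sketch (invented sample names) shows:

```python
# Hypothetical task names; NOT LIKE '%task%' keeps only those
# where the substring "task" does not appear anywhere.
names = ["task_1", "merge_step", "cleanup", "subtask_9"]
kept = [n for n in names if "task" not in n]
print(kept)  # ['merge_step', 'cleanup']
```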
Note: after a GROUP BY, Hive auto-names computed result columns (e.g. _c2). These columns then need to be renamed:
alter table batch_ins change `_c2` machine_id String;
Export Hive data to CSV:
hive -e "set hive.cli.print.header=true; select * from data_table" | sed 's/[\t]/,/g' > a.csv
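One caveat: the sed one-liner replaces every tab with a comma, but it cannot quote fields that themselves contain commas, which corrupts the CSV. A safer post-processing sketch using Python's csv module (the sample TSV string below is invented):

```python
import csv
import io

# Simulated tab-separated `hive -e` output: a header and two rows,
# one of which contains a comma inside a field.
tsv_output = "task_name\tplan_cpu\nt1,extra\t50\nt2\t100\n"

# csv.writer quotes fields containing commas, which sed cannot do.
out = io.StringIO()
writer = csv.writer(out)
for line in tsv_output.splitlines():
    writer.writerow(line.split("\t"))

print(out.getvalue())
```

The field "t1,extra" comes out quoted, so downstream CSV readers parse it as one column instead of two.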
Handling NULL values in Hive:
select * from dag_small where cpu_avg is not null and cpu_max is not null and mem_avg is not null and mem_max is not null;
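The WHERE clause above keeps only rows in which all four metrics are non-NULL. The same filter in Python, with None standing in for Hive's NULL (sample rows invented):

```python
# Rows mimicking dag_small; None plays the role of Hive NULL.
rows = [
    {"cpu_avg": 0.5, "cpu_max": 0.9, "mem_avg": 0.3, "mem_max": 0.6},
    {"cpu_avg": None, "cpu_max": 0.8, "mem_avg": 0.4, "mem_max": 0.7},
]

metrics = ("cpu_avg", "cpu_max", "mem_avg", "mem_max")
# Keep a row only if every metric is non-NULL, like the WHERE clause above.
complete = [r for r in rows if all(r[m] is not None for m in metrics)]
print(len(complete))  # 1
```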
Joining two Hive tables:
create table user as select pv.pageid, u.age FROM page_view pv JOIN user u ON (pv.userid = u.userid);
Left join:
select * from A left outer join B on xxx;
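The difference between the two joins above: an inner JOIN drops page_view rows whose userid has no match in user, while a LEFT OUTER JOIN keeps every left-side row and fills missing right-side columns with NULL. A small Python sketch of both (tables and values invented):

```python
# Two toy tables mimicking the page_view / user join above.
page_view = [("p1", "u1"), ("p2", "u2"), ("p3", "u9")]   # (pageid, userid)
user = {"u1": 25, "u2": 31}                              # userid -> age

# INNER JOIN: only userids present on both sides survive.
inner = [(p, user[u]) for p, u in page_view if u in user]

# LEFT OUTER JOIN: every page_view row survives; unmatched ages become None (NULL).
left = [(p, user.get(u)) for p, u in page_view]

print(inner)  # [('p1', 25), ('p2', 31)]
print(left)   # [('p1', 25), ('p2', 31), ('p3', None)]
```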