Section 1 HUE: 14, 15, 16. Integrating Hue with HDFS, the YARN cluster, Hive, Impala and MySQL

Date: 2020-06-04

Tags: hue hdfs yarn cluster hive impala mysql integration    Category: Hadoop


3. Integrating Hue with other frameworks

3.1 Integrating Hue with Hadoop HDFS and YARN

Step 1: modify core-site.xml on all Hadoop nodes

Remember: after changing core-site.xml you must restart the HDFS and YARN clusters.

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
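The same two proxyuser entries have to be copied to every node, so generating them instead of hand-typing them is one way to avoid typos. Below is a minimal Python sketch of that. It assumes, as in this setup, that Hue talks to Hadoop as the user root. `proxyuser_properties` is a hypothetical helper. The `*` values (any host, any group) mirror the config above but are very permissive, and a production cluster would normally list specific hosts and groups.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user: str, hosts: str = "*", groups: str = "*") -> str:
    """Return core-site.xml <property> entries that let `user` impersonate other users."""
    config = ET.Element("configuration")
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{user}.{suffix}"
        ET.SubElement(prop, "value").text = value
    ET.indent(config)  # pretty-print the XML; requires Python 3.9+
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    print(proxyuser_properties("root"))
```

The `<property>` elements in the printed output can be pasted inside the existing `<configuration>` element of each node's core-site.xml.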

Step 2: modify hdfs-site.xml on all Hadoop nodes

<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
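With `dfs.webhdfs.enabled` set, the NameNode serves the WebHDFS REST API under `/webhdfs/v1` on its HTTP port. Hue's `webhdfs_url` setting in hue.ini points at this endpoint. The hypothetical helper below builds such a URL so the endpoint can be checked by hand. Host `node01.hadoop.com` and port 50070 are taken from this article's hue.ini, and passing the caller as `user.name` assumes simple (non-Kerberos) authentication.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, user: Optional[str] = None) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>."""
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute HDFS paths")
    params = {"op": op}
    if user is not None:
        # Simple authentication identifies the caller via the user.name query parameter
        params["user.name"] = user
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"
```

For example, `webhdfs_url("node01.hadoop.com", 50070, "/tmp", "LISTSTATUS", "root")` gives a URL that can be opened with curl or `urllib.request.urlopen`. Once WebHDFS is enabled, it should answer with a JSON `FileStatuses` object.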

Step 3: restart the Hadoop cluster

Run the following commands on node01:

cd /export/servers/hadoop-2.6.0-cdh5.14.0
sbin/stop-dfs.sh
sbin/start-dfs.sh
sbin/stop-yarn.sh
sbin/start-yarn.sh

Step 4: stop the Hue service and continue configuring hue.ini

cd /export/servers/hue-3.9.0-cdh5.14.0/desktop/conf
vim hue.ini

Configure Hue's integration with HDFS (the [[hdfs_clusters]] section sits under [hadoop] in hue.ini):

[[hdfs_clusters]]
[[[default]]]
fs_defaultfs=hdfs://node01.hadoop.com:8020
webhdfs_url=http://node01.hadoop.com:50070/webhdfs/v1
hadoop_hdfs_home=/export/servers/hadoop-2.6.0-cdh5.14.0
hadoop_bin=/export/servers/hadoop-2.6.0-cdh5.14.0/bin
hadoop_conf_dir=/export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
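hue.ini nests sections with one, two or three brackets, which Python's standard `configparser` cannot read. Below is a minimal hypothetical parser for double-checking the values above before restarting Hue. It only handles plain `key=value` lines and `#` comments, not the full syntax that Hue itself accepts. `namenode_hosts_match` is an illustrative consistency check: `fs_defaultfs` and `webhdfs_url` should name the same NameNode host.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse

Section = Dict[str, str]


def parse_hue_ini(text: str) -> Dict[Tuple[str, ...], Section]:
    """Map each nested section path, e.g. ("hdfs_clusters", "default"), to its key=value pairs."""
    sections: Dict[Tuple[str, ...], Section] = {}
    stack: List[Tuple[int, str]] = []  # (bracket depth, section name) of the open sections
    current: Section = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (## lines included)
        if line.startswith("["):
            depth = len(line) - len(line.lstrip("["))  # [a] -> 1, [[a]] -> 2, [[[a]]] -> 3
            while stack and stack[-1][0] >= depth:
                stack.pop()  # close sibling or deeper sections
            stack.append((depth, line.strip("[]").strip()))
            current = sections.setdefault(tuple(name for _, name in stack), {})
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def namenode_hosts_match(cluster: Section) -> bool:
    """True if fs_defaultfs and webhdfs_url point at the same NameNode host."""
    return urlparse(cluster["fs_defaultfs"]).hostname == urlparse(cluster["webhdfs_url"]).hostname
```

In a full hue.ini, where these sections sit under `[hadoop]`, the key becomes `("hadoop", "hdfs_clusters", "default")`.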

Configure Hue's integration with YARN (the [[yarn_clusters]] section also sits under [hadoop]):

[[yarn_clusters]]
[[[default]]]
resourcemanager_host=node01
resourcemanager_port=8032
submit_to=True
resourcemanager_api_url=http://node01:8088
history_server_api_url=http://node01:19888
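`resourcemanager_api_url` is the base address of the ResourceManager REST API that Hue queries. The sketch below checks that it answers before Hue is restarted. It assumes the standard `/ws/v1/cluster/info` endpoint, whose JSON body wraps the fields in a `clusterInfo` object. The helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def cluster_info_url(resourcemanager_api_url: str) -> str:
    """Cluster-information endpoint of the ResourceManager REST API."""
    return resourcemanager_api_url.rstrip("/") + "/ws/v1/cluster/info"


def cluster_state(body: str) -> str:
    """Return clusterInfo.state (for example "STARTED") from the endpoint's JSON body."""
    return json.loads(body)["clusterInfo"]["state"]


def fetch_cluster_state(resourcemanager_api_url: str, timeout: float = 5.0) -> str:
    """Query a live ResourceManager; raises an OSError subclass (e.g. URLError) if unreachable."""
    request = Request(cluster_info_url(resourcemanager_api_url),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return cluster_state(response.read().decode("utf-8"))
```

On this cluster the call would be `fetch_cluster_state("http://node01:8088")`. The JobHistory server behind `history_server_api_url` has a similar REST API under `/ws/v1/history/info`.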

3.2 Configuring Hue with Hive

To integrate Hue with Hive, both the Hive metastore service and the HiveServer2 service must be running (Impala needs the Hive metastore; Hue needs HiveServer2).

Edit Hue's configuration file hue.ini:

[beeswax]
hive_server_host=node03.hadoop.com
hive_server_port=10000
hive_conf_dir=/export/servers/hive-1.1.0-cdh5.14.0/conf
server_conn_timeout=120
auth_username=root
auth_password=123456

[metastore]
# Allow creating databases, tables, etc. through Hive
enable_new_create_table=true

Start the Hive services

On node03, start the Hive metastore and HiveServer2 services:

cd /export/servers/hive-1.1.0-cdh5.14.0
nohup bin/hive --service metastore &
nohup bin/hive --service hiveserver2 &

(If Hive's environment variables are configured, the bin/ prefix can be omitted.)

Restart Hue. Hive can then be used from the browser.
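Both Hive services are started in the background with nohup, so a failed start is easy to miss. The sketch below only checks whether a TCP port accepts connections. Port 10000 matches `hive_server_port` in hue.ini above. Using 9083 for the metastore is an assumption: it is the usual default, but hive.metastore.uris in hive-site.xml is authoritative. `port_open` is a hypothetical helper.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or the hostname could not be resolved
        return False
```

For example, `port_open("node03.hadoop.com", 10000)` should turn True once HiveServer2 has finished starting, and `port_open("node03.hadoop.com", 9083)` once the metastore is up. An open port only proves that something is listening, not that the service is healthy.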

3.3 Configuring Hue with Impala

Stop the Hue service process, then edit hue.ini:

[impala]
server_host=node03
server_port=21050
impala_conf_dir=/etc/impala/conf

3.4 Configuring Hue with MySQL

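Find the databases section ([[databases]] under [librdbms]), uncomment the mysql entry below it, and fill in the MySQL settings. It is at roughly line 1547 of hue.ini.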

[[[mysql]]]
nice_name="My SQL DB"
engine=mysql
host=node03.hadoop.com
port=3306
user=root
password=123456

3.5 Restarting the Hue service

cd /export/servers/hue-3.9.0-cdh5.14.0/
build/env/bin/supervisor

3.6 Fixing insufficient permissions when Hive and Impala run queries

Any Hive query that needs to run a MapReduce job fails with a permission exception. The details are as follows:

INFO  : Compiling command(queryId=root_20180625191616_d02efd23-2322-4f3d-9cb3-fc3a06ff4ce0): select count(1) from mystu
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=root_20180625191616_d02efd23-2322-4f3d-9cb3-fc3a06ff4ce0); Time taken: 0.065 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20180625191616_d02efd23-2322-4f3d-9cb3-fc3a06ff4ce0): select count(1) from mystu
INFO  : Query ID = root_20180625191616_d02efd23-2322-4f3d-9cb3-fc3a06ff4ce0
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
ERROR : Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=admin, access=EXECUTE, inode="/tmp":root:supergroup:drwxrwx---

The fix is to grant "other" users execute permission on a few HDFS directories:

hdfs dfs -chmod o+x /tmp
hdfs dfs -chmod o+x /tmp/hadoop-yarn
hdfs dfs -chmod o+x /tmp/hadoop-yarn/staging

Alternatively, run:

hdfs dfs -chmod -R o+x /tmp

This grants the permission to every file and directory under /tmp.

After that, Hive jobs run without the error.
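The error message above can be read as follows. The submitting user, admin, is not the owner of /tmp (root) and is evidently not in its group (supergroup), so HDFS applies the "other" permission bits, which are `---`. Traversing a directory requires EXECUTE, so the job submission fails. The sketch below parses such a message and simulates `chmod o+x` on an ls-style mode string. The helper names are hypothetical, and the sketch handles only the 10-character mode format shown in the log.

```python
import re
from typing import Dict, Optional

_DENIED = re.compile(
    r"Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), "
    r'inode="(?P<inode>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>[-dl][-rwxstST]{9})'
)


def parse_access_denied(log_line: str) -> Optional[Dict[str, str]]:
    """Extract user, access type, inode, owner, group and mode from an AccessControlException."""
    match = _DENIED.search(log_line)
    return match.groupdict() if match else None


def other_can_execute(mode: str) -> bool:
    """Index 9 of an ls-style mode such as 'drwxrwx---' is the 'other' execute slot."""
    return mode[9] in "xt"  # 't' means sticky bit plus execute


def add_other_execute(mode: str) -> str:
    """Simulate `chmod o+x`: keep a sticky bit if present ('T' -> 't'), otherwise set 'x'."""
    return mode[:9] + ("t" if mode[9] in "tT" else "x")
```

Applied to the log line above, `parse_access_denied` reports mode `drwxrwx---`, for which `other_can_execute` is False. After `o+x` the mode becomes `drwxrwx--x`, which allows traversal. The same change is made by each `hdfs dfs -chmod o+x` command above.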

=========================================================

Course summary:

impala: a tool for querying data with SQL statements
Strength: fast
Weakness: uses a lot of memory

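Impala architecture:
impala-server: runs on the worker nodes and executes SQL queries
impala-catalog: runs on the master node; manages metadata and pushes metadata changes to the impala-server daemons
impala-state-store: runs on the master node; tracks the state of the impala-server daemons and relays updates between them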

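Installing impala: no tar.gz package is provided, so it is installed from rpm packages.
Download an rpm repository (about 5 GB); all of the big-data software can be installed from it as rpm packages.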

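Build a local yum repository, which needs three things:
First: the repository configuration file
Second: the httpd service
Third: the rpm repository itself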

Then run the installation.

Configuring impala:
Impala needs three core configuration files: hdfs-site.xml, core-site.xml and hive-site.xml
Impala's own configuration file must also be edited

Using impala:

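Options used when launching impala-shell:
impala-shell -q: like hive -e, runs a SQL statement directly without entering the interactive shell
impala-shell -f: like hive -f, runs a SQL script file directly
impala-shell -r: refreshes all metadata before entering impala-shell; this is a full refresh, so it is expensive when the amount of data is large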
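For scripted use, the options above can be assembled into an argument list and passed to `subprocess`. The sketch below does this. It assumes `impala-shell` is on the PATH. `-i host:port`, which selects the impalad to connect to, is an extra option not covered in this article. The helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def impala_shell_cmd(query: Optional[str] = None, script: Optional[str] = None,
                     refresh: bool = False, impalad: Optional[str] = None) -> List[str]:
    """Assemble an impala-shell argument list from the launch options described above."""
    if query is not None and script is not None:
        raise ValueError("pass either query (-q) or script (-f), not both")
    cmd = ["impala-shell"]
    if impalad is not None:
        cmd += ["-i", impalad]  # host:port of the impalad to connect to
    if refresh:
        cmd.append("-r")        # full metadata refresh after connecting (expensive)
    if query is not None:
        cmd += ["-q", query]    # run one statement and exit, like hive -e
    elif script is not None:
        cmd += ["-f", script]   # run a SQL file and exit, like hive -f
    return cmd


def run_impala_shell(**kwargs) -> subprocess.CompletedProcess:
    """Run impala-shell and capture its output; raises CalledProcessError on failure."""
    return subprocess.run(impala_shell_cmd(**kwargs), capture_output=True, text=True, check=True)
```

Passing a list instead of one shell string avoids quoting problems when the SQL contains spaces or quotes.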

Commands used inside impala-shell:
refresh dbName.tabName: a partial refresh that reloads the metadata of one existing table only
invalidate metadata: a full refresh, suited to databases or tables newly created in Hive
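The rule of thumb above can be written as a small decision function. This is a sketch only: `created_in_hive` is a hypothetical flag the caller supplies, and the identifier check is a simple guard. Impala also accepts `INVALIDATE METADATA db.table` to load a single new table, which is cheaper than the full form used here.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def metadata_statement(db: str, table: str, created_in_hive: bool) -> str:
    """Pick the statement that makes Impala see changes made outside Impala."""
    for name in (db, table):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if created_in_hive:
        # Database/table unknown to Impala: full metadata reload, as described above
        return "INVALIDATE METADATA"
    # Table Impala already knows; only its data or partitions changed
    return f"REFRESH {db}.{table}"
```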

Creating databases and tables in impala: the syntax is the same as in Hive; see Hive table creation

impala SQL syntax: similar to Hive SQL; see the Hive SQL syntax

Ways to load data into impala:
load data: can only load data from HDFS
insert into table ... select ...
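The sketch below builds both kinds of load statement as strings. Impala's `LOAD DATA INPATH` has no LOCAL variant, which is why the path must already be in HDFS. The statement moves the files out of their source directory into the table's directory. The path `/user/root/stu` and the table `mystu_copy` are hypothetical examples, and the name checks are a simple guard, not full SQL escaping.

```python
import re

_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def _check_table(name: str) -> str:
    """Accept `table` or `db.table` made of simple identifiers."""
    if not _NAME.match(name):
        raise ValueError(f"unexpected table name: {name!r}")
    return name


def load_data_sql(hdfs_path: str, table: str, overwrite: bool = False) -> str:
    """Build LOAD DATA INPATH ...; the source must be an HDFS path."""
    if not (hdfs_path.startswith("/") or hdfs_path.startswith("hdfs://")):
        raise ValueError("expected an HDFS path")
    if "'" in hdfs_path:
        raise ValueError("quote characters are not supported in this sketch")
    into = "OVERWRITE INTO" if overwrite else "INTO"
    return f"LOAD DATA INPATH '{hdfs_path}' {into} TABLE {_check_table(table)}"


def insert_select_sql(target: str, source: str, columns: str = "*") -> str:
    """Build INSERT INTO TABLE target SELECT columns FROM source."""
    return f"INSERT INTO TABLE {_check_table(target)} SELECT {columns} FROM {_check_table(source)}"
```

The resulting strings can be run with `impala-shell -q`.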

Impala development in Java: overview