Chapter 2: Basic Impala Usage

Date: 2019-11-11


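1. Using Impala

1.1 impala-shell syntax

1.1.1 External command-line options of impala-shell

These options can be used directly on the command line, without entering the interactive impala-shell prompt.

impala-shell accepts many options when it is invoked:

-h  show the help text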

impala-shell -h

[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -h
Usage: impala_shell.py [options]

Options:
  -h, --help            show this help message and exit
  -i IMPALAD, --impalad=IMPALAD
                        <host:port> of impalad to connect to
                        [default: node03.hadoop.com:21000]
  -q QUERY, --query=QUERY
                        Execute a query without the shell [default: none]
  -f QUERY_FILE, --query_file=QUERY_FILE
                        Execute the queries in the query file, delimited by ;.
                        If the argument to -f is "-", then queries are read
                        from stdin and terminated with ctrl-d. [default: none]
  -k, --kerberos        Connect to a kerberized impalad [default: False]
  -o OUTPUT_FILE, --output_file=OUTPUT_FILE
                        If set, query results are written to the g

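-r  refresh all metadata after connecting (runs invalidate metadata); with a large amount of metadata this puts a noticeable load on the server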

impala-shell -r

# Result
[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -r
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Invalidating Metadata
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

The HISTORY command lists all shell commands in chronological order.
***********************************************************************************
+==========================================================================+
| DEPRECATION WARNING:                                                     |
| -r/--refresh_after_connect is deprecated and will be removed in a future |
| version of Impala shell.                                                 |
+==========================================================================+
Query: invalidate metadata
Query submitted at: 2019-08-22 14:45:28 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=ce4db858e1dfd774:814fabac00000000
Fetched 0 row(s) in 5.04s

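-B  delimited (unformatted) output; improves performance when a query returns a lot of data
--print_header  print the column names in delimited mode
--output_delimiter  set the field delimiter for delimited mode
-v  show the impala-shell version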

impala-shell -v -V

# Result
[root@node03 hive-1.1.0-cdh5.14.0]# impala-shell -v -V
Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018
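
The -B, --print_header and --output_delimiter options listed above are usually combined to export a result set as delimited text. The following is a minimal sketch, assuming the hivesql.ods_click_pageviews table used elsewhere in this chapter; the output file name pageviews.txt and the "|" delimiter are illustrative choices, and the script only prints the command when impala-shell is not installed.

```shell
#!/bin/sh
# Sketch: export a query result as "|"-delimited text.
# QUERY and OUT_FILE are illustrative values, not part of the original setup.
QUERY="select * from hivesql.ods_click_pageviews limit 10"
OUT_FILE="pageviews.txt"

# Collect the options as positional parameters so they stay correctly quoted:
#   -B                   delimited (unformatted) output
#   --print_header       include the column names as the first line
#   --output_delimiter   field separator used in delimited mode
set -- -B --print_header "--output_delimiter=|" -q "$QUERY" -o "$OUT_FILE"

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell "$@"
else
    echo "impala-shell not found; would run: impala-shell $*"
fi
```

Without --output_delimiter, delimited mode separates fields with a tab character.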

-f  execute the queries in a file
--query_file  long form of -f; names the query file

cd /export/servers
vim impala-shell.sql

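# Put the following two statements in the file
use hivesql;
select * from ods_click_pageviews limit 10;

# Make the file executable (optional: -f only needs to read the file)
chmod 755 impala-shell.sql

# Run the query file with the -f option
impala-shell -f impala-shell.sql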

# Result
[root@node03 hivedatas]# impala-shell -f imapala-shell.sql 
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:29:54 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=6a4d51930cf99b9d:21f02c4e00000000
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| session                              | remote_addr     | remote_user | time_local          | request                    | visit_step | page_staylong | http_referer                                                                                                                                                                                                                                                                                                                    | http_user_agent                                                                                                                                                                                   | body_bytes_sent | status | datestr  |
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| d1328698-d475-4973-86ee-15ad9da8c860 | 1.80.249.223    | -           | 2013-09-18 07:57:33 | /hadoop-hive-intro/        | 1          | 60            | "http://www.google.com.hk/url?sa=t&rct=j&q=hive%E7%9A%84%E5%AE%89%E8%A3%85&source=web&cd=2&ved=0CC4QFjAB&url=%68%74%74%70%3a%2f%2f%62%6c%6f%67%2e%66%65%6e%73%2e%6d%65%2f%68%61%64%6f%6f%70%2d%68%69%76%65%2d%69%6e%74%72%6f%2f&ei=5lw5Uo-2NpGZiQfCwoG4BA&usg=AFQjCNF8EFxPuCMrm7CvqVgzcBUzrJZStQ&bvm=bv.52164340,d.aGc&cad=rjt" | "Mozilla/5.0(WindowsNT5.2;rv:23.0)Gecko/20100101Firefox/23.0"                                                                                                                                     | 14764           | 200    | 20130918 |
| 0370aa09-ebd6-4d31-b6a5-469050a7fe61 | 101.226.167.201 | -           | 2013-09-18 09:30:36 | /hadoop-mahout-roadmap/    | 1          | 60            | "http://blog.fens.me/hadoop-mahout-roadmap/"
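
The query file does not have to be typed in an interactive editor. As a minimal sketch, assuming the same file name and statements as in the -f example above, a here-document can write it in one step; the script runs the file only when impala-shell is on the PATH.

```shell
#!/bin/sh
# Sketch: create the query file with a here-document instead of vim.
cat > impala-shell.sql <<'EOF'
use hivesql;
select * from ods_click_pageviews limit 10;
EOF

# Run it only when impala-shell is available on this machine.
if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -f impala-shell.sql
else
    echo "impala-shell not found; query file written to impala-shell.sql"
fi
```

According to the help text shown earlier, passing "-" as the argument to -f makes impala-shell read the queries from stdin instead of a file.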

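-i  connect to a specific impalad

--impalad  long form of -i; names the impalad (host:port) that runs the statements

-o  save the query results to a file

--output_file  long form of -o; names the output file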

impala-shell -f impala-shell.sql -o fizz.txt

# Result
[root@node03 hivedatas]# impala-shell -f imapala-shell.sql -o fizz.txt
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:31:45 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=7c421ab5d208f3b1:dec5a09300000000
Fetched 10 row(s) in 0.13s

# The current directory now contains a new file, fizz.txt
[root@node03 hivedatas]# ll
total 2592
-rw-r--r-- 1 root root     511 Aug 21  2017 dim_time_dat.txt
-rw-r--r-- 1 root root    9926 Aug 22 15:31 fizz.txt
-rwxr-xr-x 1 root root      57 Aug 22 15:29 imapala-shell.sql
-rwxrwxrwx 1 root root     133 Aug 20 00:36 movie.txt
-rw-r--r-- 1 root root   18372 Jun 17 18:33 pageview2
-rwxr-xr-x 1 root root     154 Aug 20 00:32 test.txt
-rw-r--r-- 1 root root     327 Aug 20 02:37 user_table
-rw-r--r-- 1 root root   10361 Jun 18 09:00 visit2
-rw-r--r-- 1 root root 2587511 Jun 17 18:05 weblog2
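
The -i and -o options can be combined with -q to send a single statement to a chosen impalad and keep its result. This is a sketch under assumptions: node02:21000 is taken from the connect example in this chapter, and the output file name databases.txt is made up for illustration.

```shell
#!/bin/sh
# Sketch: run one statement on a chosen impalad and save the result.
# IMPALAD and OUT_FILE are assumptions for illustration.
IMPALAD="node02:21000"
OUT_FILE="databases.txt"

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -i "$IMPALAD" -q "show databases;" -o "$OUT_FILE"
else
    echo "impala-shell not found; would connect to $IMPALAD and write $OUT_FILE"
fi
```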

-p  show the query plan and profile after each query

impala-shell -f impala-shell.sql -p

-q  execute SQL statements passed directly on the command line

impala-shell -q "use hivesql;select * from ods_click_pageviews limit 10;"

[root@node03 hivedatas]# impala-shell -q "use hivesql;select * from ods_click_pageviews limit 10;"
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
Query: use hivesql
Query: select * from ods_click_pageviews limit 10
Query submitted at: 2019-08-22 15:36:58 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=b443d56565419f60:a149235700000000
+--------------------------------------+-----------------+-------------+---------------------+----------------------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+--------+----------+
| session                              | remote_addr     | remote_user | time_local          | request                    | visit_step | page_staylong | http_referer                                                                                                                                                                                                                                                                                                                    | http_user_agent                                                                                                                                                                                   | body_bytes_sent | status | datestr  |

1.1.2 Internal commands of impala-shell

These commands can be run after entering the interactive impala-shell prompt.

Enter impala-shell:

impala-shell  # from any directory

# Result
[root@node03 hivedatas]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to node03.hadoop.com:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)

To see more tips, run the TIP command.
***********************************************************************************
[node03.hadoop.com:21000] >

The help command

Shows the help text.

[node03.hadoop.com:21000] > help;

Documented commands (type help <topic>):
========================================
compute  describe  explain  profile  rerun   set    show  unset  values   with
connect  exit      history  quit     select  shell  tip   use    version

Undocumented commands:
======================
alter   delete  drop  insert  source  summary  upsert
create  desc    help  load    src     update

The connect command

connect hostname connects to the impalad on the given host; subsequent statements run there.

connect node02;

# Result
[node03.hadoop.com:21000] > connect node02;
Connected to node02:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
[node02:21000] >

The refresh command

refresh dbname.tablename performs an incremental refresh of the metadata of a single table. It is mainly used when the data inside an existing Hive table has changed.

refresh movie_info;

# Result
[node03:21000] > refresh movie_info;
Query: refresh movie_info
Query submitted at: 2019-08-22 15:49:24 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=f74330d533ff2402:27364f7600000000
Fetched 0 row(s) in 0.27s

The invalidate metadata command:

invalidate metadata performs a full metadata refresh and is expensive. It is mainly used after new databases or tables have been created in Hive.

invalidate metadata;

# Result
[node03:21000] > invalidate metadata;
Query: invalidate metadata
Query submitted at: 2019-08-22 15:48:04 (Coordinator: http://node03.hadoop.com:25000)
Query progress can be monitored at: http://node03.hadoop.com:25000/query_plan?query_id=6a431748d41bc369:7eeb053400000000
Fetched 0 row(s) in 2.87s

The explain command:

Shows the execution plan of a SQL statement.

explain select * from user_table;

# Result
[node03:21000] > explain select * from user_table;
Query: explain select * from user_table
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B                                       |
| Per-Host Resource Estimates: Memory=32.00MB                                        |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| hivesql.user_table                                                                 |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [hivesql.user_table]                                                  |
|    partitions=1/1 files=1 size=327B                                                |
+------------------------------------------------------------------------------------+
Fetched 11 row(s) in 3.99s

explain_level can be set to 0, 1, 2, or 3. Level 3 is the highest and prints the most detailed information.

set explain_level=3;

# Result
[node03:21000] > set explain_level=3;
EXPLAIN_LEVEL set to 3
[node03:21000] >

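The profile command:

Run profile right after a SQL statement to print the detailed execution steps of that statement.

It is mainly used to inspect how a query ran and to tune the cluster.

select * from user_table;
profile;

# Excerpt of the output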
[node03:21000] > profile;
Query Runtime Profile:
Query (id=ff4799938b710fbb:7997836800000000):
  Summary:
    Session ID: a14d3b3894050309:7f300ddf8dcd8584
    Session Type: BEESWAX
    Start Time: 2019-08-22 15:58:22.786612000
    End Time: 2019-08-22 15:58:24.558806000
    Query Type: QUERY
    Query State: FINISHED
    Query Status: OK
    Impala Version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
    User: root
    Connected User: root
    Delegated User: 
    Network Address: ::ffff:192.168.52.120:48318
    Default Db: hivesql
    Sql Statement: select * from user_table
    Coordinator: node03.hadoop.com:22000
    Query Options (set by configuration): EXPLAIN_LEVEL=3
    Query Options (set by configuration and planner): EXPLAIN_LEVEL=3,MT_DOP=0
    Plan:

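Note: data inserted from Hive, and databases or tables created in Hive, cannot be queried from Impala right away; the metadata has to be refreshed first. Data inserted through impala-shell can be queried from Impala immediately, without a refresh. This works through the catalog service, a module added in Impala 1.2 whose main job is to synchronize metadata between the Impala nodes.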
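
The refresh rules in the note above can be captured in a small query file. This is a sketch under assumptions: hivesql.new_table is a hypothetical table that was just created in Hive, and hivesql.movie_info stands for an existing table whose data changed on the Hive side. Naming the table in invalidate metadata limits the operation to that one table instead of the whole catalog.

```shell
#!/bin/sh
# Sketch: make Hive-side changes visible to Impala.
# Statement 1: a table created in Hive (hypothetical name) is unknown to
#              Impala until its metadata is loaded.
# Statement 2: an existing table received new data through Hive, so an
#              incremental refresh of that table is enough.
cat > sync-metadata.sql <<'EOF'
invalidate metadata hivesql.new_table;
refresh hivesql.movie_info;
EOF

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -f sync-metadata.sql
else
    echo "impala-shell not found; statements written to sync-metadata.sql"
fi
```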

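1.2 Creating a database

1.2.1 Enter the Impala interactive shell

impala-shell  # enter the Impala interactive shell

1.2.2 List all databases

show databases;

1.2.3 Create and drop a database

Create a database, and drop a database: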

CREATE DATABASE IF NOT EXISTS mydb1;
drop database  if exists  mydb;

1.3 Creating tables

Create the student table

CREATE TABLE IF NOT EXISTS mydb1.student (name STRING, age INT, contact INT );

Create the employee table

create table employee (Id INT, name STRING, age INT,address STRING, salary BIGINT);

1.3.1 Inserting data into a table

insert into employee (id, name, age, address, salary) values (1, 'Ramesh', 32, 'Ahmedabad', 20000);
insert into employee values (2, 'Khilan', 25, 'Delhi', 15000);
insert into employee values (3, 'kaushik', 23, 'Kota', 30000);
insert into employee values (4, 'Chaitali', 25, 'Mumbai', 35000);
insert into employee values (5, 'Hardik', 27, 'Bhopal', 40000);
insert into employee values (6, 'Komal', 22, 'MP', 32000);

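Overwriting data

insert overwrite employee values (1, 'Ram', 26, 'Vishakhapatnam', 37000);

After the overwrite, the table contains only this one row.

Another way to create a table (create table as select):

create table customer as select * from employee;

1.3.2 Querying data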

select * from employee;
select name,age from employee;

1.3.3 Dropping a table

drop table mydb1.employee;

1.3.4 Truncating a table

truncate employee;

1.3.5 Creating a view

CREATE VIEW IF NOT EXISTS employee_view AS select name, age from employee;

1.3.6 Querying a view

select * from employee_view;

1.4 The ORDER BY clause

Basic syntax

select * from table_name ORDER BY col_name [ASC|DESC] [NULLS FIRST|NULLS LAST]
Select * from employee ORDER BY id asc;

1.5 The GROUP BY clause

Select name, sum(salary) from employee Group BY name;

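1.6 The HAVING clause

Basic syntax

select col_name, aggregate_function(col_name2) from table_name GROUP BY col_name HAVING condition

Group the table by age, take the maximum salary in each group, and show only the maxima greater than 20000:

select max(salary) from employee group by age having max(salary) > 20000;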
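
To make the HAVING filter easier to follow, the sketch below stores a variant of the query that also returns the age column. It assumes the employee table still holds the six rows inserted in section 1.3.1 (the later overwrite, truncate and drop steps would change that); the file name having-example.sql is made up for illustration.

```shell
#!/bin/sh
# Sketch: HAVING filters groups after aggregation, whereas WHERE filters
# individual rows before grouping.
cat > having-example.sql <<'EOF'
select age, max(salary) as max_salary
from employee
group by age
having max(salary) > 20000;
EOF

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -f having-example.sql
else
    echo "impala-shell not found; query written to having-example.sql"
fi
```

With the six sample rows, the groups are age 32 (max 20000), 25 (35000), 23 (30000), 27 (40000) and 22 (32000). The age 32 group is dropped because 20000 is not greater than 20000, so four rows remain, in no guaranteed order.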

1.7 The LIMIT clause

select * from employee order by id limit 4;

2. Ways to load data into Impala tables

Method 1: load data from HDFS into Impala

create table user(id int ,name string,age int ) row format delimited fields terminated by "\t";

Prepare the data file user.txt and upload it to the /user/impala path on HDFS.

Upload user.txt to HDFS:

hdfs dfs -put user.txt /user/impala/

Check that the upload succeeded:

hdfs dfs -ls /user/impala

Contents of user.txt (the columns must be separated by tab characters):

1       kasha   15
2       fizz    20
3       pheonux 30
4       manzi   50
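
Because the table is declared with fields terminated by "\t", the columns in user.txt have to be separated by real tab characters, not by spaces. The sketch below writes the four sample rows with printf and counts malformed rows before uploading; it assumes the hdfs client is on the PATH and skips the upload otherwise.

```shell
#!/bin/sh
# Sketch: generate user.txt with tab-separated columns.
# printf reuses the format string for every group of three arguments.
printf '%s\t%s\t%s\n' \
    1 kasha 15 \
    2 fizz 20 \
    3 pheonux 30 \
    4 manzi 50 > user.txt

# Count the rows that do not have exactly 3 tab-separated fields.
BAD_ROWS=$(awk -F '\t' 'NF != 3' user.txt | wc -l | tr -d ' ')
echo "rows with a wrong column count: $BAD_ROWS"

if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -put user.txt /user/impala/
else
    echo "hdfs client not found; skipping upload"
fi
```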

Load the data

load data inpath '/user/impala/' into table user;

Query the loaded data

select * from user;

If the query returns no data, refresh the table:

refresh user;

Method 2: create table as select

create  table  user2   as   select * from  user;

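Method 3:

insert into ... values  # not recommended, because it produces a large number of small files

Never treat Impala as an ordinary database for row-by-row inserts.

Method 4:

insert into ... select ...  # the most commonly used approach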
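
A sketch of the fourth method, assuming the user and user2 tables created above: a single insert into ... select statement copies the rows in bulk, which typically writes far fewer files than inserting the same rows one statement at a time. The file name copy-users.sql is made up for illustration.

```shell
#!/bin/sh
# Sketch: bulk-copy rows from one table into another with INSERT ... SELECT.
# Listing the columns keeps the statement valid if the tables' column
# order ever differs.
cat > copy-users.sql <<'EOF'
insert into user2 select id, name, age from user;
EOF

if command -v impala-shell >/dev/null 2>&1; then
    impala-shell -f copy-users.sql
else
    echo "impala-shell not found; statement written to copy-users.sql"
fi
```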
