Introduction to Impala

http://impala.apache.org/

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.

Do BI-style Queries on Hadoop

Impala provides low latency and high concurrency for BI/analytic queries on Hadoop, which batch frameworks such as Apache Hive do not deliver. Impala also scales linearly, even in multitenant environments.


Unify Your Infrastructure

Use the same file and data formats, metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure and no data conversion or duplication.


Implement Quickly

For Apache Hive users, Impala uses the same metadata and ODBC driver. Like Hive, Impala supports SQL, so you don't have to reinvent the wheel when implementing it.


Count on Enterprise-class Security

Impala is integrated with native Hadoop security and with Kerberos for authentication. Through the Sentry module, you can ensure that the right users and applications are authorized for the right data.



Retain Freedom from Lock-in

Impala is open source (Apache License).

Expand the Hadoop User-verse

With Impala, more users, whether they use SQL queries or BI applications, can interact with more data through a single repository and metadata store, from source through analysis.

Overview

Impala raises the bar for SQL query performance on Apache Hadoop while retaining a familiar user experience. With Impala, you can query data stored in HDFS or Apache HBase in real time, using SELECT, JOIN, and aggregate functions. Furthermore, Impala uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive, providing a familiar and unified platform for batch-oriented or real-time queries. For that reason, Hive users can adopt Impala with little setup overhead.
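
As a minimal sketch of the kind of query described above (a SELECT with a JOIN and an aggregate), the Python snippet below runs one SQL statement through a DB-API 2.0 cursor. The table names (orders, customers) and the helper names are hypothetical and do not come from the Impala documentation. Python clients for Impala such as impyla expose a DB-API style connection; so that the sketch runs without a cluster, the standard-library sqlite3 module stands in for the Impala connection here, and the SQL is limited to constructs that both engines accept.

```python
import sqlite3

# Hypothetical query: total order amount per customer (SELECT + JOIN + aggregate).
QUERY = """
SELECT c.name, SUM(o.amount) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total DESC
"""


def top_customers(conn):
    """Run QUERY on any DB-API 2.0 connection and return the rows as a list."""
    cur = conn.cursor()
    cur.execute(QUERY)
    rows = cur.fetchall()
    cur.close()
    return rows


def make_demo_connection():
    """Build an in-memory sqlite3 database that stands in for Impala (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 10), (1, 5), (2, 7)])
    return conn


if __name__ == "__main__":
    print(top_customers(make_demo_connection()))
```

Because top_customers only relies on cursor(), execute(), and fetchall(), the same function should work with a connection object from an Impala client that follows DB-API 2.0, assuming the tables exist in the shared metastore.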

 

 

Architecture

To avoid latency, Impala bypasses MapReduce and accesses the data directly through a specialized distributed query engine, much like those found in commercial parallel RDBMSs. The result can be order-of-magnitude faster performance than Hive, depending on the type of query and configuration.

 

 

This approach has many advantages over alternative approaches for querying Hadoop data, including:

  • Thanks to local processing on data nodes, network bottlenecks are avoided.
  • A single, open, and unified metadata store can be used.
  • Costly data format conversion is unnecessary, so no conversion overhead is incurred.
  • All data is immediately queryable, with no delays for ETL.
  • All hardware is used for Impala queries as well as for MapReduce.
  • Only a single machine pool is needed to scale.
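
The first advantage in the list, local processing on data nodes, can be illustrated with a toy model in Python. This is not Impala's actual engine, only a simplified sketch: each node aggregates its own rows, and only the small partial results travel to a coordinator that merges them. The function names and sample data are hypothetical.

```python
from collections import Counter


def local_aggregate(rows):
    """Runs on one data node: sum amounts per key over that node's rows only."""
    partial = Counter()
    for key, amount in rows:
        partial[key] += amount
    return partial


def merge_partials(partials):
    """Runs on the coordinator: combine the small per-node partial sums."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values for matching keys
    return dict(total)


# Hypothetical data blocks, one list of (region, amount) rows per data node.
node_blocks = [
    [("east", 3), ("west", 4)],
    [("east", 5)],
    [("west", 1), ("north", 2)],
]

if __name__ == "__main__":
    partials = [local_aggregate(block) for block in node_blocks]
    print(merge_partials(partials))
```

In this model the raw rows never leave their node; only one small dictionary per node crosses the network, which is the intuition behind avoiding network bottlenecks.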

 

 

 

 

 

 

 
