pandas and SQL Data Analysis in Practice
https://study.163.com/course/courseMain.htm?courseId=1006383008&share=2&shareId=400000000398149
http://impala.apache.org/
Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.
Do BI-style Queries on Hadoop
Impala provides low latency and high concurrency for BI/analytic queries on Hadoop (not delivered by batch frameworks such as Apache Hive). Impala also scales linearly, even in multitenant environments.
Unify Your Infrastructure
Utilize the same file and data formats and the same metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication.
Implement Quickly
For Apache Hive users, Impala utilizes the same metadata and ODBC driver. Like Hive, Impala supports SQL, so you don't have to worry about reinventing the wheel.
Count on Enterprise-class Security
Impala is integrated with native Hadoop security and uses Kerberos for authentication. Via the Sentry module, you can ensure that the right users and applications are authorized for the right data.
Retain Freedom from Lock-in
Impala is open source (Apache License).
Expand the Hadoop User-verse
With Impala, more users, whether using SQL queries or BI applications, can interact with more data through a single repository and metadata store from source through analysis.
Overview
Impala raises the bar for SQL query performance on Apache Hadoop while retaining a familiar user experience. With Impala, you can query data stored in HDFS or Apache HBase in real time, using SELECT, JOIN, and aggregate functions. Furthermore, Impala uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive, providing a familiar and unified platform for batch-oriented or real-time queries. (For that reason, Hive users can utilize Impala with little setup overhead.)
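To make the SELECT, JOIN, and aggregate pattern mentioned above concrete, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in so that it runs without a cluster; it is not Impala. The tables `customers` and `orders` and their contents are hypothetical. A query of the same shape could be submitted to Impala through whichever client you use, though dialect details may differ.

```python
import sqlite3

# In-memory SQLite is only a stand-in so the sketch runs anywhere; it is NOT Impala.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and rows, invented for illustration.
cur.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
cur.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "east"), (2, "west"), (3, "east")],
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5)],
)

# SELECT + JOIN + aggregate functions: order count and total amount per region.
QUERY = """
SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
ORDER BY c.region
"""

rows = cur.execute(QUERY).fetchall()
for region, n_orders, total in rows:
    print(region, n_orders, total)

conn.close()
```

The query text is the part that carries over; the connection setup would be replaced by the Impala client of your choice.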
Design
To avoid latency, Impala circumvents MapReduce and accesses the data directly through a specialized distributed query engine that is very similar to those found in commercial parallel RDBMSs. The result is order-of-magnitude faster performance than Hive, depending on the type of query and configuration.
There are many advantages to this approach over alternative approaches for querying Hadoop data, including:
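Thanks to local processing on data nodes, network bottlenecks are avoided.
A single, open, and unified metadata store can be utilized.
Costly data format conversion is unnecessary, so no overhead is incurred.
All data is immediately queryable, with no delays for ETL.
All hardware is utilized for Impala queries as well as for MapReduce.
Only a single machine pool is needed to scale.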
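The first advantage in the list, local processing on data nodes, can be illustrated with a toy model. The sketch below is not Impala's engine and makes no claim about its internals; it only shows the general idea that when each node aggregates its own rows first, only small per-node summaries need to cross the network. The data in `node_partitions` and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: each inner list holds the (region, amount) rows
# stored on one data node. Invented for illustration.
node_partitions = [
    [("east", 10.0), ("west", 7.5), ("east", 5.0), ("west", 0.5)],
    [("east", 2.5), ("west", 1.0), ("east", 2.5)],
]


def local_aggregate(rows):
    """Sum amounts per region using only the rows held on one node."""
    partial = defaultdict(float)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)


def merge_partials(partials):
    """Combine the small per-node summaries into the final result."""
    total = defaultdict(float)
    for partial in partials:
        for region, amount in partial.items():
            total[region] += amount
    return dict(total)


# Each node aggregates locally; only the summaries are "shipped" for merging.
partials = [local_aggregate(rows) for rows in node_partitions]
result = merge_partials(partials)

rows_stored = sum(len(rows) for rows in node_partitions)
rows_shipped = sum(len(partial) for partial in partials)

print(result)
print(rows_stored, rows_shipped)
```

In this toy data set, seven stored rows shrink to four summary entries before merging; with realistic table sizes the gap between stored rows and shipped summaries is typically far larger.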
https://study.163.com/provider/400000000398149/index.htm?share=2&shareId=400000000398149 (Follow the author's homepage for Python video courses and many free classic Python articles.)