Key New Features in Each Hive Release

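Code in the open source world changes quickly, driven by community momentum and geek culture. This is especially visible in the Hadoop ecosystem, although it also has a lot to do with Hadoop's broad industry adoption and the pressure of all kinds of requirements (big data and cloud computing have been hyped to death in recent years, haha). Bugs, improvements, and new features fly around for every component in the ecosystem. After seeing a Hadoop version-history chart that a fellow developer put together, I felt it was worth compiling the evolution of Hive's new features as well, for future reference.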

I. Hive 0.8.0

Added Bitmap Indexes, the TIMESTAMP datatype, the Plugin Developer Kit, JDBC Driver Improvements, and other new features.

This release is quite old, so I won't go into detail.

For details, see: http://blog.cloudera.com/blog/2011/11/coming-attractions-apache-hive-0-8-0/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12316178

II. Hive 0.9.0

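1. Support for CREATE OR REPLACE VIEW
2. Added more error messages
3. Support for NOT IN and NOT LIKE
4. Ctrl+C now issues a kill command that kills the currently running query job without exiting the Hive CLI
5. The number of map tasks and reduce tasks is printed
6. Better performance for "select xx,xx from xxx LIMIT xxx"
7. Support for the BETWEEN operator
8. PRINTF() function
9. More lenient data type handling in COALESCE/UNION ALL operations
10. Added the TIMESTAMP data type
11. Added "INSERT OVERWRITE TABLE X PARTITION (a=b, c=d) IF NOT EXISTS ..."; if the partition already exists, it is left untouched.
12. Faster query compilation and job startup after a Hive job is submitted.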
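
The behavior described in item 11 can be pictured with a small Python sketch. This is not Hive code: the table is modeled as a plain dict, and the function name `insert_overwrite_if_not_exists` is made up for illustration. It only shows the "skip when the partition already exists" rule.

```python
# Hypothetical in-memory model: a table is a dict from partition spec to rows.
def insert_overwrite_if_not_exists(table, partition, rows):
    """Write rows into the partition only when it does not exist yet.

    Returns True if the partition was written, False if it was left untouched.
    """
    if partition in table:
        return False  # existing partition: leave its data alone
    table[partition] = list(rows)
    return True


table = {("a=b", "c=d"): ["old row"]}
wrote_existing = insert_overwrite_if_not_exists(table, ("a=b", "c=d"), ["new row"])
wrote_new = insert_overwrite_if_not_exists(table, ("a=b", "c=e"), ["new row"])
```

Here the first call leaves the existing partition alone, while the second call creates the missing one.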
For details, see: What's New in Apache Hive 0.9.0

https://cwiki.apache.org/confluence/download/attachments/27362054/WhatsNewInHive090HadoopSummit2012BoF.pdf?version=1&modificationDate=1339872131000

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&version=12317742

III. Hive 0.10.0

Cube and Rollup: Hive now has support for creating cubes with rollups. Thanks to Namit!

List Bucketing: This is an optimization that lets you better handle skew in your tables. Thanks to Gang!

Better Windows Support: Several Hive 0.10.0 fixes support running Hive natively on Windows. There is no more cygwin dependency. Thanks to Kanna!

‘Explain’ Adds More Info: Now you can do an explain dependency and the explain plan will contain all the tables and partitions touched upon by the query. Thanks to Sambavi!

Improved Authorization: The metastore can now optionally do authorization checks on the server side instead of on the client, providing you with a better security profile. Thanks to Sushanth!

Faster Simple Queries: Some simple queries that don’t require aggregations, and therefore MapReduce jobs, can now run faster. Thanks to Navis!

Better YARN Support: This release contains additional work aimed at making Hive work well with Hadoop YARN. While not all test cases are passing yet, there has been a lot of good progress made with this release. Thanks to Zhenxiao!

Union Optimization: Hive queries with unions will now result in a lower number of MapReduce jobs under certain conditions. Thanks to Namit!

Undo Your Drop Table: While not really truly ‘undo’, you can now reinstate your table after dropping it. Thanks to Andrew!

Show Create Table: This lets you see how you created your table. Thanks to Feng!

Support for Avro Data: Hive now has built-in support for reading/writing Avro data. Thanks to Jakob!

Skewed Joins: Hive’s support for joins involving skewed data is now improved. Thanks to Namit!

Robust Connection Handling at the Metastore Layer: Connection handling between a metastore client and server, and also between a metastore server and the database layer, has been improved. Thanks to Bhushan and Jean!

More Statistics: It's now possible to collect and store scalar-valued statistics for your tables and partitions. This will enable better query planning in upcoming releases. Thanks to Shreepadma!

Better-Looking HWI: HWI now uses the Bootstrap JavaScript library. It looks really slick.
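
To get a feel for what the Cube and Rollup entry above means, here is a rough Python sketch of the grouping sets that ROLLUP and CUBE expand to. It is not how Hive implements them. The helper names (`rollup_sets`, `cube_sets`, `aggregate`) and the sample data are invented, and the sketch assumes the grouping columns contain no NULLs, so that None can stand for "rolled away".

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(cols):
    """Grouping sets for ROLLUP: every prefix of cols, longest first."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """Grouping sets for CUBE: every subset of cols, largest first."""
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]


def aggregate(rows, cols, grouping_sets, measure):
    """Sum `measure` once per grouping set; columns rolled away become None."""
    totals = defaultdict(int)
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in cols)
            totals[key] += row[measure]
    return dict(totals)


rows = [
    {"year": 2012, "region": "east", "sales": 10},
    {"year": 2012, "region": "west", "sales": 5},
    {"year": 2013, "region": "east", "sales": 7},
]
cols = ["year", "region"]
rollup = aggregate(rows, cols, rollup_sets(cols), "sales")
cube = aggregate(rows, cols, cube_sets(cols), "sales")
```

ROLLUP yields the detail rows, the per-year subtotals, and the grand total. CUBE additionally yields the per-region subtotals, because it considers every subset of the grouping columns and not only the prefixes.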

For details, see: http://zh.hortonworks.com/blog/apache-hive-0-10-0-is-now-available/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12320745&styleName=Text&projectId=12310843

https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup

IV. Hive 0.11.0

  • ORCFile.  It’s Optimized.
    The ORC File (Optimized RC File) presents key new features that speed up data access in Apache Hive: it adds metadata at the file and block level so that queries can be more intelligent and use that metadata to optimize access.  Further, with the ORC file, only the bytes from the required columns are read from HDFS, which minimizes I/O and speeds up the query chain.  These are major advances for improved performance in Hive.

  • Improved Data Types
    As Apache Hive marches towards full SQL-compatibility, the decimal data type was updated to make it more usable.

  • Analytic Functions
    Hive 0.11 introduces windowing functions for RANK, LEAD/LAG, ROW_NUMBER, FIRST_VALUE, LAST_VALUE and more. It also introduces aggregate OVER functions with PARTITION BY and ORDER BY.

  • Joins improved in Hive 0.11
    Both the broadcast join and the SMB join were improved considerably in Hive 0.11.  Both joins work without user hints, so that the Hive optimizer now picks the correct join rather than depending on the user to do so. More broadcast joins are now packed into a single MapReduce job, making star join queries much more efficient.

  • Implement HiveServer2

  • When writing a Hive table out to a file, users can choose their own separator
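
As a rough illustration of the analytic functions in the list above, the following Python sketch mimics ROW_NUMBER, RANK and LAG over a PARTITION BY / ORDER BY window. It is only a model of the semantics, not Hive's implementation. The function name `window` and the sample rows are invented for the example.

```python
from itertools import groupby
from operator import itemgetter


def window(rows, partition_by, order_by):
    """Annotate rows with ROW_NUMBER, RANK and LAG(order_by) per partition."""
    out = []
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    for _, part in groupby(ordered, key=itemgetter(partition_by)):
        prev = None
        rank = 0
        for row_number, row in enumerate(part, start=1):
            # RANK gives tied rows the same value and leaves gaps afterwards.
            if prev is None or row[order_by] != prev[order_by]:
                rank = row_number
            lag = prev[order_by] if prev is not None else None
            out.append({**row, "row_number": row_number, "rank": rank, "lag": lag})
            prev = row
    return out


rows = [
    {"dept": "a", "salary": 200},
    {"dept": "a", "salary": 100},
    {"dept": "b", "salary": 50},
    {"dept": "a", "salary": 100},
]
result = window(rows, partition_by="dept", order_by="salary")
```

Within department "a" the two rows with salary 100 share rank 1 and the next row gets rank 3, while ROW_NUMBER keeps counting 1, 2, 3. LAG is None on the first row of each partition.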

For details, see: http://zh.hortonworks.com/blog/apache-hive-0-11-stinger-phase-1-delivered/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12323587&styleName=Text&projectId=12310843

V. Hive 0.12.0

For details, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-12/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12324312&styleName=Text&projectId=12310843

VI. Hive 0.13.0

For details, see: http://zh.hortonworks.com/blog/announcing-apache-hive-0-13-completion-stinger-initiative/

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12324986&styleName=Text&projectId=12310843

VII. Hive 0.14.0

[HIVE-5317] - Implement insert, update, and delete in Hive with full ACID support

[HIVE-5775] - Introduce Cost Based Optimizer to Hive

[HIVE-5823] - Support for DECIMAL primitive type in AvroSerDe

[HIVE-6455] - Scalable dynamic partitioning and bucketing optimization

[HIVE-6469] - skipTrash option in hive command line

[HIVE-6806] - CREATE TABLE should support STORED AS AVRO

[HIVE-7036] - get_json_object bug when extract list of list with index

[HIVE-7054] - Support ELT UDF in vectorized mode

[HIVE-7068] - Integrate AccumuloStorageHandler

[HIVE-7090] - Support session-level temporary tables in Hive

[HIVE-7158] - Use Tez auto-parallelism in Hive

[HIVE-7203] - Optimize limit 0

[HIVE-7255] - Allow partial partition spec in analyze command

[HIVE-7299] - Enable metadata only optimization on Tez

[HIVE-7341] - Support for Table replication across HCatalog instances

[HIVE-7390] - Make single quote character optional and configurable in BeeLine CSV/TSV output

[HIVE-7416] - provide context information to authorization checkPrivileges api call

[HIVE-7430] - Implement SMB join in tez

[HIVE-7446] - Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

[HIVE-7506] - MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)

[HIVE-7509] - Fast stripe level merging for ORC

[HIVE-7547] - Add ipAddress and userName to ExecHook

[HIVE-7587] - Fetch aggregated stats from MetaStore

[HIVE-7654] - A method to extrapolate columnStats for partitions of a table

[HIVE-7826] - Dynamic partition pruning on Tez

[HIVE-8531] - Fold is not null filter if there are other comparison filter present on same column
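
One way to read the last entry above (HIVE-8531): under SQL's three-valued logic, a comparison such as `x > 5` already rejects NULLs, so an extra `x IS NOT NULL` on the same column is redundant and can be folded away. The Python sketch below checks that reasoning on a toy column. It is my interpretation of the JIRA title, not the optimizer's actual code, and the helpers `sql_gt`, `sql_and` and `where` are made up.

```python
def sql_gt(value, bound):
    """SQL-style comparison: NULL (None) compared with anything is NULL."""
    return None if value is None else value > bound


def sql_and(left, right):
    """SQL three-valued AND over True, False and None (NULL)."""
    if left is False or right is False:
        return False
    if left is None or right is None:
        return None
    return True


def where(values, predicate):
    """A WHERE clause keeps a row only when the predicate is exactly True."""
    return [v for v in values if predicate(v) is True]


values = [None, 3, 5, 8, None, 13]
# WHERE x IS NOT NULL AND x > 5
with_not_null = where(values, lambda v: sql_and(v is not None, sql_gt(v, 5)))
# WHERE x > 5
folded = where(values, lambda v: sql_gt(v, 5))
```

Both predicates select the same rows, which is why dropping the IS NOT NULL filter is safe in this case.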

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12326450&styleName=Text&projectId=12310843

VIII. Hive 1.0

This release has no new features.

IX. Hive 1.1

[HIVE-3405] - UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase

[HIVE-7122] - Storage format for create like table

[HIVE-8435] - Add identity project remover optimization
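
The initcap UDF from HIVE-3405 is easy to picture with a few lines of Python. This is a sketch of the behavior described in the JIRA title, not Hive's source. It assumes that words are delimited by whitespace and that a NULL input yields NULL.

```python
import re


def initcap(text):
    """Uppercase the first letter of each whitespace-delimited word and
    lowercase the remaining letters; None (NULL) passes through."""
    if text is None:
        return None
    return re.sub(
        r"\S+",
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(),
        text,
    )


example = initcap("hELLO wORLD from hive")
```

Because only the non-whitespace runs are rewritten, the original spacing between words is preserved.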

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310843&styleName=Text&version=12329363

X. Hive 1.2

[HIVE-7998] - Enhance JDBC Driver to not require class specification

[HIVE-9039] - Support Union Distinct

[HIVE-9188] - BloomFilter support in ORC

[HIVE-9277] - Hybrid Hybrid Grace Hash Join

[HIVE-9302] - Beeline add commands to register local jdbc driver names and jars

[HIVE-9780] - Add another level of explain for RDBMS audience

[HIVE-10038] - Add Calcite's ProjectMergeRule.

[HIVE-10099] - Enable constant folding for Decimal

[HIVE-10591] - Support limited integer type promotion in ORC
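
For the BloomFilter entry above (HIVE-9188), the general idea is that a small per-chunk filter can say "this value is definitely not here", so a reader can skip that chunk. The Python sketch below shows a generic Bloom filter, not ORC's actual format or hash functions. The class, the bit and hash counts, and the "stripe" names are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Hypothetical chunks of a column, each with its own filter.
stripes = {"stripe-0": ["alice", "bob"], "stripe-1": ["carol", "dave"]}
filters = {}
for name, values in stripes.items():
    bloom = BloomFilter()
    for value in values:
        bloom.add(value)
    filters[name] = bloom

# Only chunks whose filter says "maybe" need to be read for WHERE name = 'carol'.
candidates = [name for name, bloom in filters.items() if bloom.might_contain("carol")]
```

A chunk that really contains the value is always among the candidates. A chunk that does not contain it is usually skipped, but may occasionally be read anyway because of a false positive.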

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843

XI. Hive 2.0

  • [HIVE-686] - add UDF substring_index

  • [HIVE-3404] - Create quarter UDF

  • [HIVE-7926] - long-lived daemons for query fragment execution, I/O and caching

  • [HIVE-10591] - Support limited integer type promotion in ORC

  • [HIVE-10592] - ORC file dump in JSON format

  • [HIVE-10673] - Dynamically partitioned hash join for Tez

  • [HIVE-10761] - Create codahale-based metrics system for Hive

  • [HIVE-10785] - Support aggregate push down through joins

  • [HIVE-11103] - Add banker's rounding BROUND UDF

  • [HIVE-11461] - Transform flat AND/OR into IN struct clause

  • [HIVE-11488] - Add sessionId and queryId info to HS2 log

  • [HIVE-11593] - Add aes_encrypt and aes_decrypt UDFs

  • [HIVE-11600] - Hive Parser to Support multi col in clause (x,y..) in ((..),..., ())

  • [HIVE-11684] - Implement limit pushdown through outer join in CBO

  • [HIVE-11699] - Support special characters in quoted table names

  • [HIVE-11706] - Implement "show create database"

  • [HIVE-11775] - Implement limit push down through union all in CBO

  • [HIVE-11785] - Support escaping carriage return and new line for LazySimpleSerDe

  • [HIVE-11976] - Extend CBO rules to being able to apply rules only once on a given operator

  • [HIVE-12080] - Support auto type widening (int->bigint & float->double) for Parquet table
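
Three of the new UDFs in the list above (substring_index, quarter and BROUND) are simple enough to model in Python. These are sketches of the semantics as I understand them, not Hive's implementations. In particular, returning an empty string for a zero count or an empty delimiter, and taking a `datetime.date` where Hive takes a date value or string, are assumptions of this sketch.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_EVEN


def substring_index(text, delim, count):
    """Part of text before the count-th delim (count > 0), or after the
    count-th delim counted from the right (count < 0)."""
    if count == 0 or not delim:
        return ""  # assumption of this sketch
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


def quarter(day):
    """Quarter of the year (1 to 4) for a datetime.date."""
    return (day.month - 1) // 3 + 1


def bround(value, digits=0):
    """Banker's rounding: a tie goes to the neighbor whose last digit is even."""
    exponent = Decimal(10) ** -digits
    return Decimal(str(value)).quantize(exponent, rounding=ROUND_HALF_EVEN)


left_part = substring_index("www.apache.org", ".", 2)
right_part = substring_index("www.apache.org", ".", -1)
q = quarter(date(2015, 4, 8))
ties = [bround(2.5), bround(3.5)]
```

The two ties show what sets banker's rounding apart from ordinary half-up rounding: 2.5 rounds down to 2 while 3.5 rounds up to 4, so rounding errors tend to cancel out across many values.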

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12332641&styleName=Text&projectId=12310843

References:

[1] New features in Hive 0.8.0 and 0.9.0  http://superlxw1234.iteye.com/blog/1564461

[2] Overview of new features in Hive 0.10 and 0.11  http://blog.csdn.net/lalaguozhe/article/details/11730817

[3] http://hive.apache.org/downloads.html

[4] Hive's roadmap for the next two years  http://www.infoq.com/cn/news/2014/09/hive

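(1) ACID transaction support: users will be able to insert, update, and delete existing data. Hive will evolve from a traditional write-once, read-many system into one that supports analysis of changing data.
(2) Sub-second queries: users will be able to use Hive for scenarios with stricter response-time requirements, such as interactive dashboards and exploratory analysis.
(3) Full support for SQL:2011 Analytics: users will be able to deploy complex reports on Hive with standard SQL, faster, more simply, and more reliably. A powerful cost-based optimizer will keep tool-generated queries and complex queries running fast. At that point, Hive will give Hadoop the full expressive power that enterprise SQL users are used to. On top of its existing support for window functions, user-defined functions, subqueries, Rollup, Cube, standard aggregation, and inner, outer, semi, and cross joins, it will add support for non-equi joins, set operations (union, intersect, except), interval types, and more.
Stinger.next is planned to take 18 months and will be delivered in three phases. Transaction support will be released at the end of 2014, sub-second queries will arrive in the first half of 2015, and full support for SQL:2011 Analytics will be completed by the end of 2015.
In addition, Hive will integrate with the machine learning framework Spark, so that users can run machine learning models through Hive.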
