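Open-source software is eating the world. In the future, no company will be able to do without it or stand apart from the open-source model of collaborative development, and none will turn down what is, at its core, a win-win arrangement. This article comes from the NetEase Yishu R&D team at NetEase Shufan and records the team's contributions to Apache Spark in 2020.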
Preface
Why we embrace open source
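For a company, the short answer is that it pays off. This is a real goal, and we have to face it squarely.

"Using" open source lowers total cost of ownership, improves software quality, and gives early access to cutting-edge technology, so it cuts cost, raises efficiency, and drives growth.

"Participating" in open source has several benefits:
- It raises employees' technical skills.
- It builds the company's brand in the technical community.
- It establishes trust with leading experts.
- It helps both to train and to recruit talent.

"Building" an open-source ecosystem promotes technical ideas, shapes industry standards, and deepens cooperation with upstream and downstream partners, so the company can lead technically while collaborating with others.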
According to Red Hat®'s 2020 State of Enterprise Open Source report, companies are embracing open source more and more:
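- More and more companies recognize the importance of open source. 95% of IT leaders consider enterprise open source crucial to their organization's enterprise infrastructure software strategy.
- Use of proprietary software is declining rapidly. Expensive, inflexible proprietary licenses lead to high capital expenditure (CapEx) and vendor lock-in.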
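NetEase and Apache Spark
Apache Spark is currently the mainstream big-data compute engine inside NetEase Group. It handles petabytes of data every day, across offline (batch) computing, real-time computing, traditional machine learning, and other workloads.
To reduce the cost of maintaining Spark inside NetEase and to help new Spark technology land quickly in the company, the NetEase Yishu team adopted the following three strategies.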
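1. Integration
All of our engineers contribute actively to the community to varying degrees. This deepens our collaboration with the community and makes us part of it.
The community looks for developers who contribute consistently. With such developers it can build long-term, healthy working relationships based on mutual trust. This lets some of our internal requirements become industry standards, and it lets the community's latest technology land inside the company right away.
The list appended at the end of this article gives a partial tally of NetEase engineers' main contributions to Apache Spark as of the end of 2020, roughly 300 commits.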
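2. Plugins
Business priorities differ, so the supporting technology inevitably diverges across companies and industries. Some modification of Spark's source code is therefore unavoidable. For maintainability, our strategy is to move these company-specific requirements out of Spark and into plugins. This reduces coupling with the Spark mainline and keeps iteration lightweight. Several of these plugins are introduced below: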
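1. Spark-ranger
Spark-ranger is an access-control plugin that gives the Spark compute engine SQL-standard, fine-grained access control. It covers column-level authorization, row-level filtering, and data masking (anonymization).

In big-data warehouse scenarios, data security has long been a weak point of Spark SQL as a high-performance query engine. The project was created to close the last access-control gap in the warehouse-management features of the NetEase Mammut product. As part of Mammut's security components, Spark-ranger authorizes hundreds of thousands of Spark jobs for business teams inside the company every day. It also protects customer data at all external commercial deployments.

The project has been donated to the Apache Software Foundation and is maintained as a submodule of the https://submarine.apache.org/ project.
Spark-ranger source: https://github.com/NetEase/spark-ranger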
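To make the three behaviors concrete, here is a minimal sketch that applies them to in-memory rows: column-level authorization, a row-level filter, and per-column masking. It is a hypothetical illustration only. `Policy`, `secure_select`, and `mask_show_last_4` are invented names and are not the API of Spark-ranger or Apache Ranger. As we understand it, those tools enforce policies inside Spark SQL query planning rather than on Python rows.

```python
"""Toy model of fine-grained access control (hypothetical, not Ranger's API)."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class Policy:
    # Column-level authorization: the columns this user may read at all.
    allowed_columns: Set[str]
    # Row-level filter: a row is visible only if this predicate is true.
    row_filter: Callable[[Row], bool] = lambda row: True
    # Data masking: functions applied to the values of specific columns.
    masks: Dict[str, Callable[[object], object]] = field(default_factory=dict)


def mask_show_last_4(value: object) -> object:
    """Replace every character except the last four with 'x'."""
    s = str(value)
    return "x" * max(len(s) - 4, 0) + s[-4:]


def secure_select(rows: List[Row], columns: List[str], policy: Policy) -> List[Row]:
    """Project `columns` from `rows`, enforcing the policy on every access."""
    denied = [c for c in columns if c not in policy.allowed_columns]
    if denied:
        # Reject the whole query, as a SQL engine would on a missing privilege.
        raise PermissionError(f"no SELECT privilege on columns: {denied}")
    result: List[Row] = []
    for row in rows:
        if not policy.row_filter(row):  # the filter may use unselected columns
            continue
        out: Row = {}
        for c in columns:
            mask = policy.masks.get(c)
            out[c] = mask(row[c]) if mask else row[c]
        result.append(out)
    return result
```

For example, with a policy that grants access to `name` and `phone`, keeps only rows where `region == "hz"`, and masks `phone`, selecting `["name", "phone"]` returns only the matching users with partially hidden phone numbers. Selecting `region` raises `PermissionError`.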
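2. Spark-greenplum
Spark-greenplum is a high-performance tool for moving data between the big-data warehouse and PostgreSQL or Greenplum databases. It is a hundredfold faster than Apache Spark's native API. The project was created to improve data exchange between NetEase Mammut and NetEase Youshu. NetEase Youshu uses it in the step where it pulls data from the NetEase Mammut big-data platform.
Spark-greenplum source: https://github.com/NetEase/spark-greenplum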
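The article does not say how Spark-greenplum reaches its speedup. Spark's built-in JDBC writer sends batched INSERT statements. A well-known faster way to bulk-load PostgreSQL and Greenplum is the `COPY ... FROM STDIN` protocol. We assume that technique here purely for illustration.

The sketch covers only the serialization step. It turns a partition's rows into COPY text format: tab-separated columns, one row per line, `\N` for NULL, and backslash escapes for backslash, tab, newline, and carriage return. The function names are hypothetical.

```python
import io
from typing import Iterable, Optional, Sequence


def _escape(value: Optional[object]) -> str:
    """Encode one value for PostgreSQL COPY text format (tab delimiter)."""
    if value is None:
        return "\\N"  # default NULL marker in text format
    s = str(value)
    # Escape the backslash first so the escapes added below are not doubled.
    return (
        s.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_copy_text(rows: Iterable[Sequence[Optional[object]]]) -> str:
    """Serialize rows into a buffer suitable for `COPY ... FROM STDIN`."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(_escape(v) for v in row))
        buf.write("\n")
    return buf.getvalue()
```

The resulting text could then be streamed to a statement such as `COPY target_table FROM STDIN` through a driver-level API, for example `CopyManager.copyIn` in the PostgreSQL JDBC driver or `cursor.copy_expert` in psycopg2. Here `target_table` is a placeholder. Sending one stream per partition avoids per-row statement overhead, which is where a large speedup over row-oriented inserts would plausibly come from.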
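3. Spark-alarm
Spark-alarm is a fine-grained monitoring tool for Spark jobs. It can monitor Spark jobs comprehensively, including custom key metrics, and offers a rich set of alerting channels such as NetEase Sentinel, email, and EasyOps. Its goal is to keep jobs tied to business KPIs and SLAs running safely. Spark-alarm is a job-level SDK. It is currently available to business teams across NetEase, who instrument their own critical jobs with it.
Spark-alarm source: https://github.com/NetEase/spark-alarm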
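As a rough illustration of what a job-level alerting SDK does, the sketch below checks a job's reported metrics against threshold rules. It fans any violations out to pluggable notifiers. This is a hypothetical design, not Spark-alarm's actual API. `JobMonitor`, `Rule`, and the metric names are invented. A real notifier would wrap a channel such as email or an ops platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A notifier takes (subject, body). Email, IM, or an ops platform
# would each be wrapped in one. Hypothetical, not Spark-alarm's API.
Notifier = Callable[[str, str], None]


@dataclass
class Rule:
    metric: str       # e.g. "job_duration_s" (hypothetical metric name)
    max_value: float  # alert when the observed value exceeds this


class JobMonitor:
    def __init__(self, job_name: str, rules: List[Rule], notifiers: List[Notifier]):
        self.job_name = job_name
        self.rules = rules
        self.notifiers = notifiers

    def check(self, metrics: Dict[str, float]) -> List[str]:
        """Compare reported metrics with the rules and notify on violations."""
        violations = []
        for rule in self.rules:
            value = metrics.get(rule.metric)
            if value is not None and value > rule.max_value:
                violations.append(f"{rule.metric}={value} exceeds {rule.max_value}")
        if violations:
            subject = f"[ALERT] {self.job_name}"
            body = "\n".join(violations)
            # Send one combined message per channel, not one per violation.
            for notify in self.notifiers:
                notify(subject, body)
        return violations
```

Keeping notifiers as plain callables means a new alert channel can be added without touching the rule logic. The original text describes the same separation between metrics and alert channels.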
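3. Ecosystem
As mentioned earlier, building an open-source ecosystem promotes technical ideas, shapes industry standards, and deepens cooperation with upstream and downstream partners, so that we lead technically while collaborating with others.
In the big-data field, the industry first built the Apache Hadoop ecosystem around Google's three papers (GFS, MapReduce, and Bigtable). It then built the active Apache Spark ecosystem around Hadoop. Now products at other layers, such as data lakes, are being built around this ecosystem to deliver truly unified batch and stream computing. Meanwhile, cross-pollination with the CNCF's Kubernetes community is bringing big data and cloud computing closely together. On top of this ecosystem, and starting from the businesses of NetEase and its partners, we built the Kyuubi ecosystem.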
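Kyuubi is a high-performance, general-purpose JDBC service engine for big data. It uses a flexible architecture and a unified SQL API to adapt to four kinds of differences:
- different compute engines, for the best possible performance;
- different resource schedulers, so deployments can switch freely between coupled and disaggregated storage and compute;
- different computation models, for a unified batch-and-stream architecture;
- different business scenarios, for one-stop big-data application development.
The goal is to let users process big data as easily as ordinary data.
First, a general-purpose, easy-to-use way to access data. Kyuubi relies on the standard JDBC interface to provide convenient data access in big-data scenarios. End users can focus on building their own business systems and extracting value from data. They do not need to know anything about the underlying big-data platform (compute engines, storage services, metadata management, and so on).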
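Second, high-performance queries. Kyuubi relies on compute engines such as Apache Spark and Flink for query performance, so every improvement in an engine can give Kyuubi a major performance boost. On top of that, Kyuubi adds data caching, dynamic query optimization, and other features to improve performance further. Caching speeds up frequently run queries. The query plan is also optimized dynamically according to the volume of data a user accesses, which keeps performance high while still supporting streamed return of very large result sets.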
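The article gives no detail on how Kyuubi implements its caching or streamed results. As a generic illustration only, the sketch combines two pieces:
- an LRU cache keyed by the exact SQL text;
- a generator that hands results back in fixed-size batches, the same idea as a JDBC client's fetch size.

`QueryResultCache`, `get_or_run`, and `fetch_in_batches` are hypothetical names, not Kyuubi APIs.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]


class QueryResultCache:
    """LRU cache of query results keyed by SQL text (hypothetical sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: "OrderedDict[str, List[Row]]" = OrderedDict()

    def get_or_run(self, sql: str, run: Callable[[str], List[Row]]) -> List[Row]:
        key = sql.strip()  # simplifying assumption: identical text means identical query
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        rows = run(key)  # cache miss: execute on the engine
        self._data[key] = rows
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return rows


def fetch_in_batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield results incrementally instead of materializing them all at once."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

A production cache would also need invalidation when the underlying tables change. This sketch leaves that out.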
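Third, complete enterprise-grade features. Building on its architecture, Kyuubi provides:
- authentication and authorization, to keep data secure;
- robust high availability, to keep the service available;
- multi-tenant resource isolation, with end-to-end isolation of compute resources and data;
- two-level elastic resource management, which raises resource utilization while keeping costs under control.
Together these meet the performance and latency requirements of interactive and batch workloads, and of everything from point lookups to full table scans.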
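One simple way to picture multi-tenant isolation is a registry that gives each user their own compute engine. Sessions from the same user share that engine, and sessions from different users never do. The sketch below is hypothetical and is not taken from Kyuubi's code. `EngineRegistry` and `Engine` are invented names, and "launching" an engine is reduced to allocating an id. To our knowledge, later Kyuubi releases expose a comparable choice through the `kyuubi.engine.share.level` setting.

```python
import itertools
from dataclasses import dataclass
from typing import Dict


@dataclass
class Engine:
    engine_id: int
    owner: str  # the user whose resources and identity the engine runs under


class EngineRegistry:
    """One engine per user (hypothetical sketch of per-user isolation)."""

    def __init__(self) -> None:
        self._engines: Dict[str, Engine] = {}
        self._ids = itertools.count(1)

    def open_session(self, user: str) -> Engine:
        engine = self._engines.get(user)
        if engine is None:
            # A real service would launch, e.g., a Spark application here,
            # submitted as `user` to that user's resource queue.
            engine = Engine(next(self._ids), user)
            self._engines[user] = engine
        return engine
```

Sharing per user rather than per connection is a trade-off. It avoids paying engine start-up cost on every connection, but one user's concurrent sessions then compete for the same engine's resources.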
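Fourth, rich ecosystem support and ecosystem building. A good open-source product depends on a good open-source ecosystem, and Kyuubi embraces top-tier ecosystems such as Spark in two ways.

First, it makes effective use of the openness of those projects. This lets Kyuubi quickly extend to their existing ecosystems and to new features and ecosystems, such as cloud-native deployment and data lake (Data Lake/Lake House) support.

Second, Kyuubi actively builds out its own ecosystem to fill gaps at each stage:
- The https://github.com/netease/spark-ranger project closes the access-control gap in the big-data pipeline.
- The https://github.com/netease/spark-greenplum project solves the performance problem of exchanging data between Spark and the traditional database PostgreSQL or the MPP database Greenplum.
Kyuubi source: https://github.com/netease/kyuubi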
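Summary
2020 was an extraordinary year. The threat from nature made us deeply aware of how important open cooperation among all of humanity is.
At its core, an open-source community is its developers. Embracing open source and building an open-source ecosystem fit NetEase's mission and vision: to gather the power of people and create a better life through technological innovation.
Beyond serving the company's own interests as described above, we take part in open source because we love it, and we give ourselves fully to what we love.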
Appendix: NetEase engineers' main contributions to Apache Spark as of the end of 2020
* ae1d05927a [SPARK-33892][SQL] Display char/varchar in DESC and SHOW CREATE TABLE
* 2287f56a3e [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns
* a3dd8dacee [SPARK-33877][SQL] SQL reference documents for INSERT w/ a column list
* 6da5cdf1db [SPARK-33876][SQL] Add length-check for reading char/varchar from tables w/ a external location
* f5fd10b1bc [SPARK-33834][SQL] Verify ALTER TABLE CHANGE COLUMN with Char and Varchar
* dd44ba5460 [SPARK-32976][SQL][FOLLOWUP] SET and RESTORE hive.exec.dynamic.partition.mode for HiveSQLInsertTestSuite to avoid flakiness
* c17c76dd16 [SPARK-33599][SQL][FOLLOWUP] FIX Github Action with unidoc
* 728a1298af [SPARK-33806][SQL] limit partition num to 1 when distributing by foldable expressions
* 205d8e40bc [SPARK-32991][SQL] [FOLLOWUP] Reset command relies on session initials first
* 4d47ac4b4b [SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness
* 31e0baca30 [SPARK-33740][SQL] hadoop configs in hive-site.xml can overrides pre-existing hadoop ones
* c88eddac3b [SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types
* da72b87374 [SPARK-33641][SQL] Invalidate new char/varchar types in public APIs that produce incorrect results
* 2da72593c1 [SPARK-32976][SQL] Support column list in INSERT statement
* cdd8e51742 [SPARK-33419][SQL] Unexpected behavior when using SET commands before a query in SparkSession.sql
* 4335af075a [MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only
* 036c11b0d4 [SPARK-33397][YARN][DOC] Fix generating md to html for available-patterns-for-shs-custom-executor-log-url
* 82d500a05c [SPARK-33193][SQL][TEST] Hive ThriftServer JDBC Database MetaData API Behavior Auditing
* e21bb710e5 [SPARK-32991][SQL] Use conf in shared state as the original configuraion for RESET
* dcb0820433 [SPARK-32785][SQL][DOCS][FOLLOWUP] Update migaration guide for incomplete interval literals
* 2507301705 [SPARK-33159][SQL] Use hive-service-rpc as dependency instead of inlining the generated code
* 17d309dfac [SPARK-32963][SQL] empty string should be consistent for schema name in SparkGetSchemasOperation
* e2a740147c [SPARK-32874][SQL][FOLLOWUP][TEST-HIVE1.2][TEST-HADOOP2.7] Fix spark-master-test-sbt-hadoop-2.7-hive-1.2
* 9e9d4b6994 [SPARK-32905][CORE][YARN] ApplicationMaster fails to receive UpdateDelegationTokens message
* 316242b768 [SPARK-32874][SQL][TEST] Enhance result set meta data check for execute statement operation with thrift server
* 5669b212ec [SPARK-32840][SQL] Invalid interval value can happen to be just adhesive with the unit
* 9ab8a2c36d [SPARK-32826][SQL] Set the right column size for the null type in SparkGetColumnsOperation
* de44e9cfa0 [SPARK-32785][SQL] Interval with dangling parts should not results null
* 1fba286407 [SPARK-32781][SQL] Non-ASCII characters are mistakenly omitted in the middle of intervals
* 6dacba7fa0 [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation
* 0626901bcb [SPARK-32729][SQL][DOCS] Add missing since version for math functions
* f14f3742e0 [SPARK-32696][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] Get columns operation should handle interval column properly
* 1f3bb51757 [SPARK-32683][DOCS][SQL] Fix doc error and add migration guide for datetime pattern F
* c26a97637f Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…
* 1b6f482adb [SPARK-32492][SQL][FOLLOWUP][TEST-MAVEN] Fix jenkins maven jobs
* 7f5326c082 [SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZE /DECIMAL_DIGITS/NUM_PREC_RADIX/ORDINAL_POSITION for thriftserver client tools
* 3deb59d5c2 [SPARK-31709][SQL] Proper base path for database/table location when it is a relative path
* f4800406a4 [SPARK-32406][SQL][FOLLOWUP] Make RESET fail against static and core configs
* 510a1656e6 [SPARK-32412][SQL] Unify error handling for spark thrift server operations
* d315ebf3a7 [SPARK-32424][SQL] Fix silent data change for timestamp parsing if overflow happens
* d3596c04b0 [SPARK-32406][SQL] Make RESET syntax support single configuration reset
* b151194299 [SPARK-32392][SQL] Reduce duplicate error log for executing sql statement operation in thrift server
* 29b7eaa438 [MINOR][SQL] Fix warning message for ThriftCLIService.GetCrossReference and GetPrimaryKeys
* efa70b8755 [SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation
* bdeb626c5a [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE
* 4609f1fdab [SPARK-32207][SQL] Support 'F'-suffixed Float Literals
* 59a70879c0 [SPARK-32145][SQL][TEST-HIVE1.2][TEST-HADOOP2.7] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message
* 9f8e15bb2e [SPARK-32034][SQL] Port HIVE-14817: Shutdown the SessionManager timeoutChecker thread properly upon shutdown
* 93529a8536 [SPARK-31957][SQL] Cleanup hive scratch dir for the developer api startWithContext
* abc8ccc37b [SPARK-31926][SQL][TESTS][FOLLOWUP][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber
* a0187cd6b5 [SPARK-31926][SQL][TEST-HIVE1.2][TEST-MAVEN] Fix concurrency issue for ThriftCLIService to getPortNumber
* 22dda6e18e [SPARK-31939][SQL][TEST-JAVA11] Fix Parsing day of year when year field pattern is missing
* 6a424b93e5 [SPARK-31830][SQL] Consistent error handling for datetime formatting and parsing functions
* 02f32cfae4 [SPARK-31926][SQL][TEST-HIVE1.2] Fix concurrency issue for ThriftCLIService to getPortNumber
* fc6af9d900 [SPARK-31867][SQL][FOLLOWUP] Check result differences for datetime formatting
* 9d5b5d0a58 [SPARK-31879][SQL][TEST-JAVA11] Make week-based pattern invalid for formatting too
* afcc14c6d2 [SPARK-31896][SQL] Handle am-pm timestamp parsing when hour is missing
* afe95bd9ad [SPARK-31892][SQL] Disable week-based date filed for parsing
* c59f51bcc2 [SPARK-31879][SQL] Using GB as default Locale for datetime formatters
* 547c5bf552 [SPARK-31867][SQL] Disable year type datetime patterns which are longer than 10
* fe1da296da [SPARK-31833][SQL][TEST-HIVE1.2] Set HiveThriftServer2 with actual port while configured 0
* 311fe6a880 [SPARK-31835][SQL][TESTS] Add zoneId to codegen related tests in DateExpressionsSuite
* 695cb617d4 [SPARK-31771][SQL] Disable Narrow TextStyle for datetime pattern 'G/M/L/E/u/Q/q'
* 0df8dd6073 [SPARK-30352][SQL] DataSourceV2: Add CURRENT_CATALOG function
* 7e2ed40d58 [SPARK-31759][DEPLOY] Support configurable max number of rotate logs for spark daemons
* 1f29f1ba58 [SPARK-31684][SQL] Overwrite partition failed with 'WRONG FS' when the target partition is not belong to the filesystem as same as the table
* 1d66085a93 [SPARK-31289][TEST][TEST-HIVE1.2] Eliminate org.apache.spark.sql.hive.thriftserver.CliSuite flakiness
* 503faa24d3 [SPARK-31715][SQL][TEST] Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
* ce714d8189 [SPARK-31678][SQL] Print error stack trace for Spark SQL CLI when error occurs
* b31ae7bb0b [SPARK-31615][SQL] Pretty string output for sql method of RuntimeReplaceable expressions
* bd6b53cc0b [SPARK-31631][TESTS] Fix test flakiness caused by MiniKdc which throws 'address in use' BindException with retry
* 9241f8282f [SPARK-31586][SQL][FOLLOWUP] Restore SQL string for datetime - interval operations
* ea525fe8c0 [SPARK-31597][SQL] extracting day from intervals should be interval.days + days in interval.microsecond
* 295d866969 [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc
* 54996be4d2 [SPARK-31527][SQL][TESTS][FOLLOWUP] Add a benchmark test for datetime add/subtract interval operations
* beec8d535f [SPARK-31586][SQL] Replace expression TimeSub(l, r) with TimeAdd(l -r)
* 5ba467ca1d [SPARK-31550][SQL][DOCS] Set nondeterministic configurations with general meanings in sql configuration doc
* ebc8fa50d0 [SPARK-31527][SQL] date add/subtract interval only allow those day precision in ansi mode
* 7959808e96 [SPARK-31564][TESTS] Fix flaky AllExecutionsPageSuite for checking 1970
* f92652d0b5 [SPARK-31528][SQL] Remove millennium, century, decade from trunc/date_trunc fucntions
* caf3ab8411 [SPARK-31552][SQL] Fix ClassCastException in ScalaReflection arrayClassFor
* 8424f55229 [SPARK-31532][SQL] Builder should not propagate static sql configs to the existing active or default SparkSession
* 8dc2c0247b [SPARK-31522][SQL] Hive metastore client initialization related configurations should be static
* 3b5792114a [SPARK-31474][SQL][FOLLOWUP] Replace _FUNC_ placeholder with functionname in the note field of expression info
* 37d2e037ed [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
* 2c2062ea7c [SPARK-31498][SQL][DOCS] Dump public static sql configurations through doc generation
* 1985437110 [SPARK-31474][SQL] Consistency between dayofweek/dow in extract exprsession and dayofweek function
* 77cb7cde0d [SPARK-31469][SQL][TESTS][FOLLOWUP] Remove unsupported fields from ExtractBenchmark
* 697083c051 [SPARK-31469][SQL] Make extract interval field ANSI compliance
* 31b907748d [SPARK-31414][SQL][DOCS][FOLLOWUP] Update default datetime pattern for json/csv APIs documentations
* d65f534c5a [SPARK-31414][SQL] Fix performance regression with new TimestampFormatter for json and csv time parsing
* a454510917 [SPARK-31392][SQL] Support CalendarInterval to be reflect to CalendarntervalType
* 3c94a7c8f5 [SPARK-29311][SQL][FOLLOWUP] Add migration guide for extracting second from datetimes
* 1ce584f6b7 [SPARK-31321][SQL] Remove SaveMode check in v2 FileWriteBuilder
* f376d24ea1 [SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery
* 5945d46c11 [SPARK-31225][SQL] Override sql method of OuterReference
* 8be16907c2 [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
* 44bd36ad7b [SPARK-31234][SQL] ResetCommand should reset config to sc.conf only
* b024a8a69e [MINOR][DOCS] Fix some links for python api doc
* 336621e277 [SPARK-31258][BUILD] Pin the avro version in SBT
* f81f11822c [SPARK-31189][R][DOCS][FOLLOWUP] Replace Datetime pattern links in R doc
* 88ae6c4481 [SPARK-31189][SQL][DOCS] Fix errors and missing parts for datetime pattern document
* 3d695954e5 [SPARK-31150][SQL][FOLLOWUP] handle ' as escape for text
* 57fcc49306 [SPARK-31176][SQL] Remove support for 'e'/'c' as datetime pattern charactar
* f1d27cdd91 [SPARK-31119][SQL] Add interval value support for extract expression as extract source
* 5bc0d76591 [SPARK-31170][SQL] Spark SQL Cli should respect hive-site.xml and spark.sql.warehouse.dir
* 0946a9514f [SPARK-31150][SQL] Parsing seconds fraction with variable length for timestamp
* fbc9dc7e9d [SPARK-31129][SQL][TESTS] Fix IntervalBenchmark and DateTimeBenchmark
* 7b4b29e8d9 [SPARK-31131][SQL] Remove the unnecessary config spark.sql.legacy.timeParser.enabled
* 18f2730874 [SPARK-31066][SQL][TEST-HIVE1.2] Disable useless and uncleaned hive SessionState initialization parts
* 2b46662bd0 [SPARK-31111][SQL][TESTS] Fix interval output issue in ExtractBenchmark
* 3bd6ebff81 [SPARK-30189][SQL] Interval from year-month/date-time string should handle whitespaces
* f45ae7f2c5 [SPARK-31038][SQL] Add checkValue for spark.sql.session.timeZone
* 3edab6cc1d [MINOR][CORE] Expose the alias -c flag of --conf for spark-submit
* 1fac06c430 Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server"
* 1383bd459a [SPARK-30970][K8S][CORE] Fix NPE while resolving k8s master url
* 2d2706cb86 [SPARK-30956][SQL][TESTS] Use intercept instead of try-catch to assert failures in IntervalUtilsSuite
* a6026c830a [MINOR][BUILD] Fix make-distribution.sh to show usage without 'echo' cmd
* 761209c1f2 [SPARK-30919][SQL] Make interval multiply and divide's overflow behavior consistent with other operations
* 46019b6e6c [MINOR][DOCS] Fix fabric8 version in documentation
* 0353cbf092 [MINOR][DOC] Fix 2 style issues in running-on-kubernetes doc
* 58b9ca1e6f [SPARK-30592][SQL][FOLLOWUP] Add some round-trip test cases
* 3228d723a4 [SPARK-30603][SQL] Move RESERVED_PROPERTIES from SupportsNamespaces and TableCatalog to CatalogV2Util
* 8e280cebf2 [SPARK-30592][SQL] Interval support for csv and json funtions
* f2d71f5838 [SPARK-30591][SQL] Remove the nonstandard SET OWNER syntax for namespaces
* af705421db [SPARK-30593][SQL] Revert interval ISO/ANSI SQL Standard output since we decide not to follow ANSI and no round trip
* 730388b369 [SPARK-30547][SQL][FOLLOWUP] Update since anotation for CalendarInterval class
* 0388b7a3ec [SPARK-30568][SQL] Invalidate interval type as a field table schema
* 24efa43826 [SPARK-30019][SQL] Add the owner property to v2 table
* 4806cc5bd1 [SPARK-30547][SQL] Add unstable annotation to the CalendarInterval class
* 17857f9b8b [SPARK-30551][SQL] Disable comparison for interval type
* 82f25f5855 [SPARK-30507][SQL] TableCalalog reserved properties shoudn't be changed via options or tblpropeties
* bcf07cbf5f [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
* c37312342e [SPARK-30183][SQL] Disallow to specify reserved properties in CREATE/ALTER NAMESPACE syntax
* 8c121b0827 [SPARK-30431][SQL] Update SqlBase.g4 to create commentSpec pattern like locationSpec
* c49388a484 [SPARK-30214][SQL] A new framework to resolve v2 commands
* e04309cb1f [SPARK-30341][SQL] Overflow check for interval arithmetic operations
* f0bf2eb006 [SPARK-30356][SQL] Codegen support for the function str_to_map
* da65a955ed [SPARK-30266][SQL] Avoid match error and int overflow in ApproximatePercentile and Percentile
* 12249fcdc7 [SPARK-30301][SQL] Fix wrong results when datetimes as fields of complex types
* d38f816748 [MINOR][SQL][DOC] Fix some format issues in Dataset API Doc
* cc7f1eb874 [SPARK-29774][SQL][FOLLOWUP] Add a migration guide for date_add and date_sub
* bf7215c510 [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
* d3ec8b1735 [SPARK-30066][SQL] Support columnar execution on interval types
* 8f0eb7dc86 [SPARK-29587][SQL] Support SQL Standard type real as float(4) numeric as decimal
* 24c4ce1e64 [SPARK-28351][SQL][FOLLOWUP] Remove 'DELETE FROM' from unsupportedHiveNativeCommands
* e88d74052b [SPARK-30147][SQL] Trim the string when cast string type to booleans
* 35bab33984 [SPARK-30121][BUILD] Fix memory usage in sbt build script
* b9cae37750 [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
* 332e252a14 [SPARK-29425][SQL] The ownership of a database should be respected
* 65552a81d1 [SPARK-30083][SQL] visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking
* 39291cff95 [SPARK-30048][SQL] Enable aggregates with interval type values for RelationalGroupedDataset
* 4e073f3c50 [SPARK-30047][SQL] Support interval types in UnsafeRow
* 4fd585d2c5 [SPARK-30008][SQL] The dataType of collect_list/collect_set aggs should be ArrayType(_, false)
* ed0c33fdd4 [SPARK-30026][SQL] Whitespaces can be identified as delimiters in interval string
* 8b0121bea8 [MINOR][DOC] Fix the CalendarIntervalType description
* de21f28f8a [SPARK-29986][SQL] casting string to date/timestamp/interval should trim all whitespaces
* 5cf475d288 [SPARK-30000][SQL] Trim the string when cast string type to decimals
* 2dd6807e42 [SPARK-28023][SQL] Add trim logic in UTF8String's toInt/toLong to make it consistent with other string-numeric casting
* d555f8fcc9 [SPARK-29961][SQL][FOLLOWUP] Remove useless test for VectorUDT
* 7a70670345 [SPARK-29961][SQL] Implement builtin function - typeof
* 79ed4ae2db [SPARK-29926][SQL] Fix weird interval string whose value is only a dangling decimal point
* ea010a2bc2 [SPARK-29873][SQL][TEST][FOLLOWUP] set operations should not escape when regen golden file with --SET --import both specified
* ae6b711b26 [SPARK-29941][SQL] Add ansi type aliases for char and decimal
* 50f6d930da [SPARK-29870][SQL] Unify the logic of multi-units interval string to CalendarInterval
* 5cebe587c7 [SPARK-29783][SQL] Support SQL Standard/ISO_8601 output style for interval type
* 0c68578fa9 [SPARK-29888][SQL] new interval string parser shall handle numeric with only fractional part
* 15a72f3755 [SPARK-29287][CORE] Add LaunchedExecutor message to tell driver which executor is ready for making offers
* f926809a1f [SPARK-29390][SQL] Add the justify_days(), justify_hours() and justif_interval() functions
* d99398e9f5 [SPARK-29855][SQL] typed literals with negative sign with proper result or exception
* d06a9cc4bd [SPARK-29822][SQL] Fix cast error when there are white spaces between signs and values
* e026412d9c [SPARK-29679][SQL] Make interval type comparable and orderable
* e7f7990bc3 [SPARK-29688][SQL] Support average for interval type values
* 0a03839366 [SPARK-29787][SQL] Move methods add/subtract/negate from CalendarInterval to IntervalUtils
* 9562b26914 [SPARK-29757][SQL] Move calendar interval constants together
* 3437862975 [SPARK-29387][SQL][FOLLOWUP] Fix issues of the multiply and divide for intervals
* 4615769736 [SPARK-29603][YARN] Support application priority for YARN priority scheduling
* 44b8fbcc58 [SPARK-29663][SQL] Support sum with interval type values
* 8cf76f8d61 [SPARK-29285][SHUFFLE] Temporary shuffle files should be able to handle disk failures
* 5ba17d09ac [SPARK-29722][SQL] Non reversed keywords should be able to be used in high order functions
* dc987f0c8b [SPARK-29653][SQL] Fix MICROS_PER_MONTH in IntervalUtils
* 8e667db5d8 [SPARK-29629][SQL] Support typed integer literal expression
* 9a46702791 [SPARK-29554][SQL] Add `version` SQL function
* 0cf4f07c66 [SPARK-29545][SQL] Add support for bit_xor aggregate function
* 5b4d9170ed [SPARK-27879][SQL] Add support for bit_and and bit_or aggregates
* ef4c298cc9 [SPARK-29405][SQL] Alter table / Insert statements should not change a table's ownership
* 4b902d3b45 [SPARK-29491][SQL] Add bit_count function support
* 6d4cc7b855 [SPARK-27880][SQL] Add bool_and for every and bool_or for any as function aliases
* 02c5b4f763 [SPARK-28947][K8S] Status logging not happens at an interval for liveness
* f4c73b7c68 [SPARK-27301][DSTREAM] Shorten the FileSystem cached life cycle to the cleanup method inner scope
* ac9c0536bc [SPARK-26794][SQL] SparkSession enableHiveSupport does not point to hive but in-memory while the SparkContext exists
* f8346d2fc0 [SPARK-25174][YARN] Limit the size of diagnostic message for am to unregister itself from rm
* 4a2b15f0af [SPARK-24241][SUBMIT] Do not fail fast when dynamic resource allocation enabled with 0 executor
* a7755fd8ce [SPARK-23639][SQL] Obtain token before init metastore client in SparkSQL CLI
* 189f56f3dc [SPARK-23383][BUILD][MINOR] Make a distribution should exit with usage while detecting wrong options
* eefec93d19 [SPARK-23295][BUILD][MINOR] Exclude Waring message when generating versions in make-distribution.sh
* dd52681bf5 [SPARK-23253][CORE][SHUFFLE] Only write shuffle temporary index file when there is not an existing one
* 793841c6b8 [SPARK-21771][SQL] remove useless hive client in SparkSQLEnv
* 9fa703e893 [SPARK-22950][SQL] Handle ChildFirstURLClassLoader's parent
* 28ab5bf597 [SPARK-22487][SQL][HIVE] Remove the unused HIVE_EXECUTION_VERSION property
* c755b0d910 [SPARK-22463][YARN][SQL][HIVE] add hadoop/hive/hbase/etc configuration files in SPARK_CONF_DIR to distribute archive
* ee571d79e5 [SPARK-22466][SPARK SUBMIT] export SPARK_CONF_DIR while conf is default
* 99e32f8ba5 [SPARK-22224][SQL] Override toString of KeyValue/Relational-GroupedDataset
* 581200af71 [SPARK-21428][SQL][FOLLOWUP] CliSessionState should point to the actual metastore not a dummy one
* b83b502c41 [SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for reusing CliSessionState
* 2387f1e316 [SPARK-21675][WEBUI] Add a navigation bar at the bottom of the Details for Stage Page
* e9d268f63e [SPARK-20096][SPARK SUBMIT][MINOR] Expose the right queue name not null if set by --conf or configure file
* 7363dde634 [SPARK-19626][YARN] Using the correct config to set credentials update time
* e33053ee00 [SPARK-11583] [CORE] MapStatus Using RoaringBitmap More Properly
* 7466031632 [SPARK-32106][SQL] Implement script transform in sql/core
* 0603913c66 [SPARK-33593][SQL] Vector reader got incorrect data with binary partition value
* 25c6cc25f7 [SPARK-26341][WEBUI] Expose executor memory metrics at the stage level, in the Stages tab
* 5f9a7fea06 [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow
* d7f4b2ad50 [SPARK-28704][SQL][TEST] Add back Skiped HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+
* 47326ac1c6 [SPARK-28704][SQL][TEST] Add back Skiped HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+
* dd32f45d20 [SPARK-31069][CORE] Avoid repeat compute `chunksBeingTransferred` cause hight cpu cost in external shuffle service when `maxChunksBeingTransferred` use default value
* 34f5e7ce77 [SPARK-33302][SQL] Push down filters through Expand
* 0c943cd2fb [SPARK-33248][SQL] Add a configuration to control the legacy behavior of whether need to pad null value when value size less then schema size
* e43cd8ccef [SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same with hive
* a1629b4a57 [SPARK-32852][SQL] spark.sql.hive.metastore.jars support HDFS location
* f8277d3aa3 [SPARK-32069][CORE][SQL] Improve error message on reading unexpected directory
* ddc7012b3d [SPARK-32243][SQL] HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments number error
* 0b5a379c1f [SPARK-33023][CORE] Judge path of Windows need add condition `Utils.isWindows`
* c336ddfdb8 [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
* 5e6173ebef [SPARK-31670][SQL] Trim unnecessary Struct field alias in Aggregate/GroupingSets
* 55ce49ed28 [SPARK-32400][SQL][TEST][FOLLOWUP][TEST-MAVEN] Fix resource loading error in HiveScripTransformationSuite
* 9808c15eec [SPARK-32608][SQL][FOLLOW-UP][TEST-HADOOP2.7][TEST-HIVE1.2] Script Transform ROW FORMAT DELIMIT value should format value
* c75a82794f [SPARK-32667][SQL] Script transform 'default-serde' mode should pad null value to filling column
* 6dae11d034 [SPARK-32607][SQL] Script Transformation ROW FORMAT DELIMITED `TOK_TABLEROWFORMATLINES` only support '\n'
* 03e2de99ab [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
* 643cd876e4 [SPARK-32352][SQL] Partially push down support data filter if it mixed in partition filters
* 4cf8c1d07d [SPARK-32400][SQL] Improve test coverage of HiveScriptTransformationExec
* d251443a02 [SPARK-32403][SQL] Refactor current ScriptTransformationExec
* 5521afbd22 [SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
* 6d499647b3 [SPARK-32105][SQL] Refactor current ScriptTransformationExec code
* 09789ff725 [SPARK-31226][CORE][TESTS] SizeBasedCoalesce logic will lose partition
* 560fe1f54c [SPARK-32220][SQL] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result
* 15fb5d7677 [SPARK-28169][SQL] Convert scan predicate condition to CNF
* 0d9faf602e [SPARK-31655][BUILD] Upgrade snappy-java to 1.1.7.5
* 6bc8d84130 [SPARK-29492][SQL] Reset HiveSession's SessionState conf's ClassLoader when sync mode
* 246c398d59 [SPARK-30435][DOC] Update doc of Supported Hive Features
* 3eade744f8 [SPARK-29800][SQL] Rewrite non-correlated EXISTS subquery use ScalaSubquery to optimize perf
* da27f91560 [SPARK-29957][TEST] Reset MiniKDC's default enctypes to fit jdk8/jdk11
* 6146dc4562 [SPARK-29874][SQL] Optimize Dataset.isEmpty()
* eb79af8dae [SPARK-29145][SQL][FOLLOW-UP] Move tests from `SubquerySuite` to `subquery/in-subquery/in-joins.sql`
* e524a3a223 [SPARK-29742][BUILD] Update checkstyle plugin's check dir scope
* d6e33dc377 [SPARK-29599][WEBUI] Support pagination for session table in JDBC/ODBC Tab
* 67cf0433ee [SPARK-29145][SQL] Support sub-queries in join conditions
* 484f93e255 [SPARK-29530][SQL] Make SQLConf in SQL parse process thread safe
* 9a3dccae72 [SPARK-29379][SQL] SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
* ef81525a1a [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2
* 178a1f3558 [SPARK-29305][BUILD] Update LICENSE and NOTICE for Hadoop 3.2
* 0cf2f48dfe [SPARK-29022][SQL] Fix SparkSQLCLI can not add jars by AddJarCommand
* 1d4b2f010b [SPARK-29247][SQL] Redact sensitive information in when construct HiveClientHive.state
* cc852d4eec [SPARK-29015][SQL][TEST-HADOOP3.2] Reset class loader after initializing SessionState for built-in Hive 2.3
* d22768a6be [SPARK-29036][SQL] SparkThriftServer cancel job after execute() thread interrupted
* fe4bee8fd8 [SPARK-29162][SQL] Simplify NOT(IsNull(x)) and NOT(IsNotNull(x))
* 54d3f6e7ec [SPARK-28982][SQL] Implementation Spark's own GetTypeInfoOperation
* 9f478a6832 [SPARK-28901][SQL] SparkThriftServer's Cancel SQL Operation show it in JDBC Tab UI
* 036fd3903f [SPARK-27637][SHUFFLE][FOLLOW-UP] For nettyBlockTransferService, if IOException occurred while create client, check whether relative executor is alive before retry #24533
* e853f068f6 [SPARK-33526][SQL][FOLLOWUP] Fix flaky test due to timeout and fix docs
* 1dd63dccd8 [SPARK-33860][SQL] Make CatalystTypeConverters.convertToCatalyst match special Array value
* bc46d273e0 [SPARK-33840][DOCS] Add spark.sql.files.minPartitionNum to performence tuning doc
* 839d6899ad [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field
* 5bab27e00b [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver
Authors: NetEase Yishu Spark development team