Is Hadoop's Dominance Slipping? A Look at Six Years of Upheaval in Big Data


Source | https://blog.marouni.fr/bidata-trends-analysis/
Author | Abbass Marouni

I’ve been a loyal follower of the Data Eng Weekly newsletter (formerly Hadoop Weekly) for the past 6 years. The newsletter is a great source for everything related to Big data and data engineering in general, with a wide selection of technical articles along with product announcements and industry news.

For this year’s holiday side project I decided to analyze Data Eng’s archives, which go back to January 2013, to look at Big data trends and changes over the past 6 years.

So I crawled and cleaned over 290 weekly issues (well, Python did !), keeping article snippets from the technical, news and releases sections only. Next, I ran some basic natural language processing followed by some basic filtering to produce the keyword mentions and all of the plots that follow.
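The original post does not publish its pipeline, so the following is only a minimal sketch of how keyword mentions could be counted per year. The keyword list, the `(year, snippets)` input shape and the function names are all assumptions made for illustration, and simple regex tokenization stands in for whatever natural language processing was actually used.

```python
import re
from collections import Counter

# Hypothetical keyword list; the original post does not publish its vocabulary.
KEYWORDS = ["hadoop", "spark", "kafka", "kubernetes"]


def count_mentions(snippets, keywords=KEYWORDS):
    """Count case-insensitive, whole-word keyword mentions across snippets."""
    counts = Counter()
    for text in snippets:
        # Lowercase and split into alphanumeric tokens (a crude stand-in for NLP).
        token_counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        for kw in keywords:
            counts[kw] += token_counts[kw]
    return counts


def mentions_by_year(issues, keywords=KEYWORDS):
    """issues: iterable of (year, [snippet, ...]) pairs. Returns {year: Counter}."""
    per_year = {}
    for year, snippets in issues:
        per_year.setdefault(year, Counter()).update(count_mentions(snippets, keywords))
    return per_year


# Made-up snippets for illustration only; not text from the newsletter.
sample_issues = [
    (2013, ["Hadoop 2.0 released", "Pig on Hadoop"]),
    (2014, ["Spark 1.0 is out", "Kafka and Spark"]),
    (2013, ["HDFS and hadoop"]),
]
result = mentions_by_year(sample_issues)
```

Whole-word matching keeps "spark" from matching inside longer tokens, at the cost of missing multi-word names, which a real pipeline would need to handle separately.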


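In short, the long English passage above says that the data for this article comes from Data Eng Weekly, an important source of content on big data and data engineering that covers a very wide range of technical articles, product announcements and industry news. The author collected 290 issues and kept the article snippets from the technical, news and releases sections.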

 

The English below is simple enough that everyone should be able to follow it, so I won't translate it. Bye!

 

Major trends over the last seven years



             


Hadoop vs. Spark


             

 

Observations : We see the steady decline of Hadoop since 2013, and the moment Spark overtook Hadoop (especially MapReduce).
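One way to locate a takeover point like this in mention counts is to find the first period in which one series exceeds the other. The sketch below is an assumption about how that could be done, not the author's method; the function name and the numbers are made up for illustration.

```python
def first_crossover(periods, series_a, series_b):
    """Return the first period in which series_b exceeds series_a, or None."""
    for period, a, b in zip(periods, series_a, series_b):
        if b > a:
            return period
    return None


# Made-up mention counts for illustration only; not figures from the newsletter.
periods = ["2013", "2014", "2015"]
hadoop_mentions = [100, 80, 60]
spark_mentions = [5, 50, 90]
takeover = first_crossover(periods, hadoop_mentions, spark_mentions)
```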

Hadoop vs. Kafka

             

 

Observations : The rise of Kafka as the main building block in all Big data stacks.


Hadoop vs. Kubernetes



     

 

Observations : An interesting observation is the rise of Kubernetes. Even though Data Eng Weekly is not a DevOps newsletter, it bears witness to the overall hype around Kubernetes in all domains starting from the beginning of 2017.

Yearly top keywords

Here I’m simply plotting the top 10 keywords by total number of mentions in a given year.
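The ranking behind such a plot can be sketched with `collections.Counter`. This assumes the per-year mention counts are already available as a keyword-to-count mapping; the function name and the counts below are hypothetical and only illustrate the top-N selection.

```python
from collections import Counter


def top_keywords(mentions, n=10):
    """Return the n most-mentioned keywords as (keyword, count) pairs, highest first."""
    return Counter(mentions).most_common(n)


# Made-up counts for illustration only; not figures from the newsletter.
mentions_2013 = {"hadoop": 120, "hdfs": 45, "yarn": 40, "pig": 22, "spark": 9}
top3 = top_keywords(mentions_2013, n=3)
```

The resulting pairs can be passed directly to a bar chart in whichever plotting library is at hand.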



2013 : Hadoop’s golden year !


     

Observations : All of the original Hadoop projects are here : HDFS, YARN, MR, PIG, … along with the 2 major distributions, CDH & HDP, and nothing else !


2014 : The rise of Spark !


     

Observations : Hadoop in general continued its dominance, but Spark, which made its debut with its first version this year, was the hottest topic of 2014. We also got the first glimpse of Kafka !



2015 : Here comes Kafka !


     

Observations : Spark takes over the first spot from Hadoop, with Kafka making it to the top 3. Most of the old regime projects (HDFS, YARN, MR, PIG, …) didn’t make it to the top 10.


2016 : Streaming is on fire !


     

Observations : 2016 was the streaming year. Kafka took second place from Hadoop, with Spark (streaming) continuing its dominance.



2017 : Stream everything !


     

Observations : The same lineup as 2016 with some Flink thrown in.



2018 : Back to basics ! 


     

Observations : Kubernetes makes its debut and we’re back to basics, trying to figure out how to manage (K8S), schedule (Airflow) and run (Spark, Kafka, Storage, …) our streams.



2019 : …   


       

Observations : It’s still too early to draw any conclusions about 2019, but it looks like the year where K8s & co. go mainstream in production !



This article was shared from the WeChat public account 老懞大數據 (simon_bigdata).