What Is Big Data?

造數 - explore the fun of big data with a new generation of intelligent cloud crawler

Big Data

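The term is said to trace back to Alvin Toffler's book The Third Wave (《第三次浪潮》), published in 1980.
In memoriam | Alvin Toffler: how to defuse the shock of the future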

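Although "big data" is a loose, catch-all term, discussion of big data and of big data processing and analysis has kept heating up lately, and it is now treated as a topic on the scale of a new industrial revolution.

What is big data? As a data collection team, we have spent a long time asking ourselves what big data is and where its prospects and value lie. In this article I will share my own views, along with a variety of interesting content and resources, on: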

  1. What big data is
  2. Big data in practice
  3. Application scenarios for big data

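Everyone has been talking about layoffs lately. If you want to know whether the wave of layoffs at internet companies has really had a lasting negative effect on hiring and salaries, you can use our tool to collect the data on a schedule, several times a day, and generate a list to look over.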

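(1) What is big data

First, hear it from the experts: big data simply means a lot of data, a whole lot. The old equipment cannot store it and cannot compute on it.
-- 啪菠蘿·畢加索 (a playful spelling of Pablo Picasso)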

Big data is not random samples but all the data; not exactness but messiness; not causation but correlation. -- Schönberger

Over to TED: Kenneth Cukier: Big data is better data

America's favorite pie is? Audience: Apple. Kenneth Cukier: Apple. Of course it is. How do we know it? Because of data. You look at supermarket sales. You look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly, apple fell to fourth or fifth place. Why? What happened? Okay, think about it. When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone's second favorite. (Laughter) But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn't see when you only had smaller amounts of it.

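People used to think apple pie was everyone's favorite. With finer-grained data, though, you find that apple pie's popularity is really the result of a compromise: apple is everyone's second-favorite flavor.

Once you have the sales data for small pies, you find that apple pie only ranks around fourth or fifth.

With more data, you can see information you could not see before.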
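The pie story can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: the preference rankings are made up, and the "family pie" is chosen with a Borda-style rule (lowest total rank), which is one plausible way to model "the whole family has to agree", not something the talk specifies.

```python
from collections import Counter

# Hypothetical data: each person ranks flavors from most to least liked.
# Apple is nobody's first choice here, but everybody's second.
FAMILIES = [
    [
        ["cherry", "apple", "pecan", "pumpkin"],
        ["pecan", "apple", "pumpkin", "cherry"],
        ["pumpkin", "apple", "cherry", "pecan"],
    ],
    [
        ["cherry", "apple", "pumpkin", "pecan"],
        ["pumpkin", "apple", "pecan", "cherry"],
        ["pecan", "apple", "cherry", "pumpkin"],
    ],
]


def family_pie(family):
    """Pick one shared pie: the flavor with the lowest total rank (assumed compromise rule)."""
    flavors = family[0]
    return min(flavors, key=lambda f: sum(ranking.index(f) for ranking in family))


def large_pie_sales(families):
    """One large pie per family: the coarse-grained data."""
    return Counter(family_pie(family) for family in families)


def small_pie_sales(families):
    """One small pie per person, each buying their first choice: the fine-grained data."""
    return Counter(ranking[0] for family in families for ranking in family)


if __name__ == "__main__":
    print("large pies:", large_pie_sales(FAMILIES))
    print("small pies:", small_pie_sales(FAMILIES))
```

In this toy data apple wins every large-pie purchase and sells no small pies at all, which exaggerates the effect described in the talk but shows the same mechanism: the aggregate figure hides individual first choices.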

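What is the core value of big data? - Business - Zhihu (知乎). I recommend @Han Hsiao's answer: it is very clearly structured and gives a clear discussion of the positive significance of big data.

Big data sounds impressive, but is it actually impressive? - Artificial Intelligence - Zhihu. @陳萌萌's answer here is also especially good; it makes me wonder whether she is really an AI.

What is the core value of big data? - Business - Zhihu. The same question again, this time the article by @劉飛.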

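Big data is the collection of big data

The big data industry is itself a service industry that depends on data sources.
What is most fundamental about big data is that the way information is collected has changed dramatically. The emergence of big data is closely tied to the vast amount of information that now appears directly online. Weibo, Tmall, Taobao, WeChat and others directly generate enormous volumes of information, including location, message history, purchase records, reviews and reading activity. You could say that internet companies naturally carry the label of data companies. But if we look more closely at where data originates, we still find that a great deal of data needs substantial work to collect and categorize.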

Joel Selanikio: Transcript of "The big-data revolution in healthcare"

There's a concept that people talk about nowadays called "big data." And what they're talking about is all of the information that we're generating through our interaction with and over the Internet, everything from Facebook and Twitter to music downloads, movies, streaming, all this kind of stuff, the live streaming of TED. And the folks who work with big data, for them, they talk about that their biggest problem is we have so much information. The biggest problem is: how do we organize all that information?

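Everyone talks about big data now, but what they are really talking about is the information generated every day on sites such as Facebook, Twitter and streaming services. People who work on big data feel that the amount of data we have is simply too large.

(Organizing the information is still the hardest problem.)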

I can tell you that, working in global health, that is not our biggest problem. Because for us, even though the light is better on the Internet, the data that would help us solve the problems we're trying to solve is not actually present on the Internet. So we don't know, for example, how many people right now are being affected by disasters or by conflict situations. We don't know for, really, basically, any of the clinics in the developing world, which ones have medicines and which ones don't. We have no idea of what the supply chain is for those clinics. We don't know -- and this is really amazing to me -- we don't know how many children were born -- or how many children there are -- in Bolivia or Botswana or Bhutan. We don't know how many kids died last week in any of those countries. We don't know the needs of the elderly, the mentally ill. For all of these different critically important problems or critically important areas that we want to solve problems in, we basically know nothing at all.

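A lot of useful data is not on the internet at all and still has to be collected by basic, manual methods. In many fields, fundamental problems with data remain very visible.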

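What are some "magical" ways of obtaining data? - Liu Cao's answer - Zhihu. Having read this far, I recommend this answer by @Liu Cao.
嚴瀾 (lanceyan)'s blog - tech sharing, framework discussion, big data processing, architecture building
Strongly recommended: How would you describe the big data technology ecosystem with a vivid analogy? How are Hadoop, Hive and Spark related? Among the answers, @Xiaoyu Ma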

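(2) Big data in practice

See here: What tools are generally used for big data analysis? - JavaScript - Zhihu
I recently came across an example about how Pokémon Go changed its players' level of physical activity: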

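1. An example of data analysis within an app:

After six months, the activity level of most Pokémon Go players had gradually converged with that of non-players. Even so, it does appear to be a game that had a considerable effect.

2. An example of big data analysis of traffic conditions: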

Susan Etlinger: What do we do with all this big data?

Now, there's a group of data scientists out of the University of Illinois-Chicago, and they're called the Health Media Collaboratory, and they've been working with the Centers for Disease Control to better understand how people talk about quitting smoking, how they talk about electronic cigarettes, and what they can do collectively to help them quit. The interesting thing is, if you want to understand how people talk about smoking, first you have to understand what they mean when they say "smoking." And on Twitter, there are four main categories: number one, smoking cigarettes; number two, smoking marijuana; number three, smoking ribs; and number four, smoking hot women.

This part is very interesting.
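To make the disambiguation problem concrete, here is a deliberately naive keyword sketch of sorting posts that mention "smoking" into the four senses from the talk. The keyword lists, category names and the `classify_smoking` function are all assumptions made for illustration; the talk does not describe how the Health Media Collaboratory actually classifies tweets, and a real system would need far more than substring matching.

```python
# Hypothetical keyword lists, checked in order; the first match wins.
SENSE_KEYWORDS = [
    ("attractiveness", ["smoking hot"]),
    ("marijuana", ["marijuana", "weed", "cannabis"]),
    ("barbecue", ["ribs", "brisket", "bbq"]),
    ("cigarettes", ["cigarette", "nicotine", "quit smoking", "e-cig"]),
]


def classify_smoking(text):
    """Return the assumed sense of 'smoking' in a post, or 'unknown' if no keyword matches."""
    lowered = text.lower()
    if "smoking" not in lowered:
        return "not about smoking"
    for sense, keywords in SENSE_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return sense
    return "unknown"


if __name__ == "__main__":
    for post in [
        "Day 12 since I quit smoking, no cigarette yet",
        "Smoking ribs all afternoon for the game",
        "That outfit is smoking hot",
        "smoking again...",
    ]:
        print(classify_smoking(post), "<-", post)
```

The last sample post falls through to "unknown", which is the point of the anecdote: the word alone does not tell you what the data means.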

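(3) Application scenarios for big data

First, two news and industry observations:

The current state of big data industry development in Beijing-Tianjin-Hebei | Report | 數據觀 | China Big Data Industry Observation_Big Data Portal
數據觀 | China Big Data Industry Observation_Big Data Portal

Today, both in policy and at the level of national strategy, big data is receiving more and more attention.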

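In terms of application scenarios, current uses fall into:

  1. Supply chain and channel analysis and optimization
  2. Pricing analysis and optimization
  3. Fraud analysis and detection
  4. Equipment management
  5. Social media analysis and customer analysis

Viktor Mayer-Schönberger, author of the book Big Data (《大數據時代》), argues that the big data era brings three major shifts: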

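"First, we can analyze far more data, and sometimes even process all the data related to a particular phenomenon, rather than relying on random sampling. Greater precision lets us discover more detail.
Second, there is so much data to study that we are no longer fixated on exactness. Appropriately ignoring precision at the micro level brings better insight and greater commercial benefit.
Third, we no longer fixate on finding causation, but look for correlations between things. For example, rather than probing why airfares change, we focus on the best time to buy a ticket."

Big data breaks through the boundaries of an enterprise's traditional data and changes the old situation in which business intelligence relied only on the company's internal business data. Data sources become more diverse, covering not only internal data but also external data, especially data related to consumers.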
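The airfare example in the third shift can be sketched in a few lines: instead of modelling why prices move, simply look at which booking window has historically been cheapest. The observations below are invented for illustration, and `best_booking_window` is a hypothetical helper, not a method taken from the book.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (days before departure, ticket price).
OBSERVATIONS = [
    (60, 420), (60, 400),
    (30, 350), (30, 370),
    (14, 390), (14, 410),
    (3, 520), (3, 560),
]


def best_booking_window(observations):
    """Return (cheapest window, average price per window) without asking why prices change."""
    prices_by_window = defaultdict(list)
    for days_before, price in observations:
        prices_by_window[days_before].append(price)
    averages = {days: mean(prices) for days, prices in prices_by_window.items()}
    cheapest = min(averages, key=averages.get)
    return cheapest, averages


if __name__ == "__main__":
    window, averages = best_booking_window(OBSERVATIONS)
    print("average price by days before departure:", averages)
    print("cheapest window (days before departure):", window)
```

This only surfaces a correlation in past prices; it says nothing about the cause, which is exactly the trade-off the quoted passage describes.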

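According to unofficial histories, the ancient Central Asian kingdom of Khwarezm had a curious custom: any messenger who brought the king good news was promoted, while anyone who brought bad news was fed to the tigers. People used to enjoy criticizing this king's naivety, for believing that rewarding the bearers of good news would encourage good news to arrive, and that executing the bearers of bad news would put an end to bad news.

In today's age of information explosion, we cannot guarantee that the messenger will always bring good news, but you can have our crawler deliver, on a schedule, the information that is most useful and best matches your needs. 造數 - a new generation of intelligent cloud crawler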
