I. Prerequisites
1. Familiarity with Java and the Linux operating system.
2. To go further, your English should be good enough to read technical websites in English, e.g. http://hadoop.apache.org/
II. What Is Hadoop
The following is quoted from the official website, followed by a brief translation:
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
The project includes these modules:
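In short, Hadoop is a reliable, open-source software system for distributed computing.
Using simple programming models, the Hadoop framework supports distributed computation over large data sets on a cluster, and it can scale from a single computer to clusters of thousands of servers, each providing local computation and storage. Unlike earlier systems that relied on hardware to guarantee high availability, Hadoop detects and handles failures at the application layer, so the service it provides on top of the cluster stays highly available.
Hadoop consists mainly of the following modules: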
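The idea of handling failures in software rather than hardware can be illustrated with a small plain-Java simulation. This is not Hadoop code: the `RetrySketch` class, the node names and the word-count task are all hypothetical. It only shows the principle (which Hadoop applies, for example, when the MapReduce framework re-runs a failed task attempt) that work lost on a failed machine is simply re-executed on another one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation: each data split is handed to a node; if that node
// fails, the same task is retried on the next node. No Hadoop APIs are used.
final class RetrySketch {

    /** Result of one task: which node ran it and what it produced. */
    record TaskResult(String node, int wordCount) {}

    // Hypothetical task: count the words in one split. A node listed in
    // `failingNodes` throws, standing in for a crashed or unreachable machine.
    static int runOnNode(String node, String split, Set<String> failingNodes) {
        if (failingNodes.contains(node)) {
            throw new IllegalStateException("node " + node + " failed");
        }
        String trimmed = split.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Split i goes to node (i mod n) first; on failure, the following nodes are
    // tried in order until one succeeds or every node has been tried.
    static Map<Integer, TaskResult> process(List<String> splits, List<String> nodes,
                                            Set<String> failingNodes) {
        Map<Integer, TaskResult> results = new LinkedHashMap<>();
        for (int i = 0; i < splits.size(); i++) {
            List<String> errors = new ArrayList<>();
            for (int attempt = 0; attempt < nodes.size(); attempt++) {
                String node = nodes.get((i + attempt) % nodes.size());
                try {
                    int count = runOnNode(node, splits.get(i), failingNodes);
                    results.put(i, new TaskResult(node, count));
                    break;
                } catch (IllegalStateException e) {
                    errors.add(e.getMessage()); // remember the failure, retry elsewhere
                }
            }
            if (!results.containsKey(i)) {
                throw new IllegalStateException("split " + i + " failed on all nodes: " + errors);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("hello hadoop", "hello yarn hdfs", "mapreduce");
        List<String> nodes = List.of("node1", "node2", "node3");
        // node2 is "down", so split 1 is re-run on node3.
        System.out.println(process(splits, nodes, Set.of("node2")));
    }
}
```

The caller never sees the node2 failure: it only gets a complete set of results, which is the sense in which availability is provided by the software layer.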
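(1) Hadoop Common: common utilities that support the other modules
(2) HDFS: the distributed file system, which provides data storage for the system (comparable to Oracle's storage layer)
(3) Hadoop YARN: the job-scheduling and resource-management module; roughly, an operating system layered on top of HDFS
(4) Hadoop MapReduce: a distributed computing framework that runs on YARN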
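To make the MapReduce model more concrete, here is a minimal single-machine sketch of its three phases (map, shuffle/group by key, reduce) using the classic word-count job. It is plain Java, not the Hadoop API: a real job would extend `org.apache.hadoop.mapreduce.Mapper` and `Reducer` and run on YARN across many nodes, while the class and method names below are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the MapReduce model (no Hadoop libraries):
// map -> shuffle (group by key) -> reduce, with word count as the job.
final class WordCountSketch {

    // Map phase: one input line -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key; a TreeMap keeps keys
    // sorted, mirroring the fact that Hadoop sorts map output by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the counts collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence on a single machine.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((k, vs) -> result.put(k, reduce(k, vs)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("Hello Hadoop", "hello YARN and HDFS")));
        // prints {and=1, hadoop=1, hdfs=1, hello=2, yarn=1}
    }
}
```

In a real cluster the map calls would run in parallel near the HDFS blocks holding each input split, and YARN would schedule the map and reduce tasks; the sketch only keeps the data flow between the phases.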
III. System Breakdown