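An HDFS cluster consists primarily of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.
The HDFS architecture describes the basic interactions among the NameNode, the DataNodes, and clients.
Clients contact the NameNode for file metadata or file modifications, and perform the actual file I/O directly with the DataNodes.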
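One way to observe this split on a running cluster is `hdfs fsck` with the `-files -blocks -locations` options, which reports a file's blocks and the DataNodes that hold them using NameNode metadata only. The following is a minimal sketch, assuming the `hdfs` CLI is on the PATH; the helper name `show_block_locations` and the path `/user/alice/data.txt` are hypothetical.

```shell
#!/bin/sh
# show_block_locations is a hypothetical helper name for this sketch.
# fsck reports block-to-DataNode mappings; it does not read file contents.
show_block_locations() {
    hdfs fsck "$1" -files -blocks -locations
}

# The example path is an assumption; the call only runs when the CLI exists.
if command -v hdfs >/dev/null 2>&1; then
    show_block_locations /user/alice/data.txt
else
    echo "hdfs CLI not found; skipping"
fi
```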
Some notable features of Hadoop:
1)Hadoop, including HDFS, is well suited for distributed storage and distributed processing using commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. MapReduce, well known for its simplicity and applicability to a large set of distributed applications, is an integral part of Hadoop.
2)HDFS is highly configurable with a default configuration well suited for many installations. Most of the time, configuration needs to be tuned only for very large clusters.
3)Hadoop is written in Java and is supported on all major platforms.
4)Hadoop supports shell-like commands to interact with HDFS directly.
5)The NameNode and DataNodes have built-in web servers that make it easy to check the current status of the cluster.
6)New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
    a)File permissions and authentication.
    b)Rack awareness: to take a node’s physical location into account while scheduling tasks and allocating storage.
    c)Safemode: an administrative mode for maintenance.
    d)fsck: a utility to diagnose the health of the file system, to find missing files or blocks.
    e)fetchdt: a utility to fetch a DelegationToken and store it in a file on the local system.
    f)Balancer: a tool to balance the cluster when the data is unevenly distributed among DataNodes.
    g)Upgrade and rollback: after a software upgrade, it is possible to roll back to the HDFS state before the upgrade in case of unexpected problems.
    h)Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within certain limits at the NameNode.
    i)Checkpoint node: performs periodic checkpoints of the namespace and helps minimize the size of the log stored at the NameNode containing changes to the HDFS. Replaces the role previously filled by the Secondary NameNode, though it is not yet battle hardened. The NameNode allows multiple Checkpoint nodes simultaneously, as long as there are no Backup nodes registered with the system.
    j)Backup node: an extension to the Checkpoint node. In addition to checkpointing, it also receives a stream of edits from the NameNode and maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode namespace state. Only one Backup node may be registered with the NameNode at once.
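Two of the features listed above, Safemode and the Balancer, are driven from the command line. The sketch below shows how they might be invoked, assuming the `hdfs` CLI is on the PATH. The names `run_cmd`, `routine_maintenance`, and `DRY_RUN` are hypothetical, and the threshold value is only an example. By default the script prints the commands instead of running them, so it is safe to try without a cluster.

```shell
#!/bin/sh
# run_cmd and DRY_RUN are hypothetical names for this sketch.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run_cmd() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Query the Safemode state, then start the Balancer.
# The threshold is a percentage; 10 is only an example value.
routine_maintenance() {
    run_cmd hdfs dfsadmin -safemode get
    run_cmd hdfs balancer -threshold "${1:-10}"
}

routine_maintenance 10
```

Setting `DRY_RUN=0` before calling `routine_maintenance` would execute the commands for real, which typically requires HDFS administrator privileges.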
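Web interface
Each NameNode and DataNode runs an internal web server.
With the default configuration, the NameNode front page is at http://namenode-name:50070/ (the Hadoop 2.x default; Hadoop 3.x moved the NameNode web UI to port 9870).
The web interface can also be used to browse the HDFS file system (using the "Browse the file system" link).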
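A quick way to confirm that the NameNode web server is up is to request its front page from a script. This is a minimal sketch, assuming `curl` is installed. `NN_HOST`, `NN_PORT`, `namenode_url`, and `namenode_web_up` are hypothetical names, and the port should be adjusted to match the cluster's Hadoop version.

```shell
#!/bin/sh
# NN_HOST and NN_PORT are hypothetical variable names for this sketch.
# 50070 matches the default quoted above; newer releases may differ.
NN_HOST="${NN_HOST:-namenode-name}"
NN_PORT="${NN_PORT:-50070}"

# Build the URL of the NameNode front page.
namenode_url() {
    echo "http://${NN_HOST}:${NN_PORT}/"
}

# Succeeds if the server answers without an HTTP error (curl --fail).
namenode_web_up() {
    curl --silent --fail --location --max-time 5 --output /dev/null "$(namenode_url)"
}

if namenode_web_up; then
    echo "NameNode web UI reachable at $(namenode_url)"
else
    echo "NameNode web UI not reachable at $(namenode_url)"
fi
```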
Shell commands:
bin/hdfs dfs -help # lists the commands supported by the Hadoop shell
bin/hdfs dfs -help command-name # displays detailed help for a specific command
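Beyond `-help`, a typical first session copies a file into HDFS and reads it back. The following is a minimal sketch, assuming `hdfs` is on the PATH (otherwise substitute `bin/hdfs` from the Hadoop installation directory). The helper name `hdfs_roundtrip` and both example paths are hypothetical.

```shell
#!/bin/sh
# hdfs_roundtrip is a hypothetical helper; the paths passed to it are examples.
# Copies a local file into HDFS, lists the target directory, and reads it back.
hdfs_roundtrip() {
    local_file=$1
    hdfs_dir=$2
    hdfs dfs -mkdir -p "$hdfs_dir" &&
    hdfs dfs -put -f "$local_file" "$hdfs_dir/" &&
    hdfs dfs -ls "$hdfs_dir" &&
    hdfs dfs -cat "$hdfs_dir/$(basename "$local_file")"
}

# Only runs when the hdfs CLI is available.
if command -v hdfs >/dev/null 2>&1; then
    hdfs_roundtrip ./notes.txt /user/alice/demo
else
    echo "hdfs CLI not found; skipping"
fi
```

The steps are chained with `&&` so that a failed `-put` does not lead to a misleading `-cat` error.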
dfsadmin commands:
bin/hdfs dfsadmin -help # lists the commands supported by dfsadmin
bin/hdfs dfsadmin -printTopology # prints the cluster topology
Although the Hadoop framework is implemented in Java™, MapReduce applications need not be written in Java.
Hadoop Streaming is a utility which allows users to create and run jobs with any executables (e.g. shell utilities) as the mapper and/or the reducer.
Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non JNI™ based).
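The data flow that Hadoop Streaming sets up around arbitrary executables can be imitated locally with a plain pipeline: the mapper reads lines on standard input, a sort groups equal keys, and the reducer consumes the sorted stream. The sketch below is a local word-count stand-in, not a cluster job. `mapper`, `reducer`, and `run_local` are hypothetical names.

```shell
#!/bin/sh
# Local stand-in for a streaming job: mapper | sort | reducer.
# mapper, reducer, and run_local are hypothetical names for this sketch.

# Mapper: emit one word per line.
mapper() {
    tr -s '[:space:]' '\n' | grep -v '^$'
}

# Reducer: input arrives sorted by key, so equal words are adjacent.
# Emits "word<TAB>count".
reducer() {
    uniq -c | awk '{ print $2 "\t" $1 }'
}

# The sort between the two stages plays the role of the shuffle phase.
run_local() {
    mapper | LC_ALL=C sort | reducer
}

printf 'to be or not to be\n' | run_local
```

On a cluster, the same two stages would be handed to the streaming jar through its `-mapper` and `-reducer` options, with `-input` and `-output` naming HDFS paths. The location of the jar depends on the installation.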