Purpose
This document is a starting point for users working with the Hadoop Distributed File System (HDFS), either as part of a Hadoop cluster or as a stand-alone, general-purpose distributed file system. While HDFS is designed to "just work" in many environments, a working knowledge of HDFS helps greatly with configuration improvements and diagnostics on a specific cluster.
Overview
HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data. The HDFS Architecture Guide describes HDFS in detail. This user guide primarily deals with the interaction of users and administrators with HDFS clusters. The HDFS architecture diagram depicts basic interactions among the NameNode, the DataNodes, and the clients. Clients contact the NameNode for file metadata or file modifications and perform actual file I/O directly with the DataNodes.
The following are some of the salient features that could be of interest to many users.
· Hadoop, including HDFS, is well suited for distributed storage and distributed processing using commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. MapReduce, well known for its simplicity and applicability for a large set of distributed applications, is an integral part of Hadoop.
· HDFS is highly configurable, with a default configuration well suited for many installations. Most of the time, configuration needs to be tuned only for very large clusters.
· Hadoop is written in Java and is supported on all major platforms.
· Hadoop supports shell-like commands to interact with HDFS directly.
· The NameNode and DataNodes have built-in web servers that make it easy to check the current status of the cluster.
· New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
o File permissions and authentication.
o Rack awareness: to take a node’s physical location into account while scheduling tasks and allocating storage.
o Safemode: an administrative mode for maintenance.
o fsck: a utility to diagnose health of the file system, to find missing files or blocks.
o fetchdt: a utility to fetch a DelegationToken and store it in a file on the local system.
o Balancer: tool to balance the cluster when the data is unevenly distributed among DataNodes.
o Upgrade and rollback: after a software upgrade, it is possible to roll back to HDFS’ state before the upgrade in case of unexpected problems.
o Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within certain limits at the NameNode.
o Checkpoint node: performs periodic checkpoints of the namespace and helps minimize the size of the log stored at the NameNode containing changes to the HDFS. Replaces the role previously filled by the Secondary NameNode, though it is not yet battle-hardened. The NameNode allows multiple Checkpoint nodes simultaneously, as long as there are no Backup nodes registered with the system.
o Backup node: An extension to the Checkpoint node. In addition to checkpointing it also receives a stream of edits from the NameNode and maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode namespace state. Only one Backup node may be registered with the NameNode at once.
Prerequisites
The following documents describe how to install and set up a Hadoop cluster:
· Single Node Setup for first-time users.
· Cluster Setup for large, distributed clusters.
The rest of this document assumes the user is able to set up and run an HDFS with at least one DataNode. For the purpose of this document, both the NameNode and DataNode could be running on the same physical machine.
Web Interface
The NameNode and DataNode each run an internal web server in order to display basic information about the current status of the cluster. With the default configuration, the NameNode front page is at http://namenode-name:50070/. It lists the DataNodes in the cluster and basic statistics of the cluster. The web interface can also be used to browse the file system (using the "Browse the file system" link on the NameNode front page).
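For scripted checks, the same status information can be pulled over HTTP. The sketch below makes two assumptions that you should verify against your version. First, it assumes the NameNode also serves Hadoop's JSON /jmx servlet on its web port. Second, it assumes the FSNamesystemState bean exposes NumLiveDataNodes and NumDeadDataNodes. The helper names are hypothetical. Note that Hadoop 3.x moved the default NameNode web port to 9870.

```python
import json
import urllib.parse
import urllib.request

# Assumed bean name; inspect http://namenode-name:50070/jmx to confirm it.
BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"


def jmx_url(host: str, port: int = 50070, bean: str = BEAN) -> str:
    """Build the URL of the NameNode JMX servlet, filtered to a single bean."""
    query = urllib.parse.urlencode({"qry": bean})
    return f"http://{host}:{port}/jmx?{query}"


def datanode_counts(payload: dict) -> tuple[int, int]:
    """Extract (live, dead) DataNode counts from a parsed /jmx response."""
    beans = payload.get("beans", [])
    if not beans:
        raise ValueError("bean not found in JMX response")
    state = beans[0]
    return state["NumLiveDataNodes"], state["NumDeadDataNodes"]


def fetch_datanode_counts(host: str, port: int = 50070,
                          timeout: float = 5.0) -> tuple[int, int]:
    """Query a running NameNode (needs network access to its web port)."""
    with urllib.request.urlopen(jmx_url(host, port), timeout=timeout) as resp:
        return datanode_counts(json.load(resp))
```

Against a live cluster this would be called as fetch_datanode_counts("namenode-name"). Parsing is kept separate from fetching so that it can be checked without a cluster.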
Shell Commands
Hadoop includes various shell-like commands that directly interact with HDFS and other file systems that Hadoop supports. The command bin/hdfs dfs -help lists the commands supported by the Hadoop shell. Furthermore, the command bin/hdfs dfs -help command-name displays more detailed help for a command. These commands support most of the normal file system operations, like copying files and changing file permissions. They also support a few HDFS-specific operations, like changing the replication of files. For more information, see the File System Shell Guide.
DFSAdmin Command
The bin/hdfs dfsadmin command supports a few HDFS administration-related operations. The bin/hdfs dfsadmin -help command lists all the commands currently supported. For example:
· -report: reports basic statistics of HDFS. Some of this information is also available on the NameNode front page.
· -safemode: though usually not required, an administrator can manually enter or leave Safemode.
· -finalizeUpgrade: removes the previous backup of the cluster made during the last upgrade.
· -refreshNodes: updates the NameNode with the set of DataNodes allowed to connect to the NameNode. The NameNode re-reads DataNode hostnames from the files defined by dfs.hosts and dfs.hosts.exclude. Hosts defined in dfs.hosts are the DataNodes that are part of the cluster. If there are entries in dfs.hosts, only the hosts in it are allowed to register with the NameNode. Entries in dfs.hosts.exclude are DataNodes that need to be decommissioned. DataNodes complete decommissioning when all the replicas from them are replicated to other DataNodes. Decommissioned nodes are not automatically shut down and are not chosen for writing new replicas.
· -printTopology: prints the topology of the cluster. Displays a tree of racks and the DataNodes attached to them, as viewed by the NameNode.
For command usage, see dfsadmin.
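The include/exclude rules that -refreshNodes applies can be summarized in a few lines. The Python sketch below is a simplified model of the behavior described above, not the NameNode's code. The function names are hypothetical. It assumes the hosts files list whitespace-separated hostnames, with '#' starting a comment.

```python
def parse_hosts(text: str) -> set[str]:
    """Parse a hosts file: whitespace-separated names, '#' starts a comment (assumed format)."""
    hosts: set[str] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        hosts.update(line.split())
    return hosts


def node_status(host: str, include: set[str], exclude: set[str]) -> str:
    """Classify a DataNode according to the dfs.hosts / dfs.hosts.exclude rules."""
    # An empty include list means every host may register.
    if include and host not in include:
        return "rejected"
    # Excluded hosts stay registered while their replicas are copied elsewhere,
    # and they are not chosen for new replicas.
    if host in exclude:
        return "decommissioning"
    return "in-service"
```

After editing the files, an administrator would run bin/hdfs dfsadmin -refreshNodes so that the NameNode re-reads them.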
Secondary NameNode
The NameNode stores modifications to the file system as a log appended to a native file system file, edits. When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file. It then writes the new HDFS state to the fsimage and starts normal operation with an empty edits file. Since the NameNode merges fsimage and edits files only during startup, the edits log file could get very large over time on a busy cluster. Another side effect of a larger edits file is that the next restart of the NameNode takes longer.
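To make the fsimage/edits relationship concrete, here is a toy Python model. An in-memory dict stands in for the namespace and a list stands in for the edits file; the class and method names are hypothetical. It mimics only the startup sequence described above: load the image, replay the log, write a new image, and start with an empty log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model: 'fsimage' is a snapshot dict, 'edits' is an append-only list."""
    fsimage: dict = field(default_factory=dict)
    edits: list = field(default_factory=list)

    def log(self, op: str, path: str, value=None) -> None:
        # Every namespace change is appended to the edits log, never to fsimage.
        self.edits.append((op, path, value))

    def restart(self) -> dict:
        # 1. Load the last image.
        namespace = dict(self.fsimage)
        # 2. Replay every logged operation in order.
        for op, path, value in self.edits:
            if op == "create":
                namespace[path] = value
            elif op == "delete":
                namespace.pop(path, None)
        # 3. Persist the merged state as the new image and start an empty log.
        self.fsimage = namespace
        self.edits = []
        return namespace
```

Between restarts the edits list only grows. That growth is the problem the Secondary NameNode addresses.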
The secondary NameNode merges the fsimage and the edits log files periodically and keeps edits log size within a limit. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode.
The start of the checkpoint process on the secondary NameNode is controlled by two configuration parameters.
· dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
· dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
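The two triggers combine with a logical OR: whichever limit is reached first starts a checkpoint. The Python sketch below models that decision as described. It is an illustration, not the daemon's implementation, and it assumes the period is expressed in seconds.

```python
def checkpoint_due(seconds_since_last: float, uncheckpointed_txns: int,
                   period_s: float = 3600, txns: int = 1_000_000) -> bool:
    """Return True when either checkpoint trigger has fired.

    period_s models dfs.namenode.checkpoint.period (default 1 hour) and
    txns models dfs.namenode.checkpoint.txns (default 1 million).
    """
    return seconds_since_last >= period_s or uncheckpointed_txns >= txns
```

On a quiet cluster the time limit usually fires first. On a busy one, the transaction count can force a checkpoint well before the hour is up.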
The secondary NameNode stores the latest checkpoint in a directory which is structured the same way as the primary NameNode’s directory, so that the checkpointed image is always ready to be read by the primary NameNode if necessary.
For command usage, see secondarynamenode.
Checkpoint Node
The NameNode persists its namespace using two files: fsimage, which is the latest checkpoint of the namespace, and edits, a journal (log) of changes to the namespace since the checkpoint. When a NameNode starts up, it merges the fsimage and edits journal to provide an up-to-date view of the file system metadata. The NameNode then overwrites fsimage with the new HDFS state and begins a new edits journal.
The Checkpoint node periodically creates checkpoints of the namespace. It downloads fsimage and edits from the active NameNode, merges them locally, and uploads the new image back to the active NameNode. The Checkpoint node usually runs on a different machine than the NameNode since its memory requirements are on the same order as the NameNode. The Checkpoint node is started by bin/hdfs namenode -checkpoint on the node specified in the configuration file.
The location of the Checkpoint (or Backup) node and its accompanying web interface are configured via the dfs.namenode.backup.address and dfs.namenode.backup.http-address configuration variables.
The start of the checkpoint process on the Checkpoint node is controlled by two configuration parameters.
· dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and
· dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the NameNode which will force an urgent checkpoint, even if the checkpoint period has not been reached.
The Checkpoint node stores the latest checkpoint in a directory that is structured the same as the NameNode’s directory. This allows the checkpointed image to be always available for reading by the NameNode if necessary. See Import checkpoint.
Multiple checkpoint nodes may be specified in the cluster configuration file.
For command usage, see namenode.
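The settings above normally live in hdfs-site.xml. The Python sketch below renders a minimal <configuration> document with the standard library, so that the property layout can be generated or reviewed programmatically. The host name and port numbers are placeholders, not recommendations; substitute your own.

```python
import xml.etree.ElementTree as ET


def hdfs_site(properties: dict[str, str]) -> str:
    """Render a minimal hdfs-site.xml <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical host name and ports; replace with your Checkpoint node's values.
checkpoint_conf = {
    "dfs.namenode.backup.address": "checkpoint-host:50100",
    "dfs.namenode.backup.http-address": "checkpoint-host:50105",
    "dfs.namenode.checkpoint.period": "3600",
    "dfs.namenode.checkpoint.txns": "1000000",
}

print(hdfs_site(checkpoint_conf))
```

With the configuration in place, the Checkpoint node itself is started with bin/hdfs namenode -checkpoint.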
Backup Node
The Backup node provides the same checkpointing functionality as the Checkpoint node, as well as maintaining an in-memory, up-to-date copy of the file system namespace that is always synchronized with the active NameNode state. Along with accepting a journal stream of file system edits from the NameNode and persisting this to disk, the Backup node also applies those edits into its own copy of the namespace in memory, thus creating a backup of the namespace.
The Backup node does not need to download fsimage and edits files from the active NameNode in order to create a checkpoint, as would be required with a Checkpoint node or Secondary NameNode, since it already has an up-to-date copy of the namespace state in memory. The Backup node checkpoint process is more efficient as it only needs to save the namespace into the local fsimage file and reset edits.
As the Backup node maintains a copy of the namespace in memory, its RAM requirements are the same as those of the NameNode.
The NameNode supports one Backup node at a time. No Checkpoint nodes may be registered if a Backup node is in use. Using multiple Backup nodes concurrently will be supported in the future.
The Backup node is configured in the same manner as the Checkpoint node. It is started with bin/hdfs namenode -backup.
The location of the Backup (or Checkpoint) node and its accompanying web interface are configured via the dfs.namenode.backup.address and dfs.namenode.backup.http-address configuration variables.
Use of a Backup node provides the option of running the NameNode with no persistent storage, delegating all responsibility for persisting the state of the namespace to the Backup node. To do this, start the NameNode with the -importCheckpoint option, along with specifying no persistent storage directories of type edits (dfs.namenode.edits.dir) in the NameNode configuration.
For a complete discussion of the motivation behind the creation of the Backup node and Checkpoint node, see HADOOP-4539. For command usage, see namenode.
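The registration rules above are easy to violate when planning a deployment. The following hypothetical pre-deployment check encodes only the two constraints stated in this section: at most one Backup node, and no Checkpoint nodes while a Backup node is in use. It does not query a cluster.

```python
def check_checkpoint_roles(backup_nodes: int, checkpoint_nodes: int) -> list[str]:
    """Return the problems with a planned layout (an empty list means it is allowed)."""
    problems = []
    if backup_nodes > 1:
        problems.append("only one Backup node may be registered at a time")
    if backup_nodes >= 1 and checkpoint_nodes > 0:
        problems.append("no Checkpoint nodes may be registered while a Backup node is in use")
    return problems
```

For example, check_checkpoint_roles(1, 0) and check_checkpoint_roles(0, 3) both return an empty list, while a plan with two Backup nodes and one Checkpoint node reports both problems.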
Import Checkpoint
The latest checkpoint can be imported to the NameNode if all other copies of the image and the edits files are lost. In order to do that, one should:
· Create an empty directory specified in the dfs.namenode.name.dir configuration variable;
· Specify the location of the checkpoint directory in the configuration variable dfs.namenode.checkpoint.dir;
· Start the NameNode with the -importCheckpoint option.
The NameNode will upload the checkpoint from the dfs.namenode.checkpoint.dir directory and then save it to the NameNode directory(s) set in dfs.namenode.name.dir. The NameNode will fail if a legal image is contained in dfs.namenode.name.dir. The NameNode verifies that the image in dfs.namenode.checkpoint.dir is consistent, but does not modify it in any way.
For command usage, see namenode.
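Because the NameNode fails when dfs.namenode.name.dir already holds a legal image, a preflight check before running the import can save a failed start. The Python sketch below is hypothetical, and its checks are deliberately shallow. It requires the name directory to exist and be empty. It requires the checkpoint directory only to be non-empty; it does not validate the image, which the NameNode does itself.

```python
from pathlib import Path

IMPORT_ARGV = ["bin/hdfs", "namenode", "-importCheckpoint"]


def import_checkpoint_problems(name_dir: str, checkpoint_dir: str) -> list[str]:
    """Return unmet preconditions for -importCheckpoint (empty list means OK)."""
    problems = []
    name, ckpt = Path(name_dir), Path(checkpoint_dir)
    # The NameNode refuses to import if dfs.namenode.name.dir already holds
    # an image, so require an existing, empty directory.
    if not name.is_dir():
        problems.append(f"{name} does not exist")
    elif any(name.iterdir()):
        problems.append(f"{name} is not empty")
    # dfs.namenode.checkpoint.dir must contain the checkpoint to import
    # (checked only superficially here: a non-empty directory).
    if not ckpt.is_dir() or not any(ckpt.iterdir()):
        problems.append(f"{ckpt} has no checkpoint data")
    return problems
```

If the returned list is empty, the import could then be launched with subprocess.run(IMPORT_ARGV, check=True) from the Hadoop installation directory.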
Rebalancer
HDFS data might not always be placed uniformly across the DataNodes. One common reason is the addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), the NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:
· Policy to keep one of the replicas of a block on the same node as the node that is writing the block.
· Need to spread different replicas of a block across the racks so that the cluster can survive the loss of a whole rack.
· One of the replicas is usually placed on the same rack as the node writing to the file so that cross-rack network I/O is reduced.
· Spread HDFS data uniformly across the DataNodes in the cluster.
Due to multiple competing considerations, data might not be uniformly placed across the DataNodes. HDFS provides a tool for administrators that analyzes block placement and rebalances data across the DataNodes. A brief administrator’s guide for the balancer is available at HADOOP-1652.
For command usage, see balancer.
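The balancer's goal can be stated numerically: bring each DataNode's disk utilization within a threshold of the cluster-wide utilization. The Python sketch below classifies nodes that way. It assumes the 10% default of the balancer's -threshold option. It ignores the details the real tool handles, such as rack constraints and bandwidth limits, and it only reports; it moves nothing.

```python
def classify_datanodes(used: dict[str, float], capacity: dict[str, float],
                       threshold_pct: float = 10.0) -> dict[str, str]:
    """Label each DataNode relative to the cluster-wide utilization (in percent)."""
    cluster_util = 100.0 * sum(used.values()) / sum(capacity.values())
    labels = {}
    for node in used:
        util = 100.0 * used[node] / capacity[node]
        if util > cluster_util + threshold_pct:
            labels[node] = "over-utilized"
        elif util < cluster_util - threshold_pct:
            labels[node] = "under-utilized"
        else:
            labels[node] = "balanced"
    return labels
```

A freshly added, empty DataNode typically shows up as "under-utilized", which is the situation in which an administrator would run bin/hdfs balancer -threshold 10.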
Rack Awareness
Typically, large Hadoop clusters are arranged in racks, and network traffic between different nodes within the same rack is much more desirable than network traffic across racks. In addition, the NameNode tries to place replicas of a block on multiple racks for improved fault tolerance. Hadoop lets cluster administrators decide which rack a node belongs to through the configuration variable net.topology.script.file.name. When this script is configured, the Hadoop master daemons (such as the NameNode) invoke it to resolve the rack id of each node. A default installation assumes all the nodes belong to the same rack. This feature and configuration is further described in the PDF attached to HADOOP-692.
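A topology script can be written in any language the master host can execute. The usual contract is as follows. Hadoop passes one or more IP addresses (or hostnames) as arguments. It reads back one rack path per argument, in order and whitespace-separated, on standard output. The Python sketch below follows that contract but is hypothetical in every detail: it assumes an addressing scheme in which the first three octets identify the rack, and the RACKS table is only an example. Unknown nodes fall back to /default-rack.

```python
#!/usr/bin/env python3
"""Hypothetical topology script for net.topology.script.file.name."""
import sys

# Example mapping from address prefix to rack path; replace with your inventory.
RACKS = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(node: str) -> str:
    """Map an IP address to a rack path by prefix; unknown nodes get the default."""
    for prefix, rack in RACKS.items():
        if node.startswith(prefix):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack path per argument, space-separated, in argument order.
    print(" ".join(resolve(arg) for arg in sys.argv[1:]))
```

Because the trailing dot is part of each prefix, an address such as 10.1.10.5 does not accidentally match the 10.1.1. entry.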
Safemode
During startup, the NameNode loads the file system state from the fsimage and the edits log file. It then waits for DataNodes to report their blocks, so that it does not prematurely start replicating blocks for which enough replicas may already exist in the cluster. During this time the NameNode stays in Safemode. Safemode for the NameNode is essentially a read-only mode for the HDFS cluster, where it does not allow any modifications to the file system or blocks. Normally the NameNode leaves Safemode automatically after the DataNodes have reported that most file system blocks are available. If required, HDFS can be placed in Safemode explicitly using the bin/hdfs dfsadmin -safemode command. The NameNode front page shows whether Safemode is on or off. A more detailed description and configuration is maintained as JavaDoc for setSafeMode().
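"Most file system blocks" is controlled by a configurable fraction. The Python sketch below models only the block-report part of the exit condition. It assumes the dfs.namenode.safemode.threshold-pct setting with its usual default of 0.999. The real NameNode also applies an extension period and an optional minimum number of live DataNodes, which this model ignores.

```python
def can_leave_safemode(reported_safe_blocks: int, total_blocks: int,
                       threshold_pct: float = 0.999) -> bool:
    """Approximate the block-report condition for leaving Safemode automatically.

    threshold_pct mirrors dfs.namenode.safemode.threshold-pct (a fraction).
    Extension periods and minimum live DataNodes are not modeled.
    """
    if total_blocks == 0:
        return True  # nothing to wait for
    return reported_safe_blocks / total_blocks >= threshold_pct
```

To see the actual state on a cluster, use bin/hdfs dfsadmin -safemode get. The wait subcommand (-safemode wait) blocks until the NameNode has left Safemode.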
Fsck
HDFS supports the fsck command to check for various inconsistencies. It is designed for reporting problems with various files, for example, missing blocks for a file or under-replicated blocks. Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally the NameNode automatically corrects most of the recoverable failures. By default fsck ignores open files but provides an option to select all files during reporting. The HDFS fsck command is not a Hadoop shell command. It can be run as bin/hdfs fsck. For command usage, see fsck. fsck can be run on the whole file system or on a subset of files.
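When fsck is run from scripts, it helps to build the command line in one place. The Python sketch below covers running fsck on a subtree, including open files via the -openforwrite flag (the option for reporting files open for write), and requesting per-file block details with -files -blocks -locations. The helper names are hypothetical. Check the flags against bin/hdfs fsck output on your version.

```python
import subprocess


def fsck_argv(path: str = "/", include_open_files: bool = False,
              show_blocks: bool = False) -> list[str]:
    """Build an `hdfs fsck` command line for a path or subtree."""
    argv = ["bin/hdfs", "fsck", path]
    if include_open_files:
        argv.append("-openforwrite")  # also report files still open for write
    if show_blocks:
        argv += ["-files", "-blocks", "-locations"]
    return argv


def run_fsck(path: str = "/", **options) -> str:
    """Run fsck (requires an HDFS client installation) and return its report text."""
    result = subprocess.run(fsck_argv(path, **options), capture_output=True, text=True)
    return result.stdout
```

Since fsck only reports, running it with a narrow path is a cheap first step when a job complains about missing blocks.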
Fetchdt
HDFS supports the fetchdt command to fetch a Delegation Token and store it in a file on the local system. This token can later be used to access a secure server (the NameNode, for example) from a non-secure client. The utility uses either RPC or HTTPS (over Kerberos) to get the token, and thus requires Kerberos tickets to be present before the run (run kinit to get the tickets). The HDFS fetchdt command is not a Hadoop shell command. It can be run as bin/hdfs fetchdt DTfile. After you get the token, you can run an HDFS command without having Kerberos tickets by pointing the HADOOP_TOKEN_FILE_LOCATION environment variable to the delegation token file. For command usage, see the fetchdt command.
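The two-step workflow (fetch with a ticket, then use the token without one) can be wrapped so the environment variable is set only for the child process. The Python sketch below is a minimal illustration with hypothetical helper names. It assumes the bin/hdfs layout used throughout this guide.

```python
import os
from typing import Optional


def fetchdt_argv(token_file: str) -> list[str]:
    """Command that writes a delegation token to token_file (run after kinit)."""
    return ["bin/hdfs", "fetchdt", token_file]


def env_with_token(token_file: str, base: Optional[dict] = None) -> dict:
    """Copy of an environment with HADOOP_TOKEN_FILE_LOCATION pointing at the token."""
    env = dict(os.environ if base is None else base)
    env["HADOOP_TOKEN_FILE_LOCATION"] = os.path.abspath(token_file)
    return env
```

On a secure cluster, the token would first be fetched with subprocess.run(fetchdt_argv("DTfile"), check=True) while holding a Kerberos ticket. Later, an HDFS command can be run with subprocess.run(["bin/hdfs", "dfs", "-ls", "/"], env=env_with_token("DTfile"), check=True). The parent process's own environment is left unchanged.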
Recovery Mode
Typically, you will configure multiple metadata storage locations. Then, if one storage location is corrupt, you can read the metadata from one of the other storage locations.
However, what can you do if the only storage locations available are corrupt? In this case, there is a special NameNode startup mode called Recovery mode that may allow you to recover most of your data.
You can start the NameNode in recovery mode like so: namenode -recover
When in recovery mode, the NameNode will interactively prompt you at the command line about possible courses of action you can take to recover your data.
If you don’t want to be prompted, you can give the -force option. This option will force recovery mode to always select the first choice. Normally, this will be the most reasonable choice.
Because Recovery mode can cause you to lose data, you should always back up your edit log and fsimage before using it.
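Backing up first is easy to automate. The Python sketch below copies a NameNode metadata directory (containing fsimage and edits) to a timestamped location, then builds the recovery-mode command. The helper names and directory layout are hypothetical. The copy refuses to overwrite an existing destination.

```python
import shutil
import time
from pathlib import Path


def backup_metadata(name_dir: str, backup_root: str) -> Path:
    """Copy the NameNode metadata directory (fsimage and edits) before recovery."""
    src = Path(name_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree creates missing parents and raises if dest already exists.
    shutil.copytree(src, dest)
    return dest


def recover_argv(force: bool = False) -> list[str]:
    """Recovery-mode launch command; -force always takes the first choice offered."""
    argv = ["bin/hdfs", "namenode", "-recover"]
    if force:
        argv.append("-force")
    return argv
```

Only after backup_metadata has returned successfully would the command from recover_argv() be run. Use the -force variant only if you accept the default choice at every prompt.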
Upgrade and Rollback
When Hadoop is upgraded on an existing cluster, as with any software upgrade, it is possible there are new bugs or incompatible changes that affect existing applications and were not discovered earlier. In any non-trivial HDFS installation, it is not an option to lose any data, let alone to restart HDFS from scratch. HDFS allows administrators to go back to an earlier version of Hadoop and roll back the cluster to the state it was in before the upgrade. HDFS upgrade is described in more detail in the Hadoop Upgrade Wiki page. HDFS can have one such backup at a time. Before upgrading, administrators need to remove the existing backup using the bin/hdfs dfsadmin -finalizeUpgrade command. The following briefly describes the typical upgrade procedure:
· Before upgrading Hadoop software, finalize if there is an existing backup. dfsadmin -upgradeProgress status can tell if the cluster needs to be finalized.
· Stop the cluster and distribute the new version of Hadoop.
· Run the new version with the -upgrade option (sbin/start-dfs.sh -upgrade).
· Most of the time, the cluster works just fine. Once the new HDFS is considered working well (maybe after a few days of operation), finalize the upgrade. Note that until the cluster is finalized, deleting the files that existed before the upgrade does not free up real disk space on the DataNodes.
· If there is a need to move back to the old version,
o stop the cluster and distribute earlier version of Hadoop.
o run the rollback command on the namenode (bin/hdfs namenode -rollback).
o start the cluster with rollback option. (sbin/start-dfs.sh -rollback).
When upgrading to a new version of HDFS, it is necessary to rename or delete any paths that are reserved in the new version of HDFS. If the NameNode encounters a reserved path during upgrade, it will print an error like the following:
/.reserved is a reserved path and .snapshot is a reserved path component in this version of HDFS. Please rollback and delete or rename this path, or upgrade with the -renameReserved [key-value pairs] option to automatically rename these paths during upgrade.
Specifying -upgrade -renameReserved [optional key-value pairs] causes the NameNode to automatically rename any reserved paths found during startup. For example, to rename all paths named .snapshot to .my-snapshot and .reserved to .my-reserved, a user would specify -upgrade -renameReserved .snapshot=.my-snapshot,.reserved=.my-reserved.
If no key-value pairs are specified with -renameReserved, the NameNode will then suffix reserved paths with .<LAYOUT-VERSION>.UPGRADE_RENAMED, e.g. .snapshot.-51.UPGRADE_RENAMED.
There are some caveats to this renaming process. It’s recommended, if possible, to run hdfs dfsadmin -saveNamespace before upgrading. This is because data inconsistency can result if an edit log operation refers to the destination of an automatically renamed file.
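Before an upgrade, it can help to predict which paths -renameReserved will change. The Python sketch below models only the rules stated above. .snapshot is reserved as a component anywhere, and .reserved is reserved directly under the root. User-supplied key-value pairs take precedence. Names without a pair get the .<LAYOUT-VERSION>.UPGRADE_RENAMED suffix. The fallback to the default suffix for reserved names missing from a supplied mapping is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import Optional


def parse_rename_reserved(arg: str) -> dict[str, str]:
    """Parse '.snapshot=.my-snapshot,.reserved=.my-reserved' into a dict."""
    pairs = {}
    for item in filter(None, arg.split(",")):
        key, value = item.split("=", 1)
        pairs[key] = value
    return pairs


def rename_path(path: str, layout_version: int,
                mapping: Optional[dict[str, str]] = None) -> str:
    """Predict how a path is renamed during upgrade (model of the documented rules)."""
    mapping = mapping or {}
    parts = path.split("/")
    for i, part in enumerate(parts):
        # .snapshot is reserved anywhere; .reserved only directly under the root.
        if part == ".snapshot" or (part == ".reserved" and i == 1):
            # Assumed fallback: default suffix when no pair is given for this name.
            parts[i] = mapping.get(part, f"{part}.{layout_version}.UPGRADE_RENAMED")
    return "/".join(parts)
```

For example, with layout version -51, /data/.snapshot/f becomes /data/.snapshot.-51.UPGRADE_RENAMED/f. This matches the suffix format shown above.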
DataNode Hot Swap Drive
The DataNode supports hot-swappable drives. The user can add or replace HDFS data volumes without shutting down the DataNode. The following briefly describes the typical hot swapping drive procedure:
· If there are new storage directories, the user should format them and mount them appropriately.
· The user updates the DataNode configuration dfs.datanode.data.dir to reflect the data volume directories that will be actively in use.
· The user runs dfsadmin -reconfig datanode HOST:PORT start to start the reconfiguration process. The user can use dfsadmin -reconfig datanode HOST:PORT status to query the running status of the reconfiguration task.
· Once the reconfiguration task has completed, the user can safely unmount the removed data volume directories and physically remove the disks.
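The configuration edit and the two dfsadmin calls in the procedure above can be scripted. The Python sketch below computes a new comma-separated dfs.datanode.data.dir value and builds the reconfig commands. Directory entries are compared as plain strings. The helper names are hypothetical, and the address "dn1:50020" used in the checks is a placeholder for the DataNode's IPC address.

```python
from typing import Iterable


def updated_data_dirs(current: str, add: Iterable[str] = (),
                      remove: Iterable[str] = ()) -> str:
    """Return a new comma-separated dfs.datanode.data.dir value."""
    dropped = set(remove)
    dirs = [d.strip() for d in current.split(",") if d.strip()]
    dirs = [d for d in dirs if d not in dropped]
    for d in add:
        if d not in dirs:  # avoid listing a volume twice
            dirs.append(d)
    return ",".join(dirs)


def reconfig_argv(host_port: str, action: str = "start") -> list[str]:
    """dfsadmin -reconfig command for one DataNode; action is 'start' or 'status'."""
    if action not in ("start", "status"):
        raise ValueError("action must be 'start' or 'status'")
    return ["bin/hdfs", "dfsadmin", "-reconfig", "datanode", host_port, action]
```

After writing the new value into the DataNode's hdfs-site.xml, run the "start" command once. Then poll with "status" until the task reports completion before unmounting anything.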
File Permissions and Security
The file permissions are designed to be similar to file permissions on other familiar platforms like Linux. Currently, security is limited to simple file permissions. The user that starts the NameNode is treated as the superuser for HDFS. Future versions of HDFS will support network authentication protocols like Kerberos for user authentication and encryption of data transfers. The details are discussed in the Permissions Guide.
Scalability
Hadoop currently runs on clusters with thousands of nodes. The PoweredBy Wiki page lists some of the organizations that deploy Hadoop on large clusters. HDFS has one NameNode for each cluster. Currently the total memory available on the NameNode is the primary scalability limitation. On very large clusters, increasing the average size of files stored in HDFS helps with increasing cluster size without increasing memory requirements on the NameNode. The default configuration may not suit very large clusters. The FAQ Wiki page lists suggested configuration improvements for large Hadoop clusters.
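The effect of average file size on NameNode memory can be estimated with back-of-the-envelope arithmetic. The Python sketch below makes two assumptions. It uses the frequently quoted rule of thumb of roughly 150 bytes of NameNode heap per file object and per block object; treat this as an order-of-magnitude assumption, not a specification. It also assumes a 128 MB block size.

```python
import math

BYTES_PER_OBJECT = 150  # rough rule-of-thumb figure, not an exact constant


def namenode_heap_estimate(num_files: int, avg_file_bytes: int,
                           block_bytes: int = 128 * 1024**2) -> int:
    """Rough NameNode memory for file and block objects, in bytes."""
    blocks_per_file = math.ceil(avg_file_bytes / block_bytes)
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT


# The same total data stored two ways:
small = namenode_heap_estimate(10_000_000, 100 * 1024**2)  # 10 M files of 100 MiB
large = namenode_heap_estimate(1_000_000, 1000 * 1024**2)  # 1 M files of 1000 MiB
print(small, large)
```

Under these assumptions, storing the same data as fewer, larger files cuts the estimate from about 3.0 GB to about 1.35 GB. This is the effect described above.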
Related Documentation
This user guide is a good starting point for working with HDFS. While the user guide continues to improve, there is a wealth of documentation about Hadoop and HDFS. The following list is a starting point for further exploration:
· Hadoop Site: The home page for the Apache Hadoop site.
· Hadoop Wiki: The home page (FrontPage) for the Hadoop Wiki. Unlike the released documentation, which is part of Hadoop source tree, Hadoop Wiki is regularly edited by Hadoop Community.
· FAQ: The FAQ Wiki page.
· Hadoop User Mailing List: user[at]hadoop.apache.org.
· Explore hdfs-default.xml. It includes brief descriptions of most of the configuration variables available.
· HDFS Commands Guide: HDFS commands usage.