Working on Hadoop, especially on test clusters, I have managed to break my HDFS layer, sometimes beyond repair, or at least beyond any repair I wanted to invest time in. And sometimes, for whatever other reason, you just want to scrap your HDFS and start anew.
Without going into too much detail, which is beyond the scope of this blog post, HDFS is mainly composed of two types of elements:
The Namenode: /hadoop/hdfs/namenode/current
All new edits are written to the edit log and regularly merged into an fsimage file, for more concise management. An fsimage file represents the file system state after all modifications up to a specific transaction ID. The seen_txid file holds the last seen transaction ID. The VERSION file contains the cluster and HDFS IDs.
For a more detailed explanation: HDFS metadata
The Datanode: /hadoop/hdfs/data/current
In our example we will only focus on its VERSION file, which is very close to the namenode VERSION.
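To make the VERSION file a little more concrete, here is a minimal Python sketch that reads one. It assumes the usual Java properties layout (key=value lines plus a # comment holding a timestamp). The sample values are made up for illustration, and parse_version is a hypothetical helper name, not part of any Hadoop tooling.

```python
def parse_version(text):
    """Parse the contents of an HDFS VERSION file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the timestamp comment at the top of the file.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Made-up sample values, for illustration only.
SAMPLE_NAMENODE_VERSION = """\
#Mon Jan 01 00:00:00 UTC 2018
namespaceID=123456789
clusterID=CID-example
cTime=0
storageType=NAME_NODE
blockpoolID=BP-example
layoutVersion=-63
"""

if __name__ == "__main__":
    print(parse_version(SAMPLE_NAMENODE_VERSION)["clusterID"])
```

A datanode VERSION file can be read the same way. The key to look at in both is clusterID.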
HDFS non-HA formatting
In non-HA mode everything is simple enough.
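As a rough outline, the procedure can be sketched as a dry run in Python. This is a sketch under assumptions, not a definitive procedure. It only lists the steps and executes nothing. The datanode directory is the one used in this post. How you stop and start services (Ambari, init scripts, and so on) depends on your distribution, so those steps stay as plain descriptions. `hdfs namenode -format` is the standard formatting command, and non_ha_format_plan is a hypothetical helper name.

```python
def non_ha_format_plan(datanode_dir="/hadoop/hdfs/data"):
    """Return the ordered steps for wiping a non-HA HDFS (nothing is executed)."""
    return [
        ("all nodes", "stop the HDFS services (namenode, secondary namenode, datanodes)"),
        # Run as the hdfs user; this erases the namenode metadata.
        ("namenode", "hdfs namenode -format"),
        # Old block data still carries the previous cluster ID, so remove it.
        ("each datanode", "rm -rf {}/current".format(datanode_dir)),
        ("all nodes", "start the HDFS services again"),
    ]


if __name__ == "__main__":
    for where, step in non_ha_format_plan():
        print("[{}] {}".format(where, step))
```

Printing the plan instead of running it is deliberate: both the format and the rm step are destructive, so it is worth reading them against your own paths first.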
At this point your HDFS layer is empty, and if you check the VERSION files of the namenode and the datanodes, their clusterID values should coincide.
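The check can be scripted. Below is a small Python sketch that compares the clusterID of two VERSION files. It assumes both files are readable from the machine running the script (copy them over if the namenode and datanode are on different hosts). read_cluster_id and same_cluster are hypothetical helper names.

```python
def read_cluster_id(version_path):
    """Return the clusterID stored in a VERSION file, or None if absent."""
    with open(version_path) as handle:
        for line in handle:
            key, _, value = line.strip().partition("=")
            if key == "clusterID":
                return value
    return None


def same_cluster(namenode_version, datanode_version):
    """True when both VERSION files carry the same clusterID."""
    nn_id = read_cluster_id(namenode_version)
    dn_id = read_cluster_id(datanode_version)
    return nn_id is not None and nn_id == dn_id
```

With the paths from this post, the call would look like same_cluster("/hadoop/hdfs/namenode/current/VERSION", "/hadoop/hdfs/data/current/VERSION").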
HDFS HA formatting
In HA things get a little more complicated. The Standby and Active namenodes share edit log storage managed by the JournalNode service. HA relies on a failover mechanism to swap from the Standby to the Active namenode and, like other systems in Hadoop, this uses ZooKeeper. As you can see, a couple more pieces need to be made aware of a formatting action.
The initial steps are very close to the non-HA procedure.
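The full HA sequence can be outlined as another Python dry run. Again, this is a sketch under assumptions. It assumes quorum journal storage with automatic failover. It executes nothing, and the service start and stop steps stay as plain descriptions because they depend on your distribution. The `hdfs namenode -format`, `hdfs zkfc -formatZK` and `hdfs namenode -bootstrapStandby` commands are standard HDFS commands. ha_format_plan and the datanode directory default are illustrative.

```python
def ha_format_plan(datanode_dir="/hadoop/hdfs/data"):
    """Return the ordered steps for wiping an HA HDFS (nothing is executed)."""
    return [
        ("all nodes", "stop the HDFS services (namenodes, ZKFCs, datanodes, journal nodes)"),
        # The journal nodes must be reachable so the shared edits can be reformatted.
        ("journal nodes", "start the journal node service only"),
        # Run as the hdfs user on ONE namenode only.
        ("first namenode", "hdfs namenode -format"),
        # Reset the failover state kept in ZooKeeper (ZooKeeper must be running).
        ("first namenode", "hdfs zkfc -formatZK"),
        ("first namenode", "start the namenode service"),
        # The second namenode copies the fresh metadata instead of formatting.
        ("second namenode", "hdfs namenode -bootstrapStandby"),
        # Old block data still carries the previous cluster ID, so remove it.
        ("each datanode", "rm -rf {}/current".format(datanode_dir)),
        ("all nodes", "start the remaining HDFS services"),
    ]


if __name__ == "__main__":
    for number, (where, step) in enumerate(ha_format_plan(), start=1):
        print("{}. [{}] {}".format(number, where, step))
```

The important difference from the non-HA case is that only one namenode is formatted. The other is bootstrapped from it so that both end up with the same cluster ID.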
This was a very simple step-by-step guide to formatting. In a later article we will cover actually repairing common errors in HDFS.