A while ago the disk usage across the datanodes of our Hadoop cluster had become very uneven and the cluster needed to be rebalanced (mainly because two machines that joined the cluster later have much larger disks). After running the following command:
bin/start-balancer.sh -threshold 10
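(For context: the -threshold argument is the allowed deviation, in percentage points, between each datanode's disk utilization and the cluster-wide average. With a threshold of 10, a datanode counts as balanced once its utilization is within 10 points of the average, e.g. anywhere between 50% and 70% when the cluster average is 60%.)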
the log output was as follows:
Time Stamp                Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Mar 10, 2014 11:03:40 AM  0           0 KB                 614.5 GB            20 GB
Mar 10, 2014 11:03:41 AM  1           0 KB                 614.5 GB            20 GB
Mar 10, 2014 11:03:42 AM  2           443 KB               614.5 GB            20 GB
Mar 10, 2014 11:03:43 AM  3           443 KB               614.5 GB            20 GB
Mar 10, 2014 11:03:44 AM  4           891.85 KB            614.5 GB            20 GB
Mar 10, 2014 11:03:45 AM  5           891.85 KB            614.5 GB            20 GB
Mar 10, 2014 11:03:46 AM  6           891.85 KB            614.5 GB            20 GB
Mar 10, 2014 11:03:47 AM  7           891.85 KB            614.49 GB           20 GB
Mar 10, 2014 11:03:48 AM  8           891.85 KB            614.49 GB           20 GB
No block has been moved for 5 iterations. Exiting...
Balancing took 10.023 seconds
Clearly the balancer had already calculated how much data needed to be moved, yet it moved nothing. Why?
hadoop-mysql-balancer-master.log contained no Error or Warning, so the only thing left was to read the source code.
It turns out that before moving a block, the Hadoop balancer decides whether the block is a good candidate; the exact criteria are in the code below:
/* Decide if it is OK to move the given block from source to target
 * A block is a good candidate if
 * 1. the block is not in the process of being moved/has not been moved;
 * 2. the block does not have a replica on the target;
 * 3. doing the move does not reduce the number of racks that the block has
 */
private boolean isGoodBlockCandidate(Source source,
    BalancerDatanode target, BalancerBlock block) {
  // check if the block is moved or not
  if (movedBlocks.contains(block)) {
    return false;
  }
  if (block.isLocatedOnDatanode(target)) {
    return false;
  }

  boolean goodBlock = false;
  if (cluster.isOnSameRack(source.getDatanode(), target.getDatanode())) {
    // good if source and target are on the same rack
    goodBlock = true;
  } else {
    boolean notOnSameRack = true;
    synchronized (block) {
      for (BalancerDatanode loc : block.locations) {
        if (cluster.isOnSameRack(loc.datanode, target.datanode)) {
          notOnSameRack = false;
          break;
        }
      }
    }
    if (notOnSameRack) {
      // good if target is not on the same rack as any replica
      goodBlock = true;
    } else {
      // good if source is on the same rack as one of the replicas
      for (BalancerDatanode loc : block.locations) {
        if (loc != source &&
            cluster.isOnSameRack(loc.datanode, source.datanode)) {
          goodBlock = true;
          break;
        }
      }
    }
  }
  return goodBlock;
}
Checking the three requirements above one by one to find out why no blocks were being moved:
(1) The block to be moved has not already been moved during this balancing run: satisfied.
(2) The block to be moved does not already have a replica on the target machine: to be verified.
(3) Moving the block must not reduce the number of racks the block spans (note: this is the deduplicated block count, not the total replica count; e.g. a block with a replication factor of 2 still counts as one unique block). Since the cluster was set up without a rack-awareness script, every node sits in a single default rack, so this is satisfied.
So the next step was to verify condition (2) on the cluster. Sure enough, many blocks already had replicas on the two machines that were added later, so what the hell was there left to move? The replicas were already sitting there, which is why the balancer process simply exited.
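As an aside, a quick way to spot-check where the replicas of a file's blocks actually live is fsck (a sketch assuming the stock fsck shipped with this Hadoop version; /path/to/file is just a placeholder):

bin/hadoop fsck /path/to/file -files -blocks -locations

The -locations flag prints the datanode addresses holding each block, which makes it easy to see whether the newly added machines already hold a replica.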
Solution:
1. Run the following command
bin/hadoop fs -setrep -R 2 /
to uniformly set the block replication factor across the cluster to the value configured in hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
Then restart the Hadoop cluster, wait for Hadoop to finish verifying the blocks, and run the balancer again; that resolves the problem.
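Two small additions that may help, assuming the standard FsShell and fsck options in this Hadoop release: -setrep accepts a -w flag that blocks until the replication change has completed, and a full fsck report includes the over-replicated block count, so you can confirm the cleanup has finished before starting the balancer again:

bin/hadoop fs -setrep -R -w 2 /
bin/hadoop fsck /

Once fsck reports 0 over-replicated blocks, re-running bin/start-balancer.sh -threshold 10 should move data as expected.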