Hadoop troubleshooting notes

Below are problems I have run into, together with some solutions. I hope they are helpful.

1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
This happens when, during the shuffle phase of a reduce task, the number of failed attempts to fetch completed map output exceeds the limit (5 by default). It can be triggered in many ways, for example broken network connections, connection timeouts, poor bandwidth, or blocked ports; it normally does not occur when the network inside the cluster is healthy.
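If this error keeps appearing, it is usually worth ruling out basic connectivity first. A minimal sketch, assuming the classic 0.x ports (reducers fetch map output from the TaskTracker HTTP port, 50060 by default); the hostnames are placeholders:

# quick connectivity check from the node whose reduce tasks keep failing;
# slave1 and slave2 are placeholder hostnames
for host in slave1 slave2; do
    ping -c 1 "$host"
    nc -z -w 5 "$host" 50060 && echo "$host: TaskTracker HTTP port reachable"
done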


2: Too many fetch-failures
Answer:
This is usually caused by incomplete connectivity between nodes. Check the following (a sketch follows the list):
1) Check /etc/hosts
   the local IP must map to the hostname, and
   the file must contain the IP + hostname of every node in the cluster
2) Check .ssh/authorized_keys
   it must contain the public keys of all nodes, including the node itself
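A minimal sketch of what the two checks look like; all hostnames, IPs, users and paths are placeholders:

# /etc/hosts on every node should list every node by IP and hostname, e.g.:
#   192.168.1.10  master
#   192.168.1.11  slave1
#   192.168.1.12  slave2
cat /etc/hosts

# each node's public key must appear in authorized_keys on every node (including itself)
cat ~/.ssh/id_rsa.pub | ssh hadoop@slave1 'cat >> ~/.ssh/authorized_keys'
ssh slave1 hostname    # should succeed without a password prompt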

3: Processing is extremely slow -- maps finish quickly but reduces are very slow, and reduce=0% keeps reappearing
Answer:
Apply the checks from item 2 above, then
modify conf/hadoop-env.sh: export HADOOP_HEAPSIZE=4000

4: The DataNode starts, but cannot be accessed and cannot be shut down
When re-formatting a new distributed file system, you must delete the local path configured as dfs.name.dir on the NameNode (the local filesystem path where the NameNode persistently stores the namespace and transaction log), and also delete the dfs.data.dir directories on every DataNode (the local filesystem paths where the DataNodes store block data). With the configuration used here, that means deleting /home/hadoop/NameData on the NameNode, and /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes. The reason is that when Hadoop formats a new distributed file system, each stored namespace is stamped with the version of its creation time (see the VERSION file under /home/hadoop/NameData/current, which records the version information). When re-formatting, it is best to delete the NameData directory first, and the dfs.data.dir directories on every DataNode must be deleted as well, so that the version information recorded by the NameNode and the DataNodes stays consistent.
Note: deletion is a dangerous operation -- never delete anything you are not sure about, and back up everything before deleting!!
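A minimal sketch of the full sequence, using the example paths above (adjust them to your own dfs.name.dir / dfs.data.dir, and only after backing everything up):

bin/stop-all.sh
# on the NameNode: the dfs.name.dir path from the example above
rm -rf /home/hadoop/NameData
# on every DataNode: the dfs.data.dir paths from the example above
rm -rf /home/hadoop/DataNode1 /home/hadoop/DataNode2
bin/hadoop namenode -format
bin/start-all.sh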

5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
This usually means a node has gone down or cannot be reached.

6: java.lang.OutOfMemoryError: Java heap space
This exception clearly means the JVM is running out of memory; increase the JVM heap size on all DataNodes.
java -Xms1024m -Xmx4096m
As a rule of thumb, the JVM's maximum heap should be about half of total memory; our machines have 8 GB, so we set 4096m, although this may still not be the optimal value.
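A minimal sketch of where such a setting can go, using the standard hadoop-env.sh hooks; the values below are examples, not recommendations:

# in conf/hadoop-env.sh
export HADOOP_HEAPSIZE=4096                  # default daemon heap size, in MB
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"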

Adding a node to Hadoop
The procedure I actually used to add a node:
1. Set up the environment on the new slave first: ssh, JDK, and copies of the relevant config, lib and bin directories;
2. Add the new DataNode's hostname to the cluster's NameNode and the other DataNodes;
3. Add the new DataNode's IP to conf/slaves on the master;
4. Restart the cluster; the new DataNode should now be visible in the cluster;
5. Run bin/start-balancer.sh; this can take a long time.
Notes:
1. If you do not balance, the cluster will place all new data on the new node, which lowers MapReduce efficiency;
2. bin/start-balancer.sh can also be run with the -threshold option, e.g. -threshold 5 (see the sketch after this list);
   threshold is the balancing threshold, 10% by default; a lower value makes the nodes more evenly balanced but takes longer.
3. The balancer can also run while MapReduce jobs are on the cluster; the default dfs.balance.bandwidthPerSec is quite low (1 MB/s). When no MapReduce jobs are running, you can raise that setting to speed up balancing.
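A minimal sketch of the balancer invocation; the threshold value is just an example:

# keep every DataNode within 5% of the average cluster utilisation
bin/start-balancer.sh -threshold 5
# when no MapReduce jobs are running, raising dfs.balance.bandwidthPerSec
# (bytes per second) in the HDFS configuration speeds the balancer up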

Other notes:
1. Make sure the firewall on the slave is turned off;
2. Make sure the new slave's IP has been added to /etc/hosts on the master and the other slaves, and conversely add the master's and the other slaves' IPs to /etc/hosts on the new slave.


Number of mappers and reducers
URL: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
HowManyMapsAndReduces
Partitioning your job into maps and reduces
Picking the appropriate size for the tasks for your job can radically change the performance of Hadoop. Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures. At one extreme is the 1 map/1 reduce case where nothing is distributed. The other extreme is to have 1,000,000 maps / 1,000,000 reduces where the framework runs out of resources for the overhead.
Number of Maps
The number of maps is usually driven by the number of DFS blocks in the input files. Although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.
Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
Number of Reduces
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is it provides a pretty firm upper bound.
The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.
The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf's conf.setNumReduceTasks(int num).

My own understanding:
The number of mappers depends on the input files and on the file splits: the split size is bounded above by dfs.block.size and can be bounded below via mapred.min.split.size, and ultimately the InputFormat decides.

A good guideline:
The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>


Adding a new disk to a single node
1. Modify dfs.data.dir on the node that gets the new disk, separating the new and old directories with a comma (see the sketch below);
2. Restart DFS.
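A minimal sketch; /data2/dfs is a placeholder for the mount point of the new disk:

# in the HDFS config (hdfs-site.xml, or hadoop-site.xml on older releases), set
# dfs.data.dir to a comma-separated list, e.g. /home/hadoop/DataNode1,/data2/dfs
bin/stop-dfs.sh
bin/start-dfs.sh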

Syncing the Hadoop code across nodes
hadoop-env.sh
# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop


Merging small HDFS files with a single command
hadoop fs -getmerge <src> <dest>
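A usage sketch; the paths are placeholders:

# concatenate every file under an HDFS directory into a single local file
hadoop fs -getmerge /user/hive/warehouse/some_table /tmp/some_table_merged.txt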

How to restart reduce jobs
Introduced recovery of jobs when JobTracker restarts. This facility is off by default.
Introduced config parameters "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".

I have not verified this yet.

Problems with IO write operations
0-1246359584298, infoPort=50075, ipcPort=50020): Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.16.100.165:50010 remote=/172.16.100.165:50930]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
        at java.lang.Thread.run(Thread.java:619)


It seems there are many reasons that it can timeout, the example given in
HADOOP-3831 is a slow reading client.

Workaround: try setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml.


Decommissioning HDFS nodes
The dfsadmin help text in the current version does not explain this clearly (I have already filed a bug for it). The correct procedure is:
1. Point dfs.hosts at the current slaves file, using the full path. Note that the hostnames in the list must be the full names, i.e. what uname -n returns.
2. Put the full names of the nodes to be decommissioned in another file, e.g. slaves.ex, and point dfs.hosts.exclude at the full path of that file.
3. Run bin/hadoop dfsadmin -refreshNodes (see the command sketch after this list).
4. The web UI, or bin/hadoop dfsadmin -report, will show the nodes being decommissioned with the status "Decommission in progress" until all data that needs to be re-replicated has been copied.
5. When that is done, remove the decommissioned nodes from the slaves file (i.e. the file dfs.hosts points to).
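A minimal command sketch of steps 2-4; the file paths and the hostname are placeholders:

# dfs.hosts         -> /home/hadoop/conf/slaves     (full hostnames, one per line)
# dfs.hosts.exclude -> /home/hadoop/conf/slaves.ex  (nodes to decommission)
echo "slave3.example.com" >> /home/hadoop/conf/slaves.ex
bin/hadoop dfsadmin -refreshNodes
bin/hadoop dfsadmin -report    # watch for "Decommission in progress"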

Incidentally, -refreshNodes has three other uses:
1. Adding permitted nodes to the list (add the hostname to dfs.hosts);
2. Removing a node directly, without re-replicating its data (remove the hostname from dfs.hosts);
3. The reverse of decommissioning: cancelling the decommission of nodes that appear both in the exclude file and in dfs.hosts and are currently being decommissioned, i.e. turning a "Decommission in progress" node back to Normal ("in service" in the web UI).

######################################
Hadoop tips borrowed from elsewhere
Fixing the Hadoop OutOfMemoryError problem:
<property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx800M -server</value>
</property>

With the right JVM size in your hadoop-site.xml, you will have to copy this to all mapred nodes and restart the cluster.
Or: hadoop jar jarfile [main class] -D mapred.child.java.opts=-Xmx800M

Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.
When I use nutch 1.0, I get this error:
Hadoop java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) while indexing.

This one is easy to solve:
Delete conf/log4j.properties and you will then see the detailed error report.
In my case it was out of memory.
The fix was to add the options -Xms64m -Xmx512m when running the main class org.apache.nutch.crawl.Crawl.
Your problem may be different, but once you can see the detailed error report it becomes easy to solve.

Using the distributed cache
It behaves like a global variable, but because the data is large it cannot be put in a config file, so the distributed cache is used instead.
Usage (see "The Definitive Guide", p. 240):
1. From the command line: pass -files to bring in the files to look up (local files, or HDFS files via hdfs://xxx?), or -archives for JAR, ZIP, tar, etc.
% hadoop jar job.jar MaxTemperatureByStationNameUsingDistributedCacheFile \
  -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output


2. Programmatic usage:
public void configure(JobConf conf) {
   metadata = new NcdcStationMetadata();
   try {
     metadata.initialize(new File("stations-fixed-width.txt"));
   } catch (IOException e) {
     throw new RuntimeException(e);
   }
}

Another, indirect, way (it does not seem to exist in hadoop-0.19.0):
call addCacheFile() or addCacheArchive() to add files,
and use getLocalCacheFiles() or getLocalCacheArchives() to retrieve them.

Hadoop job web UI
There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.


Hadoop monitoring
Use Nagios for alerting and Ganglia for the monitoring graphs.

status of 255 error 
Error:
java.io.IOException: Task process exit with nonzero status of 255.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)


Cause:
Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher values. By default, their values are 24 hours. These might be the reason for failure, though I'm not sure.


split size 
FileInputFormat input splits (see "The Definitive Guide", p. 190):
mapred.min.split.size: default=1, the smallest valid size in bytes for a file split.
mapred.max.split.size: default=Long.MAX_VALUE, the largest valid size.

dfs.block.size: default = 64 MB; set to 128 MB on our system.
If minimum split size > block size, a split becomes larger than a block, so a single split spans multiple blocks and part of its data has to be fetched from other nodes, losing data locality.
If maximum split size < block size, blocks are divided into even smaller splits.

split size = max(minimumSize, min(maximumSize, blockSize));
where, by default, minimumSize < blockSize < maximumSize.

sort by value 
Hadoop does not provide a direct sort-by-value mechanism, because it would hurt MapReduce performance.
It can, however, be implemented with a composite-key approach; see "The Definitive Guide", p. 250 for the details.
Basic idea:
1. Combine the key and value into a new composite key;
2. Override the partitioner so that partitioning uses only the old key:
conf.setPartitionerClass(FirstPartitioner.class);
3. Define a custom key comparator that sorts by the old key first, then by the old value:
conf.setOutputKeyComparatorClass(KeyComparator.class);
4. Override the grouping comparator so that grouping is also based only on the old key:  conf.setOutputValueGroupingComparator(GroupComparator.class);

Handling small input files
Feeding Hadoop a large number of small files as input lowers its efficiency.
There are three ways to merge and handle small files:
1. Merge the small files into a single SequenceFile to speed up MapReduce;
   see WholeFileInputFormat and SmallFilesToSequenceFileConverter, "The Definitive Guide", p. 194.
2. Use CombineFileInputFormat, which extends FileInputFormat (I have not tried this);
3. Use Hadoop archives (similar to packing the files), which reduce the NameNode metadata memory consumed by small files. (This may not actually help, so it is not recommended.)
   Method:
   archive the /my/files directory and its subdirectories into files.har and put it under /my:
   bin/hadoop archive -archiveName files.har /my/files /my
   
   List the files in the archive:
   bin/hadoop fs -lsr har://my/files.har

skip bad records 
JobConf conf = new JobConf(ProductMR.class);
conf.setJobName("ProductMR");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Product.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setMapOutputCompressorClass(DefaultCodec.class);
conf.setInputFormat(SequenceFileInputFormat.class);
conf.setOutputFormat(SequenceFileOutputFormat.class);
String objpath = "abc1";
SequenceFileInputFormat.addInputPath(conf, new Path(objpath));
SkipBadRecords.setMapperMaxSkipRecords(conf, Long.MAX_VALUE);
SkipBadRecords.setAttemptsToStartSkipping(conf, 0);
SkipBadRecords.setSkipOutputPath(conf, new Path("data/product/skip/"));
String output = "abc";
SequenceFileOutputFormat.setOutputPath(conf, new Path(output));
JobClient.runJob(conf);


For skipping failed tasks try : mapred.max.map.failures.percent

Restarting a single DataNode
If a DataNode runs into problems and, after fixing them, needs to rejoin the cluster without restarting the whole cluster, do the following:
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker



Namenode in safe mode 
Solution:
bin/hadoop dfsadmin -safemode leave

java.net.NoRouteToHostException: No route to host 
Solution:
sudo /etc/init.d/iptables stop


After changing the NameNode, SELECT queries in Hive still point to the old NameNode address
This is because:
When you create a table, hive actually stores the location of the table (e.g.
hdfs://ip:port/user/root/...) in the SDS and DBS tables in the metastore. So when I bring up a new cluster the master has a new IP, but hive's metastore is still pointing to the locations within the old
cluster. I could modify the metastore to update with the new IP every time I bring up a cluster. But the easier and simpler solution was to just use an elastic IP for the master.

So every old NameNode address stored in the metastore has to be replaced with the current NameNode address.
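A hedged sketch of the replacement, assuming a MySQL-backed metastore; the database name "hive", the user, and both hdfs:// URIs are placeholders, and the columns touched are DBS.DB_LOCATION_URI and SDS.LOCATION:

# back up the metastore database before touching it
mysql -u hiveuser -p hive <<'SQL'
UPDATE DBS SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, 'hdfs://old-nn:9000', 'hdfs://new-nn:9000');
UPDATE SDS SET LOCATION        = REPLACE(LOCATION,        'hdfs://old-nn:9000', 'hdfs://new-nn:9000');
SQL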


Your DataNodes won't start, and you see something like this in logs/*datanode*: 
Incompatible namespaceIDs in /tmp/hadoop-ross/dfs/data
Cause:
Your Hadoop namespaceID became corrupted. Unfortunately the easiest thing to do is to reformat the HDFS.

Solution:
You need to do something like this:
bin/stop-all.sh
rm -Rf /tmp/hadoop-your-username/*
bin/hadoop namenode -format

You can run Hadoop jobs written in Java (like the grep example), but your HadoopStreaming jobs (such as the Python example that fetches web page titles) won't work.
Cause:
You might have given only a relative path to the mapper and reducer programs. The tutorial originally just specified relative paths, but absolute paths are required if you are running in a real cluster.
Solution:
Use absolute paths like this from the tutorial:
bin/hadoop jar contrib/hadoop-0.15.2-streaming.jar \
  -mapper  $HOME/proj/hadoop/multifetch.py         \
  -reducer $HOME/proj/hadoop/reducer.py            \
  -input   urls/*                                  \
  -output  titles


09/08/31 18:25:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:Bad connect ack with firstBadLink 192.168.1.11:50010 
> 09/08/31 18:25:45 INFO hdfs.DFSClient: Abandoning block blk_-8575812198227241296_1001
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:25:51 INFO hdfs.DFSClient: Abandoning block blk_-2932256218448902464_1001
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.11:50010
> 09/08/31 18:25:57 INFO hdfs.DFSClient: Abandoning block blk_-1014449966480421244_1001
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
Bad connect ack with firstBadLink 192.168.1.16:50010
> 09/08/31 18:26:03 INFO hdfs.DFSClient: Abandoning block blk_7193173823538206978_1001
> 09/08/31 18:26:09 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable
to create new block.
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2731)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2182)
>
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Error Recovery for block blk_7193173823538206978_1001
bad datanode[2] nodes == null
> 09/08/31 18:26:09 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/umer/8GB_input"
- Aborting...
> put: Bad connect ack with firstBadLink 192.168.1.16:50010



Solution:
I have resolved the issue.
What I did:

1) '/etc/init.d/iptables stop'  --> stopped the firewall
2) Set SELINUX=disabled in the '/etc/selinux/config' file  --> disabled SELinux
It worked for me after these two changes.

Fixing jline.ConsoleReader.readLine not working on Windows
In the main() function of CliDriver.java there is a statement, reader.readLine, used to read from standard input, but on Windows this statement always returns null. The reader is an instance of jline.ConsoleReader, which makes debugging with Eclipse on Windows inconvenient.
We can replace it with java.util.Scanner, changing the original
while ((line=reader.readLine(curPrompt+"> ")) != null)
to:
Scanner sc = new Scanner(System.in);
while ((line=sc.nextLine()) != null)

Recompile and redeploy, and SQL statements can then be read from standard input normally.

Once, while running a normally working MapReduce example, the following error was thrown:

java.io.IOException: All datanodes xxx.xxx.xxx.xxx:xxx are bad. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2158)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
java.io.IOException: Could not get block locations. Aborting…
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
Investigation showed the cause was that the Linux machines had too many open files. ulimit -n shows that the default open-file limit on Linux is 1024; modify /etc/security/limits.conf and add an entry such as "hadoop soft nofile 65535".

Then re-run the program (ideally make the change on all DataNodes) and the problem is solved.
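A minimal sketch, assuming the Hadoop daemons run as the user "hadoop"; the limit values are examples:

ulimit -n        # current open-file limit (default 1024)
# as root, on every DataNode:
cat >> /etc/security/limits.conf <<'EOF'
hadoop soft nofile 65535
hadoop hard nofile 65535
EOF
# log in again (or restart the Hadoop daemons) so the new limit takes effect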

After running for a while, Hadoop can no longer be stopped with stop-all.sh, which reports:
no tasktracker to stop, no datanode to stop
The cause is that when Hadoop stops, it relies on the recorded pids of the mapred and dfs processes. By default those pid files are saved under /tmp, and Linux periodically (usually about once a month, or every 7 days or so) deletes files in that directory. So once pid files such as hadoop-hadoop-jobtracker.pid and hadoop-hadoop-namenode.pid have been removed, the stop scripts naturally cannot find the corresponding processes any more.
Setting export HADOOP_PID_DIR in the configuration file (conf/hadoop-env.sh) solves this problem.
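A minimal sketch; the directory below is just an example -- pick any path that is not cleaned automatically, and create it on every node before restarting Hadoop:

# in conf/hadoop-env.sh on every node
export HADOOP_PID_DIR=/var/hadoop/pids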

Problem:
Incompatible namespaceIDs in /usr/local/hadoop/dfs/data: namenode namespaceID = 405233244966; datanode namespaceID = 33333244
Cause:
Every time hadoop namenode -format is run, a new namespaceID is generated for the NameNode, but the DataNode data under hadoop.tmp.dir still carries the previous namespaceID. Because the namespaceIDs no longer match, the DataNode fails to start. So just delete the hadoop.tmp.dir directory before each hadoop namenode -format and the DataNodes will start successfully. Note that this means deleting the local directory that hadoop.tmp.dir points to, not an HDFS directory.


Problem: NameNode is not formatted
Solution: HDFS has not been formatted yet; just run hadoop namenode -format once and then start Hadoop.

Running jps reports the following exception:
Exceptioninthread"main"java.lang.NullPointerException
        at sun.jvmstat.perfdata.monitor.protocol.local.LocalVmManager.activeVms(LocalVmManager.java:127)
        at sun.jvmstat.perfdata.monitor.protocol.local.MonitoredHostProvider.activeVms(MonitoredHostProvider.java:133)
        at sun.tools.jps.Jps.main(Jps.java:45)

Cause:
The /tmp directory under the system root has been deleted. Recreate /tmp and the problem goes away.
The "unable to create log directory /tmp/..." error in bin/hive may also have this cause.
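A minimal sketch of recreating /tmp with its standard permissions:

sudo mkdir -p /tmp
sudo chmod 1777 /tmp    # world-writable with the sticky bit, the usual /tmp setup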