fs.FSInputChecker: Found checksum error

While uploading a file to HDFS, the job failed with the following error:

KejetLogETL ... t=(16:28:51)
12/07/06 16:28:52 INFO fs.FSInputChecker: Found checksum error: b[0, 512]=30303030303030303030313009353335330a3030303337463846444644
41093331360a303030384341383341413644093633390a303031314438303132333435093232370a303031333737414245414131093239380a303031353631353232
324437093231320a303031364536464536394239093331370a303031453130314632343633093235370a303031453130314633353334093335350a30303145313031
4636433436093335330a303031453130314641314635093937310a303031463343423930433236093431310a303032364337373938304441093330380a3030453034
43333630303143093239330a303045303443333630303238093238300a303045303443333630303341093336340a303045303443333630303436093339320a303045
303443333630303634093238320a303045303443333630303832093239340a303045303443333630313039093230380a303045303443333630313845093230380a30
3045303443333632334633093339390a303045303443393739414144093334350a303045303632303341344630093238310a30304530363232333135353009323134
0a343438374643463642433234093230320a36303530343033303230313009313130370a363841334334413042363336093438320a36434630343938323431453509
3231300a373035414236354633434136093232340a
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/hadoop-disk9/muse/0.1.0/kejet_stat/files/bad_macs at 0
at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:163)
at org.apache.hadoop.mapred.JobClient.copyRemoteFiles(JobClient.java:627)
at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:713)
at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:655)
at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:865)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at com.funshion.muse.etl.KejetLogETL.run(KejetLogETL.java:181)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.funshion.muse.etl.KejetLogETL.main(KejetLogETL.java:198)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
12/07/06 16:28:52 INFO mapred.JobClient: Cleaning up the staging area hdfs://muse0:8020/home/hadoop/tmp/mapred/staging/hadoop/.staging/job_201206211535_0461
12/07/06 16:28:52 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.fs.ChecksumException: Checksum error: file:/hadoop-disk9/muse/0.1.0/kejet_stat/files/bad_macs at 0

The cause of this error is as follows:

CRC checksum files in Hadoop

To guarantee data consistency, Hadoop generates a checksum file alongside each data file and verifies it on every read and write, ensuring the data is accurate.
Take the case we ran into here as an example.
The command executed:
hadoop jar dw-hadoop-2010_7_23.jar jobDriver -files tb_steps_url_path_dim.txt multisteps_output 2011-01-25
The error reported in the log:
org.apache.hadoop.fs.ChecksumException: Checksum error: file:tb_steps_url_path_dim.txt at 0
at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
at org.apache.hadoop.mapred.JobClient.copyRemoteFiles(JobClient.java:565)
at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:627)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:802)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:771)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1290)
at jobDriver.run(jobDriver.java:85)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at jobDriver.main(jobDriver.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
Cause of the error:
The job submission command includes the argument "-files tb_steps_url_path_dim.txt".
The Hadoop client therefore needs to upload the local file tb_steps_url_path_dim.txt to DFS.
During the upload, Hadoop uses FSInputChecker to check whether a CRC file for the file being uploaded exists, i.e. .tb_steps_url_path_dim.txt.crc. If that CRC file exists, its contents are verified against the file; if verification fails, the upload of the file is aborted, which in turn prevents the whole MapReduce job from running.
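The verification mechanism can be illustrated with a small sketch. This is a simplified model, not Hadoop's actual on-disk CRC format: Hadoop checksums fixed-size chunks (512 bytes by default, per io.bytes.per.checksum) and reports the byte offset of the first chunk whose CRC does not match, which is exactly the "Checksum error: ... at 0" shape seen in the log above.

```python
import zlib

CHUNK_SIZE = 512  # default bytes-per-checksum chunk in Hadoop

def chunk_checksums(data):
    """Compute one CRC32 per 512-byte chunk of the data."""
    return [zlib.crc32(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)]

def verify(data, checksums):
    """Return the byte offset of the first corrupt chunk, or -1 if all match."""
    for n, crc in enumerate(chunk_checksums(data)):
        if crc != checksums[n]:
            return n * CHUNK_SIZE
    return -1

data = b"x" * 1000
sums = chunk_checksums(data)
print(verify(data, sums))               # -1: file matches its checksums
print(verify(b"y" + data[1:], sums))    # 0: first chunk corrupt, error "at 0"
```

Editing a file after its checksums were written has the same effect as the corruption in the second call: the stored CRCs no longer match, so the read fails.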
Where the .crc file comes from
DFS command: hadoop fs -getmerge srcDir destFile
When this command runs, it merges all files under srcDir into a single file saved as destFile, and also writes a checksum file named .destFile.crc to the local disk.
DFS command: hadoop fs -get -crc src dest
When this command runs, it saves the file src as dest, and also writes a checksum file named .dest.crc to the local disk.
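The naming convention described above (a hidden sibling file, dot-prefixed with a .crc suffix) can be captured in a tiny helper. This is only an illustration of the convention, not a Hadoop API:

```python
import os

def checksum_file_name(path):
    """Derive the local checksum file name: 'dir/f' -> 'dir/.f.crc'."""
    d, name = os.path.split(path)
    return os.path.join(d, "." + name + ".crc")

print(checksum_file_name("destFile"))        # .destFile.crc
print(checksum_file_name("/tmp/bad_macs"))   # /tmp/.bad_macs.crc
```

This is why the crashing upload above looked for .bad_macs.crc next to bad_macs.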
How to avoid it
When you run hadoop fs -getmerge srcDir destFile, a matching .crc file is always written to the local disk (there is no option to turn this off).
So if you need to modify the contents of a file obtained via getmerge and then upload it to DFS again, you can use either of these two strategies to avoid the error:
1. Delete the .crc file.
2. Rename the modified file (e.g. with mv) before uploading it to DFS again.
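The two workarounds can be sketched locally without a Hadoop cluster; the file names here are placeholders standing in for the real merged output and its stale checksum file:

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
cd "$dir"

# Simulate a file fetched with getmerge, then edited, so its .crc is stale.
echo "merged output, later edited" > destFile
echo "stale checksum bytes" > .destFile.crc   # stand-in for the real CRC file

# Workaround 1: delete the stale .crc before re-uploading.
rm -f .destFile.crc

# Workaround 2: rename the edited file; no .destFile.fixed.crc exists,
# so FSInputChecker finds nothing to verify against on upload.
mv destFile destFile.fixed
```

After either step, the re-upload (e.g. hadoop fs -put destFile.fixed /some/dfs/path) proceeds because there is no mismatching checksum file left beside the source.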
