Flume 遠程寫HDFS

 如今的需求是在一臺Flume採集機器上,往Hadoop集羣上寫HDFS,該機器沒有安裝Hadoop。java

這裏的Flume版本是1.6.0,Hadoop版本是2.7.1.node

把Hadoop集羣的hdfs-site.xml、core-site.xml兩個配置文件複製到 flume安裝目錄的conf目錄去,把hadoop-hdfs-2.7.1.jar複製到 Flume  lib目錄。apache

1、Flume配置文件: tcp

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = syslogtcp
a1.sources.r1.bind = 192.168.110.160 # 本機ip
a1.sources.r1.port = 23003
a1.sources.r1.workerThreads  = 10


a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100000
a1.channels.c1.keep-alive = 6
a1.channels.c1.byteCapacityBufferPercentage = 20


a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://clusterpc/test/flume/%y-%m-%d
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

  啓動: bin/flume-ng agent --conf conf --conf-file conf/flume-tcp-memory-hdfs.conf --name a1 -Dflume.root.logger=info,consoleide

 2、錯誤集:oop

一、 找不到主機名ui

2016-09-19 16:15:48,518 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.IllegalArgumentException: java.net.UnknownHostException: cluster
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.UnknownHostException: cluster

  cluster是公司Hadoop集羣NameService的名字,這個錯誤是因爲找不到Hadoop集羣NameService形成的,因此須要把hdfs-site.xml複製到flume/conf目錄。google

二、spa

java.io.IOException: Mkdirs failed to create /test/flume/16-09-19 (exists=false, cwd=file:/data/apache-flume-1.6.0-bin)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:450)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
        at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:96)
        at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:78)
        at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:69)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:246)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

  把 core-site.xml複製到flume/conf目錄.net

3

java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

  把hadoop-hdfs-2.7.1.jar複製到flume/lib目錄下

四、HDFS權限不足,這裏往HDFS寫文件的用戶是登陸Flume採集機器的用戶。

org.apache.hadoop.security.AccessControlException: Permission denied: user=kafka, access=WRITE, inode="/test/flume/16-09-19/events-.1474268726127.tmp":hadoop:supergroup:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1698)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1682)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1665)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2517)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2452)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

  HDFS 權限不足,要受權。hadoop fs -chmod -R 777 /test/

五、時間戳

java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:228)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:432)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:380)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:744)

  緣由是Event對象headers沒有設置timestamp形成的,解決辦法:設置a1.sinks.k1.hdfs.useLocalTimeStamp=true,使用本地時間戳。

相關文章
相關標籤/搜索