The Hadoop node address "localhost" problem

Problem description

After the Hadoop cluster was installed, the YARN console showed both the node ID and the node address as localhost:

[hadoop@master sbin]$ yarn node -list
20/12/17 12:21:19 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.8.42:18040
Total Nodes:1
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
 localhost:43141                RUNNING    localhost:8042                                  0

When a job was submitted, the YARN log also printed the node address as 127.0.0.1 and used that IP as the node IP, so the connection was bound to fail:

2020-12-17 00:53:30,721 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1607916354082_0008_01_000001, AllocationRequestId: 0, Version: 0, NodeId: localhost:43141, NodeHttpAddress: localhost:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 127.0.0.1:35845 }, ExecutionType: GUARANTEED, ] for AM appattempt_1607916354082_0008_000001

2020-12-17 00:56:30,801 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1607916354082_0008_000001. Got exception: java.net.ConnectException: Call From master/172.16.8.42 to localhost:43141 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
       at sun.reflect.GeneratedConstructorAccessor46.newInstance(Unknown Source)
       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:827)
       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:757)
       at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1553)
       at org.apache.hadoop.ipc.Client.call(Client.java:1495)
       at org.apache.hadoop.ipc.Client.call(Client.java:1394)
       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)

Root cause

In the Hadoop source code, the node information is obtained as follows:

private NodeId buildNodeId(InetSocketAddress connectAddress, String hostOverride) {
    if (hostOverride != null) {
        connectAddress = NetUtils.getConnectAddress(
                new InetSocketAddress(hostOverride, connectAddress.getPort()));
    }
    return NodeId.newInstance(
            connectAddress.getAddress().getCanonicalHostName(),
            connectAddress.getPort());
}

Here the hostname is obtained via connectAddress.getAddress().getCanonicalHostName(). As we know, a hostname can also be obtained via getHostName(), so what is the difference between the two? getCanonicalHostName() returns the fully qualified domain name, while getHostName() returns just the host name. For example, the host name might be definesys while the name configured in DNS is definesys.com; getCanonicalHostName() resolves through DNS (or the local hosts file) to obtain the fully qualified name. In this case getAddress() actually returned 127.0.0.1, and the hosts file contained the following entry:

127.0.0.1     localhost       localhost.localdomain

so the address was resolved to localhost.
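The difference between the two lookups can be sketched with a small stand-alone program (the class name HostnameDemo is hypothetical, and the printed values depend on the local resolver and /etc/hosts, so no specific output is claimed here):

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical demo illustrating the two lookups involved in buildNodeId.
public class HostnameDemo {
    public static void main(String[] args) throws Exception {
        InetSocketAddress connectAddress = new InetSocketAddress("127.0.0.1", 43141);
        InetAddress ip = connectAddress.getAddress();
        // getHostName(): the host name, via a single reverse lookup if needed
        System.out.println("getHostName          = " + ip.getHostName());
        // getCanonicalHostName(): asks the resolver for the fully qualified name;
        // with "127.0.0.1 localhost localhost.localdomain" in /etc/hosts,
        // this is the call that makes YARN register the node as "localhost"
        System.out.println("getCanonicalHostName = " + ip.getCanonicalHostName());
    }
}
```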

Solution

Hadoop's recommended fix reads as follows:

  • If the error message says the remote service is on "127.0.0.1" or "localhost" that means the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, then your configuration is broken.
  • Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
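Before editing anything, you can check a hosts file for the problematic mapping. The sketch below writes a sample file (the hostname master and the file contents are hypothetical placeholders for your own /etc/hosts):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical checker: does a hosts file map the cluster hostname to loopback?
public class HostsCheck {
    public static void main(String[] args) throws Exception {
        Path sample = Files.createTempFile("hosts", ".sample");
        Files.write(sample, Arrays.asList(
                "127.0.0.1   localhost   localhost.localdomain",
                "172.16.8.42 master"));
        String host = "master"; // placeholder: your cluster hostname
        boolean broken = Files.readAllLines(sample).stream()
                .map(line -> line.trim().split("\\s+"))
                // only loopback lines can cause the localhost registration
                .filter(t -> t.length > 1
                        && (t[0].equals("127.0.0.1") || t[0].equals("127.0.1.1")))
                // flag the file if the hostname appears on such a line
                .anyMatch(t -> Arrays.asList(t).subList(1, t.length).contains(host));
        System.out.println(broken ? "hostname still mapped to loopback"
                                  : "hosts file OK");
        Files.deleteIfExists(sample);
    }
}
```

In the sample above, master is mapped only to 172.16.8.42, so the check passes; if master also appeared on the 127.0.0.1 line, the program would flag it.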

In short, the advice is to delete the /etc/hosts entries that map the hostname to 127.0.0.1 or 127.0.1.1. After removing them, the node registered with its real address and the problem was solved.
