1、Problem Background
1. The cloud host runs Linux, with Hadoop deployed in pseudo-distributed mode.
Public IP: 139.198.18.xxx
Internal IP: 192.168.137.2
Hostname: hadoop001
2. The local core-site.xml is configured as follows:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9001</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>hdfs://hadoop001:9001/hadoop/tmp</value>
    </property>
</configuration>
3. The local hdfs-site.xml is configured as follows:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
4. The hosts file on the cloud host:
[hadoop@hadoop001 ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# hostname loopback address
192.168.137.2 hadoop001
The cloud host maps the internal IP to the hostname hadoop001.
5. The local hosts file:
139.198.18.XXX hadoop001
The local machine maps the public IP to the hostname hadoop001.
2、Symptoms
1. HDFS starts on the cloud host without problems: jps shows no abnormal processes, and operating on HDFS files through the shell works fine.
2. The web UI on port 50070 is also reachable from a browser.
3. Operating on the remote HDFS from the local machine through the Java API, with the URI resolving to the public IP via the hadoop001 mapping, also works:
val conf = new Configuration()
val uri = new URI("hdfs://hadoop001:9001")
val fs = FileSystem.get(uri, conf)
// Recursively list the files under /data on the remote HDFS
val listfiles = fs.listFiles(new Path("/data"), true)
while (listfiles.hasNext) {
  val nextfile = listfiles.next()
  println("get file path:" + nextfile.getPath().toString())
}

------------------------------ output ---------------------------------
get file path:hdfs://hadoop001:9001/data/infos.txt
4. But when using Spark SQL on the local machine to read the same file from HDFS and convert it to a DataFrame:
object SparkSQLApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SparkSQLApp").master("local[2]").getOrCreate()
    val info = spark.sparkContext.textFile("/data/infos.txt")
    import spark.implicits._
    val infoDF = info.map(_.split(",")).map(x => Info(x(0).toInt, x(1), x(2).toInt)).toDF()
    infoDF.show()
    spark.stop()
  }

  case class Info(id: Int, name: String, age: Int)
}
the following errors appear:
....
....
....
19/02/23 16:07:00 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
19/02/23 16:07:00 INFO HadoopRDD: Input split: hdfs://hadoop001:9001/data/infos.txt:0+17
19/02/23 16:07:21 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
.....
....
19/02/23 16:07:21 INFO DFSClient: Could not obtain BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 from any node: java.io.IOException: No live nodes contain block BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 after checking nodes = [DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]], ignoredNodes = null
No live nodes contain current block Block locations: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK] Dead nodes: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]. Will get new block locations from namenode and retry...
19/02/23 16:07:21 WARN DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 272.617680460432 msec.
19/02/23 16:07:42 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
...
...
19/02/23 16:07:42 WARN DFSClient: Failed to connect to /192.168.137.2:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
java.net.ConnectException: Connection timed out: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3499)
...
...
19/02/23 16:08:12 WARN DFSClient: Failed to connect to /192.168.137.2:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
java.net.ConnectException: Connection timed out: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
...
...
19/02/23 16:08:12 INFO DFSClient: Could not obtain BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 from any node: java.io.IOException: No live nodes contain block BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 after checking nodes = [DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]], ignoredNodes = null
No live nodes contain current block Block locations: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK] Dead nodes: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]. Will get new block locations from namenode and retry...
19/02/23 16:08:12 WARN DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 11918.913311370841 msec.
19/02/23 16:08:45 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
...
...
19/02/23 16:08:45 WARN DFSClient: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt No live nodes contain current block Block locations: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK] Dead nodes: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]. Throwing a BlockMissingException
19/02/23 16:08:45 WARN DFSClient: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt No live nodes contain current block Block locations: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK] Dead nodes: DatanodeInfoWithStorage[192.168.137.2:50010,DS-fb2e7244-165e-41a5-80fc-4bb90ae2c8cd,DISK]. Throwing a BlockMissingException
19/02/23 16:08:45 WARN DFSClient: DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1001)
...
...
19/02/23 16:08:45 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1001)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:648)
...
...
19/02/23 16:08:45 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
19/02/23 16:08:45 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/02/23 16:08:45 INFO TaskSchedulerImpl: Cancelling stage 0
19/02/23 16:08:45 INFO DAGScheduler: ResultStage 0 (show at SparkSQLApp.scala:30) failed in 105.618 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1001)
...
...
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1358284489-192.168.137.2-1550394746448:blk_1073741840_1016 file=/data/infos.txt
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1001)
...
...
3、Analysis
1. Shell operations on the cloud host work, which rules out problems with the cluster setup or with processes that failed to start.
2. The cloud host has no local firewall enabled, which rules out a firewall that was left on.
3. The cloud provider's firewall does open the port the DataNode uses for its data-transfer service (50010 by default).
4. I set up another VM on the same LAN as my local machine, and HDFS on that VM can be operated from the local machine without any problem, which all but confirms that the issue comes from the internal/external network split.
5. According to the documentation, HDFS directory and file names are stored on the NameNode, so those operations never need to talk to a DataNode. Since creating directories and files works, communication between the local machine and the remote NameNode is fine; the problem is therefore very likely in the communication between the local machine and the remote DataNode (see the sketch below).
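To make that distinction concrete, here is a minimal, hypothetical sketch (same URI and paths as in the examples above) showing which FileSystem calls only involve the NameNode and which one must also reach a DataNode; it is an illustration of the reasoning, not code from the original setup:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object NameNodeVsDataNodeCheck {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new URI("hdfs://hadoop001:9001"), new Configuration())

    // Metadata-only operations: answered entirely by the NameNode, so they
    // succeed even when the DataNode address is unreachable from outside.
    fs.mkdirs(new Path("/tmp/connectivity-test"))
    fs.listStatus(new Path("/data")).foreach(s => println(s.getPath))

    // Data operation: the client asks the NameNode for block locations and then
    // connects to a DataNode directly -- this is the step that times out here.
    val in = fs.open(new Path("/data/infos.txt"))
    println(in.read())
    in.close()
    fs.close()
  }
}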
4、Hypothesis
Since the local test machine and the cloud host are not on the same LAN, and the Hadoop configuration uses internal IPs for communication between nodes, the client can still reach the NameNode, and the NameNode replies with the address of the machine holding the data so the client can contact its data-transfer service. But because the NameNode and the DataNode talk to each other over the internal network, the address returned is the DataNode's internal IP, which the client cannot reach when it actually needs to read or write block data.
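To see exactly which DataNode address the NameNode hands back, a hedged sketch using the standard getFileBlockLocations API can be run from the development machine (same URI and path as above; the object name is just illustrative). Before any fix, getNames() should print the internal address 192.168.137.2:50010:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PrintBlockLocations {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new URI("hdfs://hadoop001:9001"), new Configuration())
    val status = fs.getFileStatus(new Path("/data/infos.txt"))
    // Block locations come from the NameNode's metadata; no DataNode is contacted here.
    val locations = fs.getFileBlockLocations(status, 0, status.getLen)
    for (loc <- locations) {
      println("names: " + loc.getNames.mkString(", "))  // ip:port pairs, e.g. 192.168.137.2:50010
      println("hosts: " + loc.getHosts.mkString(", "))  // hostnames, e.g. hadoop001
    }
    fs.close()
  }
}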
Part of the error output points the same way:
19/02/23 16:07:21 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.net.ConnectException: Connection timed out: no further information
...
19/02/23 16:07:42 WARN DFSClient: Failed to connect to /192.168.137.2:50010 for block, add to deadNodes and continue....
The errors show that the client cannot connect to 192.168.137.2:50010, i.e. the DataNode's internal address; from outside the cloud network, the DataNode can only be reached at 139.198.18.XXX:50010.
To let the development machine reach HDFS, we can access HDFS by hostname and have the NameNode return DataNode hostnames instead of IPs. The quick connectivity check below illustrates the difference in reachability.
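As a sanity check of that reasoning, a small sketch (purely illustrative; port 50010 and the addresses from above) can probe the DataNode transfer port from the development machine. The internal IP should time out, while the hostname mapped to the public IP in the local hosts file should connect, provided the cloud firewall keeps the port open:

import java.net.{InetSocketAddress, Socket}

object DataNodePortCheck {
  // Try to open a TCP connection to host:port within timeoutMs milliseconds.
  def probe(host: String, port: Int = 50010, timeoutMs: Int = 5000): Unit = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      println(s"$host:$port is reachable")
    } catch {
      case e: Exception => println(s"$host:$port is NOT reachable: ${e.getMessage}")
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    probe("192.168.137.2")  // internal IP: expected to time out from the dev machine
    probe("hadoop001")      // resolves to 139.198.18.XXX via the local hosts file
  }
}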
5、Solution
1. Attempt one:
Map the DataNode's public IP to its hostname in the development machine's hosts file (already done above), and add the following to the code that talks to HDFS:
val conf = new Configuration()
conf.set("dfs.client.use.datanode.hostname", "true")
Same error as before.
2. Attempt two:
val spark = SparkSession
  .builder()
  .appName("SparkSQLApp")
  .master("local[2]")
  .config("dfs.client.use.datanode.hostname", "true")
  .getOrCreate()
Same error as before.
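A plausible explanation for why attempts one and two fail (an assumption on my part, not verified in this post): the property never reaches the Hadoop Configuration that Spark's HadoopRDD actually uses. A Configuration created by hand is not the one Spark passes to the input format, and a bare .config("dfs.client.use.datanode.hostname", "true") is stored as a Spark property rather than a Hadoop one. Two ways that should forward it, sketched below:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSQLApp")
  .master("local[2]")
  // Spark copies properties prefixed with "spark.hadoop." into its Hadoop Configuration
  .config("spark.hadoop.dfs.client.use.datanode.hostname", "true")
  .getOrCreate()

// Alternatively, set it on the Hadoop Configuration that Spark exposes:
spark.sparkContext.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true")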
3. Attempt three:
Add the following to hdfs-site.xml:
<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>
This runs successfully.
Further reading suggests also adding the dfs.datanode.use.datanode.hostname property to hdfs-site.xml, so that DataNode-to-DataNode communication goes through hostnames as well:
<property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
</property>
This makes changing internal IPs much simpler and makes data exchange between particular DataNodes easier. The trade-off is that if DNS resolution fails, the whole Hadoop cluster stops working, so the DNS setup has to be reliable.