3. Ma Shibing: Getting Started with Hadoop

Date: 2019-12-07



1. Hadoop has three parts: a storage module (HDFS), a resource scheduling module (YARN), and a compute engine (MapReduce).

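2. Think of HDFS as one large disk made up of many machines. It scales dynamically: machines can be added or removed while it runs. It is configured in core-site.xml. The slaves file lists the DataNodes to be managed, so the NameNode can manage them centrally.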

3. This session accesses HDFS from a Java program, which is the same basic idea behind cloud drives such as 360 and Baidu Netdisk.

4. If your machine cannot run that many nodes, use a pseudo-distributed setup instead.

5. jps and hdfs dfsadmin -report show whether the cluster started correctly.

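6. By default Hadoop stores its data under /tmp. If you have not changed this, Linux may clear that directory at unpredictable times after a reboot, which can leave HDFS in an abnormal state, so the location should be changed (the hadoop.tmp.dir property in core-site.xml). Format the NameNode with hdfs namenode -format, then start or stop HDFS with start-dfs.sh and stop-dfs.sh.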

7. Accessing HDFS from a program.

a. Using java.net.URL.

b. A simple way to read a file's contents:

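       // Teach java.net.URL the hdfs:// scheme (can be set only once per JVM)
       URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
       URL url = new URL("hdfs://192.168.56.100:9000/hello.txt");
       InputStream in = url.openStream();
       // Copy to stdout with a 1024-byte buffer; true closes the streams afterwards
       IOUtils.copyBytes(in, System.out, 1024, true);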
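The snippet above is ordinary java.net.URL plumbing; Hadoop only contributes the handler for the hdfs scheme. As a rough sketch that runs without a cluster, the same open-and-copy pattern can be tried against a file: URL, which the JDK handles by itself. The class name UrlCopySketch and the helpers copyBytes and readUrl are hypothetical and written for this illustration; copyBytes only imitates what Hadoop's IOUtils.copyBytes is used for here and is not its actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UrlCopySketch {

    // Hypothetical helper: copies everything from in to out through a buffer
    // of bufSize bytes, and closes both streams afterwards when close is true.
    public static void copyBytes(InputStream in, OutputStream out, int bufSize, boolean close)
            throws IOException {
        try {
            byte[] buf = new byte[bufSize];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
            out.flush();
        } finally {
            if (close) {
                in.close();
                out.close();
            }
        }
    }

    // Opens any URL the JVM has a handler for and returns its content as bytes.
    public static byte[] readUrl(URL url) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyBytes(url.openStream(), sink, 1024, true);
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for hdfs://.../hello.txt: a temporary local file.
        Path tmp = Files.createTempFile("hello", ".txt");
        Files.write(tmp, "hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        URL url = tmp.toUri().toURL(); // a file: URL, no extra handler needed
        System.out.print(new String(readUrl(url), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

With the Hadoop handler factory registered, only the URL string would change; the reading code stays the same.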

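c. Creating or writing files may fail with a user permission error. This happens because permission checking is turned on. To turn it off, edit hdfs-site.xml (vi hdfs-site.xml) and add: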

<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
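</property>

After this change, restarting only the NameNode is enough; restarting the whole cluster would be far too costly in production. Also note that when the trash feature is enabled (fs.trash.interval greater than 0), deleting a file from the shell only moves it to the trash instead of removing it right away.

Code example. This basic code already covers the core principle behind a simple Baidu Netdisk: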

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HelloHDFS {

   public static void main(String[] args) throws Exception {
       /*URL url = new URL("http://www.baidu.com");
       InputStream in = url.openStream();
       IOUtils.copyBytes(in, System.out, 1024, true);*/
       /*URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
       URL url = new URL("hdfs://192.168.56.100:9000/hello.txt");
       InputStream in = url.openStream();
       IOUtils.copyBytes(in, System.out, 1024, true);*/
       Configuration conf = new Configuration();
       conf.set("fs.defaultFS", "hdfs://192.168.56.100:9000");
       FileSystem fileSys = FileSystem.get(conf);
       // Many routine create/delete/update/query operations can be run from here

       boolean suc = fileSys.mkdirs(new Path("/gxl"));
       System.out.println(suc);

       suc = fileSys.exists(new Path("/gxl"));
       System.out.println(suc);

//       suc = fileSys.delete(new Path("/gxl"),true);
//       System.out.println(suc);

       suc = fileSys.exists(new Path("/gxl"));
       System.out.println(suc);

       // Upload a file from the local Windows disk.
       // try-with-resources closes both streams even if the copy fails.
       try (FileInputStream in = new FileInputStream("F:/BaiduNetdiskDownload/Xftp.exe");
            FSDataOutputStream out = fileSys.create(new Path("/test.data"), true)) {
           // One-line alternative: IOUtils.copyBytes(in, out, 4096, false);
           byte[] buf = new byte[4096];
           int len = in.read(buf);
           while (len != -1) {
               out.write(buf, 0, len);
               len = in.read(buf);
           }
       }

       // Read metadata for everything under the root directory
       FileStatus[] fstatus = fileSys.listStatus(new Path("/"));
       for(FileStatus status:fstatus){
           System.out.println(status.getPath());
           System.out.println(status.getPermission());
           System.out.println(status.getReplication());
       }
   }
}
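For readers without a cluster at hand, the sequence of steps in HelloHDFS (make a directory, check that it exists, copy a file in, list the root) can be rehearsed on the local disk. The following is only an analogue built on java.nio.file, not the Hadoop API; the class name LocalFsRehearsal, the method run, and the temporary paths are assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LocalFsRehearsal {

    // Repeats the HelloHDFS steps under a local root directory and returns
    // the sorted names of the entries found directly under that root.
    public static List<String> run(Path root, Path source) throws IOException {
        Path dir = root.resolve("gxl");
        Files.createDirectories(dir);              // roughly fileSys.mkdirs
        System.out.println(Files.exists(dir));     // roughly fileSys.exists

        // Roughly fileSys.create(path, true): newOutputStream creates the
        // file or truncates an existing one by default.
        Path target = root.resolve("test.data");
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buf = new byte[4096];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        }

        // Roughly fileSys.listStatus on the root directory.
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("fake-hdfs-root");
        Path source = Files.createTempFile("upload", ".bin");
        Files.write(source, new byte[10_000]);
        System.out.println(run(root, source));
    }
}
```

The local version has no counterpart for getReplication, since replication is something only the distributed filesystem tracks.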
8. Writing to HDFS from Java is actually fairly simple, but it is rarely done in practice. Now that Hive and Pig exist, hand-written MapReduce is rare too.
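Item 2 describes HDFS as one big disk built from many machines, with the NameNode managing the DataNodes centrally. A toy model may make that picture more concrete: the NameNode keeps a table of which DataNodes hold each block of a file, and DataNodes can be added at any time. Everything in this sketch is an assumption for illustration, including the class ToyNameNode, the host names, and the round-robin placement, which is much simpler than the placement policy real HDFS uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyNameNode {
    // Known DataNodes, in the order they joined (think of the slaves file).
    private final List<String> dataNodes = new ArrayList<>();
    // File path -> one replica list per block.
    private final Map<String, List<List<String>>> blockMap = new LinkedHashMap<>();
    // Index of the DataNode that receives the first replica of the next block.
    private int next = 0;

    // Dynamic scaling: a new DataNode simply joins the list.
    public void addDataNode(String host) {
        dataNodes.add(host);
    }

    // Splits a file into fixed-size blocks and records, for each block, which
    // DataNodes would hold its replicas (toy round-robin placement).
    public List<List<String>> addFile(String path, long sizeBytes, long blockSize, int replication) {
        if (replication > dataNodes.size()) {
            throw new IllegalStateException("not enough DataNodes for " + replication + " replicas");
        }
        long blocks = (sizeBytes + blockSize - 1) / blockSize; // ceiling division
        List<List<String>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(dataNodes.get((next + r) % dataNodes.size()));
            }
            next = (next + 1) % dataNodes.size();
            placement.add(replicas);
        }
        blockMap.put(path, placement);
        return placement;
    }

    public static void main(String[] args) {
        ToyNameNode nameNode = new ToyNameNode();
        nameNode.addDataNode("dn1");
        nameNode.addDataNode("dn2");
        nameNode.addDataNode("dn3");
        // A 250-byte file with 100-byte blocks and 2 replicas per block.
        System.out.println(nameNode.addFile("/hello.txt", 250, 100, 2));
    }
}
```

A client never needs this table directly; it asks the NameNode where the blocks are and then reads from or writes to the DataNodes, which is what the FileSystem calls in item 7 do behind the scenes.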