A distributed file system keeps multiple replicas of each file, so that when a node goes down the data can still be read from a replica on another node, improving reliability. That is the traditional design, but it has drawbacks:
1) No matter how large a file is, it is stored whole on a single node, so it is hard to process in parallel, that node becomes a network bottleneck, and large-scale data processing is difficult;
2) The storage load is unbalanced and each node's utilization is low.
What is HDFS?
HDFS design goals
Architecture diagram:
A file is split into multiple Blocks
blocksize: 128M
130M ==> 2 Blocks: 128M and 2M
NN (NameNode):
1) Responds to client requests
2) Manages the metadata
DN (DataNode):
1) Stores the data blocks (Blocks) that make up users' files
2) Periodically sends heartbeats to the NN, reporting itself, all of its block information, and its health status
A typical deployment runs one NameNode, with every other machine in the cluster running a DataNode.
For real production environments, it is recommended to deploy the NameNode and DataNodes on separate nodes.
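To make the Block and replica model concrete, the minimal sketch below asks the NameNode how a file was split into Blocks and which DataNodes hold each replica. It assumes the pseudo-distributed cluster built below (hdfs://192.168.56.102:8020, user root) and a hypothetical file /hdfsapi/test/big.file that is larger than one block:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;
import java.util.Arrays;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // Connect to the NameNode (address and user taken from the setup below)
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.56.102:8020"),
                new Configuration(), "root");
        // Hypothetical file; any file larger than one block shows the split clearly
        FileStatus status = fs.getFileStatus(new Path("/hdfsapi/test/big.file"));
        // Ask for the block locations covering the whole file
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            // One line per Block: its offset, length, and the DataNodes holding replicas
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + Arrays.toString(block.getHosts()));
        }
        fs.close();
    }
}

With the single-DataNode setup below, each block reports only one host; on a real cluster you would see one host per replica.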
Environment: CentOS 7
1. Install the JDK
(omitted)
2. Install SSH
sudo yum install openssh-clients openssh-server
ssh-keygen -t rsa
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
3.安裝hadoop
1)官網下載,我選擇的版本是第三方商業化版本cdh,hadoop-2.6.0-cdh5.7.0。
2)解壓 tar -zxvf hadoop-2.6.0-cdh5.7.0.tar.gz -C ~/app/
4.配置文件修改
etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.56.102:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/app/tmp</value>
    </property>
</configuration>
hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
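On the client side these XML entries map to keys in org.apache.hadoop.conf.Configuration: when core-site.xml and hdfs-site.xml are on the classpath the values are picked up automatically, and the same keys can be set or read in code. A minimal sketch that simply mirrors the values above:

import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Same key/value as core-site.xml's fs.defaultFS property
        conf.set("fs.defaultFS", "hdfs://192.168.56.102:8020");
        // Same key/value as hdfs-site.xml's dfs.replication property (the default is 3)
        conf.setInt("dfs.replication", 1);
        System.out.println(conf.get("fs.defaultFS"));           // hdfs://192.168.56.102:8020
        System.out.println(conf.getInt("dfs.replication", 3));  // 1
    }
}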
5. Start HDFS
Format the file system (only needs to be run the first time):
cd bin
./hadoop namenode -format
cd sbin
./start-dfs.sh
Verify it is running:
jps
Jps
SecondaryNameNode
DataNode
NameNode
Or verify in a browser: http://192.168.56.102:50070
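The NameNode can also be checked from code; a minimal sketch, assuming the same address and the root user used throughout this walkthrough:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

import java.net.URI;

public class HdfsAlive {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.56.102:8020"),
                new Configuration(), "root");
        // getStatus() asks the NameNode for overall capacity figures; an exception here
        // usually means the daemons are not up or the address/port is wrong
        FsStatus status = fs.getStatus();
        System.out.println("capacity=" + status.getCapacity()
                + " used=" + status.getUsed()
                + " remaining=" + status.getRemaining());
        fs.close();
    }
}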
6. Stop HDFS
cd sbin
./stop-dfs.sh
III. Operating on HDFS files with the Java API
Create a Maven project first; the pom.xml below pulls in hadoop-client from the Cloudera repository:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.imooc.hadoop</groupId>
  <artifactId>hadoop-train</artifactId>
  <version>1.0</version>
  <name>hadoop-train</name>
  <!-- FIXME change it to the project's website -->
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
  </properties>

  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.0.0</version>
        </plugin>
        <!-- see http://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.7.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.20.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
The HDFSApp JUnit test class below exercises the common FileSystem operations: creating directories and files, reading, renaming, uploading, downloading, listing, and deleting.

package com.cracker.hadoop.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;

/**
 * Hadoop HDFS Java API operations
 */
public class HDFSApp {

    public static final String HDFS_PATH = "hdfs://192.168.56.102:8020";

    FileSystem fileSystem = null;
    Configuration configuration = null;

    /**
     * Create an HDFS directory
     * @throws Exception
     */
    @Test
    public void mkdir() throws Exception {
        fileSystem.mkdirs(new Path("/hdfsapi/test"));
    }

    /**
     * Create a file
     */
    @Test
    public void create() throws Exception {
        FSDataOutputStream output = fileSystem.create(new Path("/hdfsapi/test/a.txt"));
        output.write("hello hadoop".getBytes());
        output.flush();
        output.close();
    }

    /**
     * Print the contents of an HDFS file
     */
    @Test
    public void cat() throws Exception {
        FSDataInputStream in = fileSystem.open(new Path("/hdfsapi/test/a.txt"));
        IOUtils.copyBytes(in, System.out, 1024);
        in.close();
    }

    /**
     * Rename a file
     */
    @Test
    public void rename() throws Exception {
        Path oldPath = new Path("/hdfsapi/test/a.txt");
        Path newPath = new Path("/hdfsapi/test/b.txt");
        fileSystem.rename(oldPath, newPath);
    }

    /**
     * Upload a local file to HDFS
     */
    @Test
    public void copyFromLocalFile() throws Exception {
        Path localPath = new Path("/Users/chen/Downloads/hello2.txt");
        Path hdfsPath = new Path("/hdfsapi/test");
        fileSystem.copyFromLocalFile(localPath, hdfsPath);
    }

    /**
     * Upload a local file to HDFS with a progress callback
     */
    @Test
    public void copyFromLocalFileWithProgress() throws Exception {
        InputStream in = new BufferedInputStream(
                new FileInputStream(
                        new File("/Users/chen/Downloads/hive.tar.gz")));

        FSDataOutputStream output = fileSystem.create(new Path("/hdfsapi/test/hive1.0.tar.gz"),
                new Progressable() {
                    public void progress() {
                        System.out.print(".");  // one dot per progress callback
                    }
                });

        IOUtils.copyBytes(in, output, 4096);
        output.close();
        in.close();
    }

    /**
     * Download a file from HDFS to the local file system
     */
    @Test
    public void copyToLocalFile() throws Exception {
        Path localPath = new Path("/Users/chen/Downloads/h.txt");
        Path hdfsPath = new Path("/hdfsapi/test/b.txt");
        fileSystem.copyToLocalFile(hdfsPath, localPath);
    }

    /**
     * List all files under a directory
     */
    @Test
    public void listFiles() throws Exception {
        FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/hdfsapi/test"));
        for (FileStatus fileStatus : fileStatuses) {
            String isDir = fileStatus.isDirectory() ? "directory" : "file";
            short replication = fileStatus.getReplication();
            long len = fileStatus.getLen();
            String path = fileStatus.getPath().toString();
            System.out.println(isDir + "\t" + replication + "\t" + len + "\t" + path);
        }
    }

    /**
     * Delete a directory recursively
     */
    @Test
    public void delete() throws Exception {
        fileSystem.delete(new Path("/hdfsapi/test"), true);
    }

    @Before
    public void setUp() throws Exception {
        System.out.println("HDFSApp.setUp");
        configuration = new Configuration();
        fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "root");
    }

    @After
    public void tearDown() throws Exception {
        configuration = null;
        fileSystem = null;
        System.out.println("HDFSApp.tearDown");
    }
}
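These methods are JUnit tests, so once the NameNode is reachable they can be run from an IDE or with mvn test. Note that setUp() connects as the root user, matching the user that started HDFS in this walkthrough; if your cluster runs under a different account, change that argument, otherwise the write operations may fail with permission errors.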