The official Spring Hadoop site describes the project as follows:

Spring Hadoop simplifies working with Apache Hadoop by providing a unified configuration model and easy-to-use APIs for HDFS, MapReduce, Pig, and Hive. It also integrates with other Spring ecosystem projects such as Spring Integration and Spring Batch.
The official documentation and API docs for Spring Hadoop 2.5:
https://docs.spring.io/spring-hadoop/docs/2.5.0.RELEASE/reference/html/
https://docs.spring.io/spring-hadoop/docs/2.5.0.RELEASE/api/
Create a Maven project and configure its dependencies as follows (the assembly plugin at the end lets you build a jar-with-dependencies with mvn assembly:assembly):
<!-- Fragment of pom.xml; the enclosing <project> element and project coordinates are omitted here -->
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
</repositories>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- UserAgent parsing dependency -->
    <dependency>
        <groupId>com.kumkee</groupId>
        <artifactId>UserAgentParser</artifactId>
        <version>0.0.1</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.10</version>
        <scope>test</scope>
    </dependency>
    <!-- Spring Hadoop dependency -->
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>2.5.0.RELEASE</version>
    </dependency>
</dependencies>

<!-- Build a fat jar with: mvn assembly:assembly -->
<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <!-- Fill in your main class here if you need an executable jar -->
                        <mainClass></mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>
    </plugins>
</build>
Create a resources directory in the project along with a Spring configuration file (the file name can be anything you like; this example uses beans.xml) and add the following content:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">

    <!-- Load the properties file -->
    <context:property-placeholder location="application.properties"/>

    <hdp:configuration id="hadoopConfiguration">
        <!-- URL of the HDFS server -->
        fs.defaultFS=${spring.hadoop.fsUri}
    </hdp:configuration>

    <!-- Wire up the file system bean and the user to operate as -->
    <hdp:file-system id="fileSystem" configuration-ref="hadoopConfiguration" user="root"/>
</beans>
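Under the hood, this XML wiring is roughly equivalent to building a Hadoop Configuration and FileSystem by hand. A minimal sketch using the plain Hadoop client API (the class name is mine; the URI and user "root" are the same values as in the XML above):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class PlainHadoopEquivalent {
    public static void main(String[] args) throws Exception {
        // Equivalent of <hdp:configuration>: set fs.defaultFS on a Hadoop Configuration
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.77.128:8020");

        // Equivalent of <hdp:file-system user="root">: obtain a FileSystem as user "root"
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.77.128:8020"), conf, "root");
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}

The benefit of the Spring wiring is that this boilerplate lives in configuration, and the FileSystem arrives as an injectable bean.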
Then create a properties file named application.properties (again, the file name can be customized) and put the settings that are likely to change in it. Here I externalize the HDFS server URI into the properties file, with the following content:
spring.hadoop.fsUri=hdfs://192.168.77.128:8020
完成以上操做以後,咱們的Spring Hadoop開發環境就算是搭建完成了,畢竟使用Maven就是方便。app
Next, let's create a test class to verify that we can operate on the HDFS file system:
package org.zero01.spring;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import java.io.IOException;

/**
 * @program: hadoop-train
 * @description: Access the HDFS file system via Spring Hadoop
 * @author: 01
 * @create: 2018-04-04 17:39
 **/
public class SpringHadoopApp {

    private ApplicationContext ctx;
    private FileSystem fileSystem;

    @Before
    public void setUp() {
        // Must match the name of the Spring configuration file created above
        ctx = new ClassPathXmlApplicationContext("beans.xml");
        fileSystem = (FileSystem) ctx.getBean("fileSystem");
    }

    @After
    public void tearDown() throws IOException {
        ctx = null;
        fileSystem.close();
    }

    /**
     * Create a directory on HDFS
     * @throws Exception
     */
    @Test
    public void testMkdirs() throws Exception {
        fileSystem.mkdirs(new Path("/SpringHDFS/"));
    }
}
The code above runs successfully. Now check on the server whether the SpringHDFS directory exists under the root directory:
[root@hadoop000 ~]# hdfs dfs -ls /
Found 7 items
-rw-r--r--   3 root supergroup    2769741 2018-04-02 21:13 /10000_access.log
drwxr-xr-x   - root supergroup          0 2018-04-04 17:50 /SpringHDFS
drwxr-xr-x   - root supergroup          0 2018-04-02 21:22 /browserout
drwxr-xr-x   - root supergroup          0 2018-04-02 20:29 /data
drwxr-xr-x   - root supergroup          0 2018-04-02 20:31 /logs
drwx------   - root supergroup          0 2018-04-02 20:39 /tmp
drwxr-xr-x   - root supergroup          0 2018-04-02 20:39 /user
[root@hadoop000 ~]# hdfs dfs -ls /SpringHDFS
[root@hadoop000 ~]#
As you can see, the SpringHDFS directory was created successfully, which confirms that the project is configured correctly.
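You could also verify this from code without logging into the server. A minimal sketch, as a hypothetical extra test method for the class above (FileStatus.isDirectory and FileSystem.listStatus are part of the standard Hadoop API):

/**
 * List the contents of the HDFS root directory from code.
 * Hypothetical extra test method; output is similar to `hdfs dfs -ls /`.
 * Requires an extra import: org.apache.hadoop.fs.FileStatus
 */
@Test
public void testListRoot() throws Exception {
    for (FileStatus status : fileSystem.listStatus(new Path("/"))) {
        // Print a directory flag and the path of each entry
        System.out.println((status.isDirectory() ? "d" : "-") + " " + status.getPath());
    }
}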
Since creating a directory works, let's write another test method that reads the contents of a file on HDFS. The code is as follows:
/**
 * Read the contents of a file on HDFS.
 * Requires two extra imports: org.apache.hadoop.fs.FSDataInputStream
 * and org.apache.hadoop.io.IOUtils
 * @throws Exception
 */
@Test
public void testText() throws Exception {
    FSDataInputStream in = fileSystem.open(new Path("/browserout/part-r-00000"));
    IOUtils.copyBytes(in, System.out, 1024);
    in.close();
}
This code also runs successfully, and the console output is as follows:
Chrome   2775
Firefox  327
MSIE     78
Safari   115
Unknown  6705
With both reads and writes working, we can happily use Spring Hadoop to simplify development in our projects.
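Strictly speaking, the "write" above was only a directory creation; writing an actual file through the same fileSystem bean looks much the same. A minimal sketch as a hypothetical extra test method (the path /SpringHDFS/hello.txt is made up for illustration; FileSystem.create is part of the standard Hadoop API):

/**
 * Write a small text file to HDFS.
 * Hypothetical example; requires an extra import:
 * org.apache.hadoop.fs.FSDataOutputStream
 */
@Test
public void testWrite() throws Exception {
    FSDataOutputStream out = fileSystem.create(new Path("/SpringHDFS/hello.txt"));
    out.write("hello spring hadoop\n".getBytes("UTF-8"));
    out.close();
}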
The above covered accessing HDFS with Spring Hadoop. Next, let's briefly look at accessing HDFS with Spring Boot, which is even simpler.
First, add the Spring Boot dependency to the pom.xml file:
<!-- Spring Boot support for Spring Hadoop -->
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop-boot</artifactId>
    <version>2.5.0.RELEASE</version>
</dependency>
Then create the Spring Boot bootstrap class. With spring-data-hadoop-boot on the classpath, the HDFS connection settings (such as spring.hadoop.fsUri in application.properties) are picked up automatically, so no XML configuration is needed:

package org.zero01.spring;

import org.apache.hadoop.fs.FileStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.hadoop.fs.FsShell;

/**
 * @program: hadoop-train
 * @description: Access HDFS via Spring Boot
 * @author: 01
 * @create: 2018-04-04 18:45
 **/
@SpringBootApplication
public class SpringBootHDFSApp implements CommandLineRunner {

    @Autowired
    FsShell fsShell;  // object for executing HDFS shell-like commands

    public void run(String... strings) throws Exception {
        // List all files under the root directory
        for (FileStatus fileStatus : fsShell.ls("/")) {
            System.out.println("> " + fileStatus.getPath());
        }
    }

    public static void main(String[] args) {
        SpringApplication.run(SpringBootHDFSApp.class, args);
    }
}
The console output is as follows:
> hdfs://192.168.77.128:8020/
> hdfs://192.168.77.128:8020/10000_access.log
> hdfs://192.168.77.128:8020/SpringHDFS
> hdfs://192.168.77.128:8020/browserout
> hdfs://192.168.77.128:8020/data
> hdfs://192.168.77.128:8020/logs
> hdfs://192.168.77.128:8020/tmp
> hdfs://192.168.77.128:8020/user
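FsShell mirrors the familiar hdfs dfs commands, so other operations follow the same pattern. A hedged sketch extending the run() method above (mkdir(String...) is my assumption based on FsShell mirroring hdfs dfs; verify it against the 2.5.0 Javadoc before relying on it):

public void run(String... strings) throws Exception {
    // Analogous to `hdfs dfs -mkdir /SpringBootHDFS` (assumed FsShell method)
    fsShell.mkdir("/SpringBootHDFS");
    // List the root again to confirm the new directory shows up
    for (FileStatus fileStatus : fsShell.ls("/")) {
        System.out.println("> " + fileStatus.getPath());
    }
}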