For setting up CDH 5.x, refer to "Complete Guide to Offline Installation of CDH 5.16.1 on CentOS 7 (including handling of common errors)".
If you are using the Cloudera QuickStart VM, you can only submit jobs from Eclipse on the Linux server itself; remote submission does not work (mainly because the QuickStart VM binds everything to localhost), so it is best to set up a standalone Hadoop environment of your own.
Installation package downloads
hadoop-2.6.5.tar.gz (preferably the same version as the server, to avoid the interface mismatches that version differences cause)
Extract it.
hadoop.dll-and-winutils.exe-for-hadoop2.7.3-on-windows_X64-master.zip
Extract it and copy the contents into the hadoop/bin directory, as follows:
Copy hadoop.dll to the C:\Windows\System32 directory.
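Besides copying the native files, the local JVM also needs to know where this Hadoop directory lives, otherwise the NativeIO/winutils errors described in the FAQ below appear. A minimal sketch, assuming the archive was extracted to D:\hadoop-2.6.5 (an example path, adjust to your layout); alternatively set the HADOOP_HOME environment variable:

public class WinutilsSetup {
    public static void main(String[] args) {
        // hadoop.home.dir tells Hadoop's Shell utility where to find bin\winutils.exe.
        // "D:\\hadoop-2.6.5" is an example path; point it at the extracted hadoop-2.6.5 directory.
        // It must be set before any Hadoop class is used in the process.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.6.5");
        System.out.println("hadoop.home.dir = " + System.getProperty("hadoop.home.dir"));
    }
}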
hadoop-eclipse-plugin-2.6.0.jar
Copy it to the eclipse/plugins directory, as follows:
Eclipse development environment configuration
Copy log4j.properties and core-site.xml from the hadoop-2.6.5\etc\hadoop directory into the project's resources directory, as follows:
The content of core-site.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.223.150:8020</value>
  </property>
</configuration>
In fact, all that needs to be added is the HDFS location. Note: if the server address is hard-coded in the program, this configuration file is optional.
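If you would rather not ship core-site.xml with the project, the same setting can be applied in code. A minimal sketch that uses the same address as the job below and doubles as a connectivity check (the /user/hadoop1 path matches the directories used later in this article):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same effect as the fs.defaultFS property in core-site.xml.
        conf.set("fs.defaultFS", "hdfs://192.168.223.150:8020");
        FileSystem fs = FileSystem.get(conf);
        // Quick sanity check: list the user directory used by the word count job.
        for (FileStatus s : fs.listStatus(new Path("/user/hadoop1"))) {
            System.out.println(s.getPath());
        }
    }
}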
Specify the project's external libraries, as follows:
Maven dependency configuration:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.0</version>
</dependency>
Parquet dependencies:
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet</artifactId>
  <version>1.8.1</version>
  <type>pom</type>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-common</artifactId>
  <version>1.8.1</version>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-encoding</artifactId>
  <version>1.8.1</version>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-column</artifactId>
  <version>1.8.1</version>
</dependency>
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>1.8.1</version>
</dependency>
Hadoop location configuration:
Once the above configuration is complete, you can develop the Hadoop job locally and submit it directly against the remote HDFS, as follows:
package hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetOutputFormat;
import org.apache.parquet.hadoop.example.GroupWriteSupport;

import java.io.IOException;
import java.util.StringTokenizer;
import java.util.UUID;

/**
 * <p>Title: ParquetNewMR</p>
 * <p>Description: word count whose result is written as Parquet</p>
 * @author zjhua
 * @date April 7, 2019
 */
public class ParquetNewMR {

    public static class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends Reducer<Text, IntWritable, Void, Group> {
        private SimpleGroupFactory factory;

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            Group group = factory.newGroup()
                    .append("name", key.toString())
                    .append("age", sum);
            context.write(null, group);
        }

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
            factory = new SimpleGroupFactory(GroupWriteSupport.getSchema(context.getConfiguration()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String writeSchema = "message example {\n" +
                "required binary name;\n" +
                "required int32 age;\n" +
                "}";
        conf.set("parquet.example.schema", writeSchema);
        // conf.set("dfs.client.use.datanode.hostname", "true");

        Job job = new Job(conf);
        job.setJarByClass(ParquetNewMR.class);
        job.setJobName("parquet");

        String in = "hdfs://192.168.223.150:8020/user/hadoop1/wordcount/input";
        String out = "hdfs://192.168.223.150:8020/user/hadoop1/pq_out_" + UUID.randomUUID().toString();

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputValueClass(Group.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(ParquetOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(in));
        ParquetOutputFormat.setOutputPath(job, new Path(out));
        ParquetOutputFormat.setWriteSupportClass(job, GroupWriteSupport.class);

        job.waitForCompletion(true);
    }
}
The output is as follows:
19/04/20 13:15:12 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/04/20 13:15:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/04/20 13:15:13 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/04/20 13:15:13 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
19/04/20 13:15:13 INFO input.FileInputFormat: Total input paths to process : 3
19/04/20 13:15:13 INFO mapreduce.JobSubmitter: number of splits:3
19/04/20 13:15:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local496876089_0001
19/04/20 13:15:13 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/04/20 13:15:13 INFO mapreduce.Job: Running job: job_local496876089_0001
19/04/20 13:15:13 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/04/20 13:15:13 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.parquet.hadoop.ParquetOutputCommitter
19/04/20 13:15:13 INFO mapred.LocalJobRunner: Waiting for map tasks
19/04/20 13:15:13 INFO mapred.LocalJobRunner: Starting task: attempt_local496876089_0001_m_000000_0
19/04/20 13:15:13 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
19/04/20 13:15:13 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@6d0fe6eb
19/04/20 13:15:13 INFO mapred.MapTask: Processing split: hdfs://192.168.223.150:8020/user/hadoop1/wordcount/input/file2:0+34
19/04/20 13:15:13 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/04/20 13:15:13 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/04/20 13:15:13 INFO mapred.MapTask: soft limit at 83886080
19/04/20 13:15:13 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/04/20 13:15:13 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/04/20 13:15:13 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/04/20 13:15:14 INFO mapred.LocalJobRunner:
19/04/20 13:15:14 INFO mapred.MapTask: Starting flush of map output
19/04/20 13:15:14 INFO mapred.MapTask: Spilling map output
19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufend = 62; bufvoid = 104857600
19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
19/04/20 13:15:14 INFO mapred.MapTask: Finished spill 0
19/04/20 13:15:14 INFO mapred.Task: Task:attempt_local496876089_0001_m_000000_0 is done. And is in the process of committing
19/04/20 13:15:14 INFO mapred.LocalJobRunner: map
19/04/20 13:15:14 INFO mapred.Task: Task 'attempt_local496876089_0001_m_000000_0' done.
19/04/20 13:15:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local496876089_0001_m_000000_0
19/04/20 13:15:14 INFO mapred.LocalJobRunner: Starting task: attempt_local496876089_0001_m_000001_0
19/04/20 13:15:14 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
19/04/20 13:15:14 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@49728985
19/04/20 13:15:14 INFO mapred.MapTask: Processing split: hdfs://192.168.223.150:8020/user/hadoop1/wordcount/input/file1:0+30
19/04/20 13:15:14 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/04/20 13:15:14 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/04/20 13:15:14 INFO mapred.MapTask: soft limit at 83886080
19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/04/20 13:15:14 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/04/20 13:15:14 INFO mapred.LocalJobRunner:
19/04/20 13:15:14 INFO mapred.MapTask: Starting flush of map output
19/04/20 13:15:14 INFO mapred.MapTask: Spilling map output
19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufend = 58; bufvoid = 104857600
19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
19/04/20 13:15:14 INFO mapred.MapTask: Finished spill 0
19/04/20 13:15:14 INFO mapred.Task: Task:attempt_local496876089_0001_m_000001_0 is done. And is in the process of committing
19/04/20 13:15:14 INFO mapred.LocalJobRunner: map
19/04/20 13:15:14 INFO mapred.Task: Task 'attempt_local496876089_0001_m_000001_0' done.
19/04/20 13:15:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local496876089_0001_m_000001_0
19/04/20 13:15:14 INFO mapred.LocalJobRunner: Starting task: attempt_local496876089_0001_m_000002_0
19/04/20 13:15:14 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
19/04/20 13:15:14 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@66abe49f
19/04/20 13:15:14 INFO mapred.MapTask: Processing split: hdfs://192.168.223.150:8020/user/hadoop1/wordcount/input/file0:0+22
19/04/20 13:15:14 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/04/20 13:15:14 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/04/20 13:15:14 INFO mapred.MapTask: soft limit at 83886080
19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/04/20 13:15:14 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/04/20 13:15:14 INFO mapred.LocalJobRunner:
19/04/20 13:15:14 INFO mapred.MapTask: Starting flush of map output
19/04/20 13:15:14 INFO mapred.MapTask: Spilling map output
19/04/20 13:15:14 INFO mapred.MapTask: bufstart = 0; bufend = 38; bufvoid = 104857600
19/04/20 13:15:14 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
19/04/20 13:15:14 INFO mapreduce.Job: Job job_local496876089_0001 running in uber mode : false
19/04/20 13:15:14 INFO mapreduce.Job: map 67% reduce 0%
19/04/20 13:15:14 INFO mapred.MapTask: Finished spill 0
19/04/20 13:15:14 INFO mapred.Task: Task:attempt_local496876089_0001_m_000002_0 is done. And is in the process of committing
19/04/20 13:15:14 INFO mapred.LocalJobRunner: map
19/04/20 13:15:14 INFO mapred.Task: Task 'attempt_local496876089_0001_m_000002_0' done.
19/04/20 13:15:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local496876089_0001_m_000002_0
19/04/20 13:15:14 INFO mapred.LocalJobRunner: map task executor complete.
19/04/20 13:15:14 INFO mapred.LocalJobRunner: Waiting for reduce tasks
19/04/20 13:15:14 INFO mapred.LocalJobRunner: Starting task: attempt_local496876089_0001_r_000000_0
19/04/20 13:15:14 INFO util.ProcfsBasedProcessTree: ProcfsBasedProcessTree currently is supported only on Linux.
19/04/20 13:15:14 INFO mapred.Task: Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@51a03c64
19/04/20 13:15:14 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@3f9676f4
19/04/20 13:15:14 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
19/04/20 13:15:14 INFO reduce.EventFetcher: attempt_local496876089_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
19/04/20 13:15:14 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local496876089_0001_m_000002_0 decomp: 48 len: 52 to MEMORY
19/04/20 13:15:14 INFO reduce.InMemoryMapOutput: Read 48 bytes from map-output for attempt_local496876089_0001_m_000002_0
19/04/20 13:15:14 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 48, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->48
19/04/20 13:15:14 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local496876089_0001_m_000000_0 decomp: 78 len: 82 to MEMORY
19/04/20 13:15:14 INFO reduce.InMemoryMapOutput: Read 78 bytes from map-output for attempt_local496876089_0001_m_000000_0
19/04/20 13:15:14 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 78, inMemoryMapOutputs.size() -> 2, commitMemory -> 48, usedMemory ->126
19/04/20 13:15:14 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local496876089_0001_m_000001_0 decomp: 74 len: 78 to MEMORY
19/04/20 13:15:14 INFO reduce.InMemoryMapOutput: Read 74 bytes from map-output for attempt_local496876089_0001_m_000001_0
19/04/20 13:15:14 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 74, inMemoryMapOutputs.size() -> 3, commitMemory -> 126, usedMemory ->200
19/04/20 13:15:14 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
19/04/20 13:15:14 INFO mapred.LocalJobRunner: 3 / 3 copied.
19/04/20 13:15:14 INFO reduce.MergeManagerImpl: finalMerge called with 3 in-memory map-outputs and 0 on-disk map-outputs
19/04/20 13:15:14 INFO mapred.Merger: Merging 3 sorted segments
19/04/20 13:15:14 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 173 bytes
19/04/20 13:15:14 INFO reduce.MergeManagerImpl: Merged 3 segments, 200 bytes to disk to satisfy reduce memory limit
19/04/20 13:15:14 INFO reduce.MergeManagerImpl: Merging 1 files, 200 bytes from disk
19/04/20 13:15:14 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
19/04/20 13:15:14 INFO mapred.Merger: Merging 1 sorted segments
19/04/20 13:15:14 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 187 bytes
19/04/20 13:15:14 INFO mapred.LocalJobRunner: 3 / 3 copied.
19/04/20 13:15:14 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
19/04/20 13:15:15 INFO mapred.Task: Task:attempt_local496876089_0001_r_000000_0 is done. And is in the process of committing
19/04/20 13:15:15 INFO mapred.LocalJobRunner: 3 / 3 copied.
19/04/20 13:15:15 INFO mapred.Task: Task attempt_local496876089_0001_r_000000_0 is allowed to commit now
19/04/20 13:15:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local496876089_0001_r_000000_0' to hdfs://192.168.223.150:8020/user/hadoop1/pq_out_d05c6a75-3bbd-4f34-98ff-2f7b7a231de4/_temporary/0/task_local496876089_0001_r_000000
19/04/20 13:15:15 INFO mapred.LocalJobRunner: reduce > reduce
19/04/20 13:15:15 INFO mapred.Task: Task 'attempt_local496876089_0001_r_000000_0' done.
19/04/20 13:15:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local496876089_0001_r_000000_0
19/04/20 13:15:15 INFO mapred.LocalJobRunner: reduce task executor complete.
19/04/20 13:15:15 INFO mapreduce.Job: map 100% reduce 100%
19/04/20 13:15:15 INFO mapreduce.Job: Job job_local496876089_0001 completed successfully
19/04/20 13:15:15 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=4340
FILE: Number of bytes written=1010726
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=270
HDFS: Number of bytes written=429
HDFS: Number of read operations=37
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=3
Map output records=18
Map output bytes=158
Map output materialized bytes=212
Input split bytes=381
Combine input records=0
Combine output records=0
Reduce input groups=12
Reduce shuffle bytes=212
Reduce input records=18
Reduce output records=12
Spilled Records=36
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=34
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=1267204096
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=86
File Output Format Counters
Bytes Written=429
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression set to false
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.codec.CodecConfig: Compression: UNCOMPRESSED
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet block size to 134217728
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet page size to 1048576
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Parquet dictionary page size to 1048576
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Dictionary is on
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Validation is off
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Writer version is: PARQUET_1_0
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.ParquetOutputFormat: Maximum row group padding size is 0 bytes
2019-4-20 13:15:14 INFO: org.apache.parquet.hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 200
2019-4-20 13:15:15 INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 131B for [name] BINARY: 12 values, 92B raw, 92B comp, 1 pages, encodings: [BIT_PACKED, PLAIN]
2019-4-20 13:15:15 INFO: org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 39B for [age] INT32: 12 values, 6B raw, 6B comp, 1 pages, encodings: [BIT_PACKED, PLAIN_DICTIONARY], dic { 3 entries, 12B raw, 3B comp}
2019-4-20 13:15:15 INFO: org.apache.parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
[hdfs@hadoop1 ~]$ hadoop fs -ls /user/hadoop1/pq_out*
Found 4 items
-rw-r--r-- 3 hdfs supergroup 0 2019-04-20 12:32 /user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/_SUCCESS
-rw-r--r-- 3 hdfs supergroup 129 2019-04-20 12:32 /user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/_common_metadata
-rw-r--r-- 3 hdfs supergroup 278 2019-04-20 12:32 /user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/_metadata
-rw-r--r-- 3 hdfs supergroup 429 2019-04-20 12:32 /user/hadoop1/pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/part-r-00000.parquet
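To sanity-check the result, the Parquet output can be read back with the same example object model that was used for writing. A minimal sketch, assuming one of the output directories listed above (the UUID in the path is just the one from this run):

import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class ReadParquetResult {
    public static void main(String[] args) throws Exception {
        Path file = new Path("hdfs://192.168.223.150:8020/user/hadoop1/"
                + "pq_out_27c7ac84-ba26-43a6-8bcf-1d4f656a3d22/part-r-00000.parquet");
        // GroupReadSupport mirrors the GroupWriteSupport used by the job above.
        try (ParquetReader<Group> reader = ParquetReader.builder(new GroupReadSupport(), file).build()) {
            Group g;
            while ((g = reader.read()) != null) {
                // Print each word and its count, stored as "name" and "age" in the schema.
                System.out.println(g.getString("name", 0) + "\t" + g.getInteger("age", 0));
            }
        }
    }
}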
FAQ
1. Deleting HDFS files with the hadoop eclipse-plugin fails with an error message similar to:
Unable to delete file
....
org.apache.hadoop.security.AccessControlException: Permission denied: user =test , access=WRITE, inode="pokes":hadoop:supergroup:rwxr-xr-x
Solution 1: add a HADOOP_USER_NAME environment variable pointing to a user that has the required permissions, e.g. hdfs (a per-JVM variant is sketched after this item).
Solution 2: grant the user permissions, e.g. hadoop fs -chmod 777 /user/xxx
There is also a solution circulating online: open the plugin's "Map/Reduce Location" view, select a Location, open the "Advanced parameters" tab, find "hadoop.job.ugi" (mine was set to "test,Tardis"), change it to "hadoop,Tardis" and save. However, I could not find this parameter.
Restart Eclipse.
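For solution 1, the variable can also be set for a single JVM instead of system-wide. A minimal sketch, assuming hdfs is the privileged user and the cluster is not Kerberized; it must run before the first FileSystem or Job object is created:

public class RunAsHdfsUser {
    public static void main(String[] args) throws Exception {
        // Equivalent to setting the HADOOP_USER_NAME environment variable for this process only.
        // Must be set before any Hadoop FileSystem/Job is created, otherwise the
        // default OS user (e.g. "test") is picked up and the AccessControlException returns.
        System.setProperty("HADOOP_USER_NAME", "hdfs");
        // ... continue with Configuration/Job setup as in ParquetNewMR.main() ...
    }
}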
2. Problems such as org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z and Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
Solution: download the Windows build of winutils (preferably hadoop.dll-and-winutils.exe-for-hadoop2.7.3-on-windows_X64-master, whose version should be no lower than the local Windows Hadoop version), and do not forget to copy hadoop.dll to the C:\Windows\System32 directory.