The typical minimal Hadoop cluster is 3 servers: one as the Master (NameNode) and two as Slaves (DataNodes). The OS is Ubuntu 18.04 Server; the installation steps are omitted here. The disks use LVM with the XFS filesystem, and to avoid wasting space, apart from a 1G /boot partition everything is allocated to /
Server plan
192.168.1.148 vm148 -- master: NameNode, ResourceManager
192.168.1.149 vm149 -- slave: DataNode, NodeManager
192.168.1.150 vm150 -- slave: DataNode, NodeManager
Note: here is the first pitfall. Hostnames must not contain an underscore ( _ ); otherwise the DataNode fails to create its socket and cannot start.
Upgrade after installation
sudo apt update
sudo apt upgrade
Add a regular user
This is the restricted user that will run hadoop; I habitually use tomcat as the username. Use adduser rather than useradd here, because the latter, when called without arguments, sometimes does not create the home directory.
sudo adduser tomcat    # answer the prompts
If these are VMs, this is a good point to turn the current state into a template.
Set hostname and hosts
# view current hostname
sudo hostnamectl status
# set
sudo hostnamectl set-hostname vm148
# add entries to hosts
sudo vi /etc/hosts
# add following lines
192.168.1.148 vm148
192.168.1.149 vm149
192.168.1.150 vm150
Set the servers to vm148, vm149 and vm150 respectively. After rebooting, log in to check that the names took effect, and ping each host from the others to verify name resolution.
Set up mutual passwordless SSH login for the tomcat user
# generate id_rsa and id_rsa.pub
ssh-keygen
cd .ssh/
# create authorized_keys
mv id_rsa.pub authorized_keys
# note: permissions must be 600
chmod 600 authorized_keys
# rename this server's own private key to id_rsa_mine
mv id_rsa id_rsa_mine
# edit config
vi config
# add the following
Host vm149
    IdentityFile ~/.ssh/id_rsa_mine
    User tomcat
Host vm150
    IdentityFile ~/.ssh/id_rsa_mine
    User tomcat
Host vm148
    IdentityFile ~/.ssh/id_rsa_mine
    User tomcat
# on the master, also add the following, used when starting the secondary namenode
Host 0.0.0.0
    IdentityFile ~/.ssh/id_rsa_mine
    User tomcat
Merge the contents of every server's authorized_keys into every other's, so that in the end all servers carry an identical authorized_keys file.
Once this is done, try ssh tomcat@[hostname] from every machine to every other. This both confirms that login works and accepts each new host key up front, so the startup scripts will not be interrupted later by "are you sure you want to continue connecting" prompts.
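A quick way to check every pair at once (a small sketch; BatchMode makes ssh fail instead of prompting, so any missing key or password requirement shows up immediately):

# run as tomcat on each of the three servers; each line should print the remote hostname
for h in vm148 vm149 vm150; do
    ssh -o BatchMode=yes tomcat@$h hostname
done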
Firewall: ufw
For a first-time setup it is best to disable the firewall, so it can be ruled out as a cause of services failing to start. Once all services are configured and working, re-enable and configure ufw for the actual ports in use, as sketched after the command below.
sudo ufw disable
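For reference, a possible rule set once the cluster works might look like the sketch below. This is an assumption, not a complete rule set: the port numbers come from the port list later in this article, and the 192.168.1.0/24 subnet comes from the server plan above; adjust both to your environment.

sudo ufw allow from 192.168.1.0/24 to any port 9000 proto tcp     # HDFS
sudo ufw allow from 192.168.1.0/24 to any port 50070 proto tcp    # NameNode web UI
sudo ufw allow from 192.168.1.0/24 to any port 8088 proto tcp     # YARN web UI
sudo ufw enable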
Extract the JDK to /opt/jdk and create a latest symlink; the resulting layout is:
$ ll /opt/jdk/
total 0
drwxr-xr-x 7 root root 245 Oct  6 13:58 jdk1.8.0_192/
lrwxrwxrwx 1 root root  12 Jan 18 05:49 latest -> jdk1.8.0_192/
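A sketch of the commands that produce this layout, assuming the tarball is named jdk-8u192-linux-x64.tar.gz and sits in the current directory (the file name is an assumption):

sudo mkdir -p /opt/jdk
sudo tar -xzf jdk-8u192-linux-x64.tar.gz -C /opt/jdk
sudo ln -s /opt/jdk/jdk1.8.0_192 /opt/jdk/latest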
Symlink jps into /usr/bin:
cd /usr/bin
sudo ln -s /opt/jdk/latest/bin/jps jps
Extract hadoop to /opt/hadoop and create a latest symlink; the resulting layout is:
$ ll /opt/hadoop/
total 0
drwxr-xr-x 9 root root 149 Nov 13 15:15 hadoop-2.9.2/
lrwxrwxrwx 1 root root  12 Jan 18 10:26 latest -> hadoop-2.9.2/
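The analogous sketch, assuming the hadoop-2.9.2.tar.gz distribution tarball is in the current directory:

sudo mkdir -p /opt/hadoop
sudo tar -xzf hadoop-2.9.2.tar.gz -C /opt/hadoop
sudo ln -s /opt/hadoop/hadoop-2.9.2 /opt/hadoop/latest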
Edit the configuration file etc/hadoop/hadoop-env.sh (all etc/hadoop/ paths below are relative to /opt/hadoop/latest)
Two variables need to be changed:
# The java implementation to use.
export JAVA_HOME=/opt/jdk/latest

# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/home/tomcat/run/hadoop/logs
Edit the configuration file etc/hadoop/yarn-env.sh
Two variables need to be changed:
# some Java parameters
export JAVA_HOME=/opt/jdk/latest

# default log directory & file
export YARN_LOG_DIR=/home/tomcat/run/yarn/logs
Edit the configuration file etc/hadoop/slaves
將內容修改成兩個slave的主機名
vm149
vm150
Edit the configuration file etc/hadoop/core-site.xml
Add the following. For the full list of options, see share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/tomcat/run/hadoop</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://vm148:9000</value>
    </property>
</configuration>
Edit the configuration file etc/hadoop/hdfs-site.xml
Add the following:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
Edit the configuration file etc/hadoop/mapred-site.xml
Add the following:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Edit the configuration file etc/hadoop/yarn-site.xml
Add the following. For the full list of options, see share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml:
<configuration>
    <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname</name>
        <value>vm148</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Copy the configured hadoop, keeping the same directory structure, to the other two servers, for example as sketched below.
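One way to do the copy with rsync, assuming /opt/hadoop on the targets already exists and is writable by the tomcat user (otherwise copy to a temporary directory and move it into place with sudo):

# run from vm148 as tomcat; -a preserves the latest symlink and permissions
for h in vm149 vm150; do
    rsync -a /opt/hadoop/ tomcat@$h:/opt/hadoop/
done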
Before the first start, the NameNode must be formatted; run this on the master:
/opt/hadoop/latest/bin/hdfs namenode -format
Then start the hdfs services:
/opt/hadoop/latest/sbin/start-dfs.sh
Then start the yarn services:
/opt/hadoop/latest/sbin/start-yarn.sh
After each step, use jps to check that the services actually started. On the master, a healthy start shows the following processes:
tomcat@vm148:/opt$ jps
3173 SecondaryNameNode
3495 ResourceManager
4583 Jps
2906 NameNode
On the slave servers:
tomcat@vm149:~/run$ jps
3074 NodeManager
2691 DataNode
3591 Jps
Once the services are up, the web UI is available at http://vm148:50070/
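To shut the cluster down later, the matching stop scripts live in the same sbin directory:

/opt/hadoop/latest/sbin/stop-yarn.sh
/opt/hadoop/latest/sbin/stop-dfs.sh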
Ports on the master
21, FTP for ?
8030, YARN resourcemanager scheduler
8031, YARN resourcemanager tracker
8032, YARN resourcemanager
8033, YARN resourcemanager admin
8088, YARN resourcemanager webapp
8090, YARN resourcemanager webapp https
9000, HDFS
50070, NameNode web UI
50090, SecondaryNameNode web UI
Ports on the slaves (DataNodes)
50075, DataNode web UI
Running the WordCount example. First compile the Java source, generate the classes, and build the jar. JAVA_HOME is already configured inside hadoop, and PATH is not needed in this environment; the only thing to set is a classpath entry for tools.jar:
export HADOOP_CLASSPATH=/opt/jdk/latest/lib/tools.jar
/opt/hadoop/latest/bin/hadoop com.sun.tools.javac.Main WordCount.java
/opt/jdk/latest/bin/jar cf wc.jar WordCount*.class
Then upload the two input files to hdfs:
/opt/hadoop/latest/bin/hadoop fs -put file01 /workspace/input/
/opt/hadoop/latest/bin/hadoop fs -ls /workspace/input
/opt/hadoop/latest/bin/hadoop fs -put file02 /workspace/input/
/opt/hadoop/latest/bin/hadoop fs -cat /workspace/input/file01
/opt/hadoop/latest/bin/hadoop fs -cat /workspace/input/file02
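If the target directory does not exist yet, create it first:

/opt/hadoop/latest/bin/hadoop fs -mkdir -p /workspace/input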
I hit a pitfall here at first: I had put the files under /tmp and used /tmp as the input directory. While running, yarn stores its staging information under /tmp/hadoop-yarn/staging, and the job then threw an exception. The lesson: do not put job files under /tmp.
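If /tmp-like paths are unavoidable, one option (untested here; the property name and its /tmp/hadoop-yarn/staging default come from mapred-default.xml, and the value path is an assumption matching the directories used above) is to relocate the staging directory in mapred-site.xml:

<property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/home/tomcat/run/staging</value>
</property>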
Run the job:
/opt/hadoop/latest/bin/hadoop jar wc.jar WordCount /workspace/input /workspace/output
The last path here is the output path; it must not exist before the job runs, otherwise the job fails with an error.
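So to rerun the job, delete the previous output first:

/opt/hadoop/latest/bin/hadoop fs -rm -r /workspace/output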
The final run:
tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop jar wc.jar WordCount /workspace/input /workspace/output
19/01/30 08:24:55 INFO client.RMProxy: Connecting to ResourceManager at vm148/192.168.1.148:8032
19/01/30 08:24:55 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/30 08:24:56 INFO input.FileInputFormat: Total input files to process : 2
19/01/30 08:24:56 INFO mapreduce.JobSubmitter: number of splits:2
19/01/30 08:24:56 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/30 08:24:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547812325179_0004
19/01/30 08:24:56 INFO impl.YarnClientImpl: Submitted application application_1547812325179_0004
19/01/30 08:24:56 INFO mapreduce.Job: The url to track the job: http://vm148:8088/proxy/application_1547812325179_0004/
19/01/30 08:24:56 INFO mapreduce.Job: Running job: job_1547812325179_0004
19/01/30 08:25:03 INFO mapreduce.Job: Job job_1547812325179_0004 running in uber mode : false
19/01/30 08:25:03 INFO mapreduce.Job:  map 0% reduce 0%
19/01/30 08:25:10 INFO mapreduce.Job:  map 100% reduce 0%
19/01/30 08:25:18 INFO mapreduce.Job:  map 100% reduce 100%
19/01/30 08:25:18 INFO mapreduce.Job: Job job_1547812325179_0004 completed successfully
19/01/30 08:25:18 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=97
        FILE: Number of bytes written=594622
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=266
        HDFS: Number of bytes written=38
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=10309
        Total time spent by all reduces in occupied slots (ms)=3850
        Total time spent by all map tasks (ms)=10309
        Total time spent by all reduce tasks (ms)=3850
        Total vcore-milliseconds taken by all map tasks=10309
        Total vcore-milliseconds taken by all reduce tasks=3850
        Total megabyte-milliseconds taken by all map tasks=10556416
        Total megabyte-milliseconds taken by all reduce tasks=3942400
    Map-Reduce Framework
        Map input records=2
        Map output records=10
        Map output bytes=96
        Map output materialized bytes=103
        Input split bytes=210
        Combine input records=10
        Combine output records=8
        Reduce input groups=5
        Reduce shuffle bytes=103
        Reduce input records=8
        Reduce output records=5
        Spilled Records=16
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=379
        CPU time spent (ms)=2090
        Physical memory (bytes) snapshot=778280960
        Virtual memory (bytes) snapshot=5914849280
        Total committed heap usage (bytes)=507510784
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=56
    File Output Format Counters
        Bytes Written=38
tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop fs -ls /workspace/output
Found 2 items
-rw-r--r--   2 tomcat supergroup          0 2019-01-30 08:25 /workspace/output/_SUCCESS
-rw-r--r--   2 tomcat supergroup         38 2019-01-30 08:25 /workspace/output/part-r-00000
tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop fs -cat /workspace/output/part-r-00000
Day     2
Good    2
Hadoop  2
Hello   2
World   2
Next, a custom task. The input format looks like this: each line is a log record containing a user, an IP and a timestamp, and the goal is to count the occurrences of each (user + IP) pair:
1571 76 738 legnd 166.111.8.133 870876781
1572 121 697 kuoc 202.116.65.16 870909489
1573 121 697 kuoc 202.116.65.16 870910644
1574 121 739 maerick  870926284
Code: pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.rockbb</groupId>
    <artifactId>hdtask</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>HD Task</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.8.2</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.4.1</version>
        </dependency>
    </dependencies>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.3</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                        <encoding>UTF-8</encoding>
                    </configuration>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-resources-plugin</artifactId>
                    <configuration>
                        <encoding>UTF-8</encoding>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>
Code: DataBean.java
package com.rockbb.hdtask;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class DataBean implements Writable {
    private String nameIp;
    private long count;

    public DataBean() {
    }

    public DataBean(String nameIp, long count) {
        this.nameIp = nameIp;
        this.count = count;
    }

    public String getNameIp() {
        return nameIp;
    }

    public void setNameIp(String nameIp) {
        this.nameIp = nameIp;
    }

    public long getCount() {
        return count;
    }

    public void setCount(long count) {
        this.count = count;
    }

    /**
     * Important: this will be used for the final output.
     */
    @Override
    public String toString() {
        return this.nameIp + "\t" + this.count;
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(nameIp);
        dataOutput.writeLong(count);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.nameIp = dataInput.readUTF();
        this.count = dataInput.readLong();
    }
}
Code: IpCount.java
package com.rockbb.hdtask;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class IpCount {

    public static class IpMapper extends Mapper<LongWritable, Text, Text, DataBean> {
        @Override
        public void map(LongWritable keyIn, Text valueIn, Context context)
                throws IOException, InterruptedException {
            String line = valueIn.toString();
            String[] fields = line.split("\t");
            String keyOut = fields[3] + '-' + fields[4];
            long valueOut = 1;
            DataBean bean = new DataBean(keyOut, valueOut);
            context.write(new Text(keyOut), bean);
        }
    }

    public static class IpReducer extends Reducer<Text, DataBean, Text, DataBean> {
        @Override
        public void reduce(Text keyIn, Iterable<DataBean> valuesIn, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (DataBean bean : valuesIn) {
                total += bean.getCount();
            }
            DataBean bean = new DataBean("", total);
            context.write(keyIn, bean);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(IpCount.class);

        job.setMapperClass(IpMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DataBean.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        job.setReducerClass(IpReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DataBean.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
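One easy optimization not used in this run (the counters in the log below show Combine input records=0): since the per-key counts simply sum, IpReducer can also serve as the combiner, shrinking the ~2.5GB of map output before it is shuffled. A sketch of the one-line addition to main():

// assumption: reusing IpReducer as the combiner is valid here because its
// input and output types are both (Text, DataBean) and summation is associative
job.setCombinerClass(IpReducer.class);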
The command to run it:
/opt/hadoop/latest/bin/hadoop jar hdtask.jar com.rockbb.hdtask.IpCount /workspace/input/ /workspace/output3
The data file is 2.3GB; with the default block size of 128MB, the submission produced 19 map tasks and one reduce task (2,551,938,704 bytes / 134,217,728 bytes per block ≈ 19.01; the small overhang falls within FileInputFormat's 10% split slack, so it is absorbed into the last split rather than becoming a 20th map). The command-line output of the job:
19/01/31 10:08:01 INFO client.RMProxy: Connecting to ResourceManager at vm148/192.168.31.148:8032
19/01/31 10:08:02 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/31 10:08:02 INFO input.FileInputFormat: Total input files to process : 1
19/01/31 10:08:02 INFO mapreduce.JobSubmitter: number of splits:19
19/01/31 10:08:02 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/31 10:08:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547812325179_0008
19/01/31 10:08:03 INFO impl.YarnClientImpl: Submitted application application_1547812325179_0008
19/01/31 10:08:03 INFO mapreduce.Job: The url to track the job: http://vm148:8088/proxy/application_1547812325179_0008/
19/01/31 10:08:03 INFO mapreduce.Job: Running job: job_1547812325179_0008
19/01/31 10:08:13 INFO mapreduce.Job: Job job_1547812325179_0008 running in uber mode : false
19/01/31 10:08:13 INFO mapreduce.Job:  map 0% reduce 0%
19/01/31 10:08:41 INFO mapreduce.Job:  map 11% reduce 0%
19/01/31 10:08:45 INFO mapreduce.Job:  map 21% reduce 0%
19/01/31 10:08:47 INFO mapreduce.Job:  map 23% reduce 0%
19/01/31 10:08:51 INFO mapreduce.Job:  map 28% reduce 0%
19/01/31 10:08:53 INFO mapreduce.Job:  map 30% reduce 0%
19/01/31 10:08:57 INFO mapreduce.Job:  map 31% reduce 0%
19/01/31 10:08:59 INFO mapreduce.Job:  map 38% reduce 0%
19/01/31 10:09:09 INFO mapreduce.Job:  map 39% reduce 0%
19/01/31 10:09:10 INFO mapreduce.Job:  map 40% reduce 0%
19/01/31 10:09:11 INFO mapreduce.Job:  map 44% reduce 0%
19/01/31 10:09:14 INFO mapreduce.Job:  map 46% reduce 0%
19/01/31 10:09:16 INFO mapreduce.Job:  map 48% reduce 0%
19/01/31 10:09:17 INFO mapreduce.Job:  map 49% reduce 0%
19/01/31 10:09:22 INFO mapreduce.Job:  map 55% reduce 0%
19/01/31 10:09:24 INFO mapreduce.Job:  map 56% reduce 0%
19/01/31 10:09:28 INFO mapreduce.Job:  map 61% reduce 0%
19/01/31 10:09:40 INFO mapreduce.Job:  map 64% reduce 0%
19/01/31 10:09:42 INFO mapreduce.Job:  map 64% reduce 7%
19/01/31 10:09:46 INFO mapreduce.Job:  map 66% reduce 7%
19/01/31 10:09:48 INFO mapreduce.Job:  map 68% reduce 9%
19/01/31 10:09:52 INFO mapreduce.Job:  map 71% reduce 9%
19/01/31 10:09:54 INFO mapreduce.Job:  map 71% reduce 12%
19/01/31 10:09:58 INFO mapreduce.Job:  map 73% reduce 12%
19/01/31 10:09:59 INFO mapreduce.Job:  map 74% reduce 12%
19/01/31 10:10:01 INFO mapreduce.Job:  map 75% reduce 12%
19/01/31 10:10:04 INFO mapreduce.Job:  map 80% reduce 12%
19/01/31 10:10:06 INFO mapreduce.Job:  map 81% reduce 12%
19/01/31 10:10:10 INFO mapreduce.Job:  map 85% reduce 12%
19/01/31 10:10:12 INFO mapreduce.Job:  map 86% reduce 12%
19/01/31 10:10:13 INFO mapreduce.Job:  map 87% reduce 12%
19/01/31 10:10:15 INFO mapreduce.Job:  map 88% reduce 12%
19/01/31 10:10:18 INFO mapreduce.Job:  map 88% reduce 16%
19/01/31 10:10:22 INFO mapreduce.Job:  map 90% reduce 16%
19/01/31 10:10:23 INFO mapreduce.Job:  map 91% reduce 16%
19/01/31 10:10:24 INFO mapreduce.Job:  map 91% reduce 18%
19/01/31 10:10:25 INFO mapreduce.Job:  map 92% reduce 18%
19/01/31 10:10:29 INFO mapreduce.Job:  map 93% reduce 18%
19/01/31 10:10:31 INFO mapreduce.Job:  map 93% reduce 21%
19/01/31 10:10:32 INFO mapreduce.Job:  map 94% reduce 21%
19/01/31 10:10:34 INFO mapreduce.Job:  map 96% reduce 21%
19/01/31 10:10:35 INFO mapreduce.Job:  map 97% reduce 21%
19/01/31 10:10:37 INFO mapreduce.Job:  map 98% reduce 23%
19/01/31 10:10:38 INFO mapreduce.Job:  map 99% reduce 23%
19/01/31 10:10:41 INFO mapreduce.Job:  map 100% reduce 23%
19/01/31 10:10:43 INFO mapreduce.Job:  map 100% reduce 30%
19/01/31 10:10:49 INFO mapreduce.Job:  map 100% reduce 33%
19/01/31 10:11:25 INFO mapreduce.Job:  map 100% reduce 67%
19/01/31 10:11:31 INFO mapreduce.Job:  map 100% reduce 70%
19/01/31 10:11:37 INFO mapreduce.Job:  map 100% reduce 74%
19/01/31 10:11:43 INFO mapreduce.Job:  map 100% reduce 78%
19/01/31 10:11:49 INFO mapreduce.Job:  map 100% reduce 83%
19/01/31 10:11:55 INFO mapreduce.Job:  map 100% reduce 86%
19/01/31 10:12:01 INFO mapreduce.Job:  map 100% reduce 89%
19/01/31 10:12:07 INFO mapreduce.Job:  map 100% reduce 93%
19/01/31 10:12:13 INFO mapreduce.Job:  map 100% reduce 97%
19/01/31 10:12:18 INFO mapreduce.Job:  map 100% reduce 100%
19/01/31 10:12:19 INFO mapreduce.Job: Job job_1547812325179_0008 completed successfully
19/01/31 10:12:19 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=6635434217
        FILE: Number of bytes written=9269615741
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2551940756
        HDFS: Number of bytes written=134288980
        HDFS: Number of read operations=60
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Killed map tasks=3
        Launched map tasks=22
        Launched reduce tasks=1
        Data-local map tasks=22
        Total time spent by all maps in occupied slots (ms)=1737403
        Total time spent by all reduces in occupied slots (ms)=178563
        Total time spent by all map tasks (ms)=1737403
        Total time spent by all reduce tasks (ms)=178563
        Total vcore-milliseconds taken by all map tasks=1737403
        Total vcore-milliseconds taken by all reduce tasks=178563
        Total megabyte-milliseconds taken by all map tasks=1779100672
        Total megabyte-milliseconds taken by all reduce tasks=182848512
    Map-Reduce Framework
        Map input records=49458230
        Map output records=49458230
        Map output bytes=2531297616
        Map output materialized bytes=2630214190
        Input split bytes=2052
        Combine input records=0
        Combine output records=0
        Reduce input groups=5453085
        Reduce shuffle bytes=2630214190
        Reduce input records=49458230
        Reduce output records=5453085
        Spilled Records=174185483
        Shuffled Maps =19
        Failed Shuffles=0
        Merged Map outputs=19
        GC time elapsed (ms)=9585
        CPU time spent (ms)=389790
        Physical memory (bytes) snapshot=5763260416
        Virtual memory (bytes) snapshot=39333715968
        Total committed heap usage (bytes)=4077912064
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=2551938704
    File Output Format Counters
        Bytes Written=134288980