In my previous article I showed how to build an inverted index with Hadoop. In this article we continue and look at how to compute the common friends of QQ users (or followers) with Hadoop.
1. Background
The database contains many QQ accounts, and every account's friend list can be queried. The data can be normalized into the following form:
# Format --- person: friend1,friend2,friend3,friend4...
A:B,C,D,F,E,O
B:C,E,G,F,O,D
D:Q,W,B,P,T,Y
Y:S,Q,L,V,B,H,J,K,L
O:L,E,Q,R,U,S,B
P:O,L,E,L,F,Q,W,G
K:S,L,D,U,R,E,A,X
.....
Our goal is to find the common friends between every pair of people:
A-B,F C D O E
A-D,B
A-Y,B
D-Y,Q B
......
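Before diving into the MapReduce code, here is a minimal single-machine sanity check. This is my own addition, not part of the original post: the class name CommonFriendsLocalCheck is made up, and it only uses the sample lines shown above. It directly intersects each pair's friend sets, so you can compare its output against what the two jobs below produce (the ordering of the friends may differ, but the sets should match).

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class CommonFriendsLocalCheck {
    public static void main(String[] args) {
        String[] lines = {
            "A:B,C,D,F,E,O", "B:C,E,G,F,O,D", "D:Q,W,B,P,T,Y",
            "Y:S,Q,L,V,B,H,J,K,L", "O:L,E,Q,R,U,S,B",
            "P:O,L,E,L,F,Q,W,G", "K:S,L,D,U,R,E,A,X"
        };
        // person -> set of friends (TreeSet also removes duplicate entries)
        Map<String, Set<String>> friends = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(":");
            friends.put(parts[0], new TreeSet<>(Arrays.asList(parts[1].split(","))));
        }
        // for every pair of people, intersect their friend sets
        List<String> people = new ArrayList<>(friends.keySet());
        for (int i = 0; i < people.size(); i++) {
            for (int j = i + 1; j < people.size(); j++) {
                Set<String> common = new TreeSet<>(friends.get(people.get(i)));
                common.retainAll(friends.get(people.get(j)));
                if (!common.isEmpty()) {
                    System.out.println(people.get(i) + "-" + people.get(j) + "\t" + String.join(" ", common));
                }
            }
        }
    }
}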
2. Implementation
Step 1: build the inverted structure {friend -> person,person,person}, with the key and value tab-separated in the output, e.g. A  I,K,C,B,G,F,H,O,D,
package com.empire.hadoop.mr.fensi;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SharedFriendsStepOne {

    static class SharedFriendsStepOneMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // input line, e.g.: A:B,C,D,F,E,O
            String line = value.toString();
            String[] person_friends = line.split(":");
            String person = person_friends[0];
            String friends = person_friends[1];
            for (String friend : friends.split(",")) {
                // emit <friend, person>
                context.write(new Text(friend), new Text(person));
            }
        }
    }

    static class SharedFriendsStepOneReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(Text friend, Iterable<Text> persons, Context context) throws IOException, InterruptedException {
            // concatenate every person who has this friend
            StringBuffer sb = new StringBuffer();
            for (Text person : persons) {
                sb.append(person).append(",");
            }
            context.write(friend, new Text(sb.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(SharedFriendsStepOne.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(SharedFriendsStepOneMapper.class);
        job.setReducerClass(SharedFriendsStepOneReducer.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
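A small optional tweak while developing (my suggestion, not part of the original post): you can run this job in Hadoop's local mode instead of submitting it to YARN by overriding two standard Hadoop 2.x configuration keys at the top of main:

// Drop-in replacement for the first two lines of main() above,
// for local debugging only (no cluster needed):
Configuration conf = new Configuration();
conf.set("mapreduce.framework.name", "local"); // run map/reduce in-process instead of on YARN
conf.set("fs.defaultFS", "file:///");          // read/write the local filesystem instead of HDFS
Job job = Job.getInstance(conf);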
Step 2: from that, compute the common friends {person-person -> friend friend friend friend}, e.g. A-B,F C D O E
package com.empire.hadoop.mr.fensi;

import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SharedFriendsStepTwo {

    static class SharedFriendsStepTwoMapper extends Mapper<LongWritable, Text, Text, Text> {

        // The input is the output of the previous step:
        // A  I,K,C,B,G,F,H,O,D,
        // friend  person,person,person
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] friend_persons = line.split("\t");
            String friend = friend_persons[0];
            String[] persons = friend_persons[1].split(",");
            // sort so each pair is always emitted in the same order (A-B, never B-A)
            Arrays.sort(persons);
            for (int i = 0; i < persons.length - 1; i++) {
                for (int j = i + 1; j < persons.length; j++) {
                    // emit <person-person, friend>, so all friends of the same
                    // "person-person" pair end up in the same reduce call
                    context.write(new Text(persons[i] + "-" + persons[j]), new Text(friend));
                }
            }
        }
    }

    static class SharedFriendsStepTwoReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(Text person_person, Iterable<Text> friends, Context context) throws IOException, InterruptedException {
            // concatenate all common friends of this pair
            StringBuffer sb = new StringBuffer();
            for (Text friend : friends) {
                sb.append(friend).append(" ");
            }
            context.write(person_person, new Text(sb.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(SharedFriendsStepTwo.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(SharedFriendsStepTwoMapper.class);
        job.setReducerClass(SharedFriendsStepTwoReducer.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
3. Running the jobs
# Upload the jars and the data file (sftp session, e.g. Alt+p in SecureCRT)
Alt+p
lcd d:/
put SharedStepOne.jar SharedStepTwo.jar
put shared.txt

# Stage the input file in HDFS
cd /home/hadoop
hadoop fs -mkdir -p /shared/sharedinput
hdfs dfs -put shared.txt /shared/sharedinput

# Run the two jobs
hadoop jar SharedStepOne.jar com.empire.hadoop.mr.fensi.SharedFriendsStepOne /shared/sharedinput /shared/sharedsteponeoutput
hadoop jar SharedStepTwo.jar com.empire.hadoop.mr.fensi.SharedFriendsStepTwo /shared/sharedsteponeoutput/part-r-00000 /shared/sharedsteptwooutput
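As an alternative to submitting the two jars by hand, the two steps could be chained in a single driver with Hadoop's JobControl API. The class below is my own sketch, not part of the original post: SharedFriendsDriver is a hypothetical name, and it assumes both step classes above are packaged in the same jar.

package com.empire.hadoop.mr.fensi;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SharedFriendsDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);   // raw person:friend,... file
        Path mid = new Path(args[1]);     // step-one output, step-two input
        Path output = new Path(args[2]);  // final person-person results

        // step one: invert to <friend, persons>
        Job job1 = Job.getInstance(conf, "shared-friends-step-one");
        job1.setJarByClass(SharedFriendsStepOne.class);
        job1.setMapperClass(SharedFriendsStepOne.SharedFriendsStepOneMapper.class);
        job1.setReducerClass(SharedFriendsStepOne.SharedFriendsStepOneReducer.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job1, input);
        FileOutputFormat.setOutputPath(job1, mid);

        // step two: pair up persons per friend
        Job job2 = Job.getInstance(conf, "shared-friends-step-two");
        job2.setJarByClass(SharedFriendsStepTwo.class);
        job2.setMapperClass(SharedFriendsStepTwo.SharedFriendsStepTwoMapper.class);
        job2.setReducerClass(SharedFriendsStepTwo.SharedFriendsStepTwoReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job2, mid);
        FileOutputFormat.setOutputPath(job2, output);

        // wire the dependency: job2 starts only after job1 succeeds
        ControlledJob cj1 = new ControlledJob(job1.getConfiguration());
        cj1.setJob(job1);
        ControlledJob cj2 = new ControlledJob(job2.getConfiguration());
        cj2.setJob(job2);
        cj2.addDependingJob(cj1);

        JobControl control = new JobControl("shared-friends");
        control.addJob(cj1);
        control.addJob(cj2);
        new Thread(control).start(); // JobControl implements Runnable
        while (!control.allFinished()) {
            Thread.sleep(1000);
        }
        control.stop();
    }
}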
4. Execution log
[hadoop@centos-aaron-h1 ~]$ hadoop jar SharedStepOne.jar com.empire.hadoop.mr.fensi.SharedFriendsStepOne /shared/sharedinput /shared/sharedsteponeoutput
18/12/23 05:08:08 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
18/12/23 05:08:09 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/12/23 05:08:09 INFO input.FileInputFormat: Total input files to process : 1
18/12/23 05:08:10 INFO mapreduce.JobSubmitter: number of splits:1
18/12/23 05:08:10 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/12/23 05:08:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1545512861141_0001
18/12/23 05:08:11 INFO impl.YarnClientImpl: Submitted application application_1545512861141_0001
18/12/23 05:08:11 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1545512861141_0001/
18/12/23 05:08:11 INFO mapreduce.Job: Running job: job_1545512861141_0001
18/12/23 05:08:20 INFO mapreduce.Job: Job job_1545512861141_0001 running in uber mode : false
18/12/23 05:08:20 INFO mapreduce.Job: map 0% reduce 0%
18/12/23 05:08:27 INFO mapreduce.Job: map 100% reduce 0%
18/12/23 05:08:33 INFO mapreduce.Job: map 100% reduce 100%
18/12/23 05:08:33 INFO mapreduce.Job: Job job_1545512861141_0001 completed successfully
18/12/23 05:08:33 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=306
		FILE: Number of bytes written=394989
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=241
		HDFS: Number of bytes written=166
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3967
		Total time spent by all reduces in occupied slots (ms)=3151
		Total time spent by all map tasks (ms)=3967
		Total time spent by all reduce tasks (ms)=3151
		Total vcore-milliseconds taken by all map tasks=3967
		Total vcore-milliseconds taken by all reduce tasks=3151
		Total megabyte-milliseconds taken by all map tasks=4062208
		Total megabyte-milliseconds taken by all reduce tasks=3226624
	Map-Reduce Framework
		Map input records=7
		Map output records=50
		Map output bytes=200
		Map output materialized bytes=306
		Input split bytes=122
		Combine input records=0
		Combine output records=0
		Reduce input groups=22
		Reduce shuffle bytes=306
		Reduce input records=50
		Reduce output records=22
		Spilled Records=100
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=190
		CPU time spent (ms)=1260
		Physical memory (bytes) snapshot=339103744
		Virtual memory (bytes) snapshot=1694265344
		Total committed heap usage (bytes)=137867264
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=119
	File Output Format Counters
		Bytes Written=166
[hadoop@centos-aaron-h1 ~]$
[hadoop@centos-aaron-h1 ~]$ hadoop jar SharedStepTwo.jar com.empire.hadoop.mr.fensi.SharedFriendsStepTwo /shared/sharedsteponeoutput/part-r-00000 /shared/sharedsteptwooutput
18/12/23 05:12:19 INFO client.RMProxy: Connecting to ResourceManager at centos-aaron-h1/192.168.29.144:8032
18/12/23 05:12:20 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/12/23 05:12:20 INFO input.FileInputFormat: Total input files to process : 1
18/12/23 05:12:20 INFO mapreduce.JobSubmitter: number of splits:1
18/12/23 05:12:20 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/12/23 05:12:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1545512861141_0002
18/12/23 05:12:21 INFO impl.YarnClientImpl: Submitted application application_1545512861141_0002
18/12/23 05:12:21 INFO mapreduce.Job: The url to track the job: http://centos-aaron-h1:8088/proxy/application_1545512861141_0002/
18/12/23 05:12:21 INFO mapreduce.Job: Running job: job_1545512861141_0002
18/12/23 05:12:29 INFO mapreduce.Job: Job job_1545512861141_0002 running in uber mode : false
18/12/23 05:12:29 INFO mapreduce.Job: map 0% reduce 0%
18/12/23 05:12:38 INFO mapreduce.Job: map 100% reduce 0%
18/12/23 05:12:44 INFO mapreduce.Job: map 100% reduce 100%
18/12/23 05:12:44 INFO mapreduce.Job: Job job_1545512861141_0002 completed successfully
18/12/23 05:12:44 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=438
		FILE: Number of bytes written=395295
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=298
		HDFS: Number of bytes written=208
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5637
		Total time spent by all reduces in occupied slots (ms)=3126
		Total time spent by all map tasks (ms)=5637
		Total time spent by all reduce tasks (ms)=3126
		Total vcore-milliseconds taken by all map tasks=5637
		Total vcore-milliseconds taken by all reduce tasks=3126
		Total megabyte-milliseconds taken by all map tasks=5772288
		Total megabyte-milliseconds taken by all reduce tasks=3201024
	Map-Reduce Framework
		Map input records=22
		Map output records=54
		Map output bytes=324
		Map output materialized bytes=438
		Input split bytes=132
		Combine input records=0
		Combine output records=0
		Reduce input groups=20
		Reduce shuffle bytes=438
		Reduce input records=54
		Reduce output records=20
		Spilled Records=108
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=251
		CPU time spent (ms)=1260
		Physical memory (bytes) snapshot=338214912
		Virtual memory (bytes) snapshot=1694265344
		Total committed heap usage (bytes)=137662464
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=166
	File Output Format Counters
		Bytes Written=208
[hadoop@centos-aaron-h1 ~]$
5. Results
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -cat /shared/sharedsteponeoutput/part-r-00000
A	K,
B	A,O,Y,D,
C	A,B,
D	B,A,K,
E	O,K,A,P,B,
F	P,A,B,
G	B,P,
H	Y,
J	Y,
K	Y,
L	K,Y,P,P,Y,O,
O	A,B,P,
P	D,
Q	P,D,Y,O,
R	K,O,
S	O,Y,K,
T	D,
U	O,K,
V	Y,
W	D,P,
X	K,
Y	D,
[hadoop@centos-aaron-h1 ~]$ hdfs dfs -cat /shared/sharedsteptwooutput/part-r-00000
A-B	F C D O E
A-D	B
A-K	D E
A-O	B E
A-P	E F O
A-Y	B
B-K	E D
B-O	E
B-P	F G O E
D-O	B Q
D-P	W Q
D-Y	B Q
K-O	E U
K-P	E L L
K-Y	L L S
O-P	L L Q E
O-Y	B S Q L L
P-P	L
P-Y	L L Q L L
Y-Y	L
[hadoop@centos-aaron-h1 ~]$
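One quirk is visible in the step-two output: the keys P-P and Y-Y, and repeated friends such as "L L". They arise because the sample input itself contains duplicate friend entries (P and Y each list L twice), so step one emits L's person list as K,Y,P,P,Y,O, and the pairing loop in step two then pairs duplicates with each other. A small fix (my own tweak, not the original code) is to deduplicate the persons with a TreeSet before pairing:

// Inside SharedFriendsStepTwoMapper.map(), replacing the split/sort/pair
// section (requires import java.util.TreeSet;): a TreeSet drops duplicate
// persons and keeps them sorted, so self-pairs like P-P disappear and each
// distinct pair is emitted at most once per friend.
TreeSet<String> unique = new TreeSet<>(Arrays.asList(friend_persons[1].split(",")));
String[] persons = unique.toArray(new String[0]);
for (int i = 0; i < persons.length - 1; i++) {
    for (int j = i + 1; j < persons.length; j++) {
        context.write(new Text(persons[i] + "-" + persons[j]), new Text(friend));
    }
}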
Finally, that is all for this article. If you found it useful, please give it a like; if you are interested in my other posts on servers and big data, or in me, please follow my blog, and feel free to reach out and exchange ideas at any time.