Example description:
In short: count how many times each user used each app, then rank each user's apps by usage count.
Raw data:
000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15097003,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11
000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:69:C0,15026002,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11
000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11
000041b744,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,719,4174,6=2016-06-23 08:50:00|7=,2016/7/5 11:11
000041b22f,李四,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15097002,,2016/6/8 17:10,2016/6/8 17:10,856,367,7=,2016/7/5 11:11
000041b1bc,李四,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,937,2964,3=北京|4=上海,2016/7/5 11:11
000041cf18,趙六,7C:1D:D9:F4:BE:E0,F8:E7:1E:1E:62:20,15097002,,2016/6/8 17:10,2016/6/8 17:10,665,2669,5=2016-06-22 00:00:00,2016/7/5 11:11
000041b1bc,孫七,7C:1D:D9:F4:BE:E0,38:FF:36:2E:5B:A0,9003000,,2016/6/8 17:10,2016/6/8 17:10,530,245,,2016/7/5 11:11
000041b8f1,王五,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,9007000,,2016/6/8 17:11,2016/6/8 17:11,626,6886,,2016/7/5 11:11
000041b8f1,周八,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16500000,,2016/6/8 17:11,2016/6/8 17:11,2532,646,,2016/7/5 11:11
000041966a,李四,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16501000,,2016/6/8 17:11,2016/6/8 17:11,690,454,,2016/7/5 11:11
000041966a,李四,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16501000,,2016/6/8 17:11,2016/6/8 17:11,690,454,,2016/7/5 11:11
Result data:
周八,人人貸:1
孫七,支付寶:1
趙六,途牛機票:1
王五,快錢:1|天弘基金:1
李四,紅嶺創投:2|攜程機票:1|攜程酒店:1|途牛機票:1
張三,途牛酒店:5|攜程機票:3
Code snippet:
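cxRDD0.map { lines =>
  val line = lines.split(",") // split the record on commas
  // Build a (key, 1) pair so identical keys can be summed.
  // The key is "user,app": the column positions come from the data_location
  // dictionary, and the app name is looked up from BUS_ID in buss_location.
  (s"""${line(data_location.getOrElse("USR_NBR", "").toInt)},${buss_location.getOrElse(line(data_location.getOrElse("BUS_ID", "").toInt), "").split(",", -1)(0)}""", 1)
}.reduceByKey(_ + _).map { lines => // sum the 1s for each (user, app) key
  // Re-key by user, with "app,count" as the value, so the next step can group per user.
  // split(",", -1) keeps a trailing empty app name instead of dropping it,
  // which would otherwise make index 1 fail.
  val key = lines._1.split(",", -1)
  (s"${key(0)}", s"${key(1)},${lines._2}")
}.groupByKey().map { // group all "app,count" values per user
  case (k, v) =>
    // Sort this user's apps by count.
    val app = v.map { x =>
      val a = x.split(",") // split into app name and count
      (a(0), a(1))         // passed on as a pair
    // sortWith on the second element, descending; it has to be converted
    // to Int because the count arrives as a string.
    }.toSeq.sortWith(_._2.toInt > _._2.toInt).map { app =>
      // format each entry as app:count
      s"${app._1}:${app._2}"
    }
    // Final line format: user,app1:count1|app2:count2...
    s"$k,${app.mkString("|")}"
}.foreach(println)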
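The same count-then-rank flow can be tried without a Spark cluster. The following is a minimal sketch on plain Scala collections, not the original job: `groupBy` stands in for `reduceByKey` and `groupByKey`. The contents of `data_location` and `buss_location` are assumptions, since the post does not show them; the app names `app-A`, `app-B`, `app-C` are placeholders, and the sample records are the first five columns of the first six raw rows above. Breaking ties by app name and sorting the output by user are also assumptions added to make the result deterministic.

```scala
object AppUsageRanking {
  // Assumed dictionary: field name -> column index, stored as a String
  // (the original snippet calls .toInt on the looked-up value).
  val data_location: Map[String, String] = Map("USR_NBR" -> "1", "BUS_ID" -> "4")

  // Assumed dictionary: business ID -> "app name,other attributes".
  // The names are placeholders, not the real mapping.
  val buss_location: Map[String, String] = Map(
    "15097003" -> "app-A,placeholder",
    "15026002" -> "app-B,placeholder",
    "15097002" -> "app-C,placeholder"
  )

  // First five columns of the first six raw records from the post.
  val sample: Seq[String] = Seq(
    "000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15097003",
    "000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:69:C0,15026002",
    "000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002",
    "000041b744,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002",
    "000041b22f,李四,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15097002",
    "000041b1bc,李四,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002"
  )

  def rank(records: Seq[String]): Seq[String] = {
    val userCol = data_location("USR_NBR").toInt
    val busCol  = data_location("BUS_ID").toInt

    records
      .map { record =>
        val fields = record.split(",", -1) // -1 keeps empty fields
        // Unknown business IDs fall back to an empty app name.
        val app = buss_location.getOrElse(fields(busCol), "").split(",", -1)(0)
        (fields(userCol), app)
      }
      .groupBy(identity).toSeq // one group per (user, app) pair
      .map { case ((user, app), hits) => (user, app, hits.size) }
      .groupBy(_._1).toSeq     // one group per user
      .map { case (user, rows) =>
        // Highest count first; ties broken by app name (an assumption).
        val ranked = rows.sortBy(r => (-r._3, r._2)).map(r => s"${r._2}:${r._3}")
        s"$user,${ranked.mkString("|")}"
      }
      .sorted // deterministic output order; the Spark job does not guarantee one
  }

  def main(args: Array[String]): Unit = rank(sample).foreach(println)
}
```

Keeping the user and app as a tuple key, rather than joining them into a comma-separated string and splitting again later, avoids the repeated `split` calls and the string-to-`Int` conversion before sorting. The same idea should carry over to the RDD version by using `((user, app), 1)` as the pair passed to `reduceByKey`.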