spark 例子groupByKey分組計算2

spark 例子groupByKey分組計算2


例子描述:app

大概意思爲,統計用戶使用app的次數排名spa

原始數據:scala

000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15097003,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11
000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:69:C0,15026002,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11
000041b232,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,690,6218,11=0|12=200,2016/7/5 11:11
000041b744,張三,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,719,4174,6=2016-06-23 08:50:00|7=,2016/7/5 11:11
000041b22f,李四,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15097002,,2016/6/8 17:10,2016/6/8 17:10,856,367,7=,2016/7/5 11:11
000041b1bc,李四,FC:1A:11:5C:58:34,F8:E7:1E:1E:62:20,15026002,,2016/6/8 17:10,2016/6/8 17:10,937,2964,3=北京|4=上海,2016/7/5 11:11
000041cf18,趙六,7C:1D:D9:F4:BE:E0,F8:E7:1E:1E:62:20,15097002,,2016/6/8 17:10,2016/6/8 17:10,665,2669,5=2016-06-22 00:00:00,2016/7/5 11:11
000041b1bc,孫七,7C:1D:D9:F4:BE:E0,38:FF:36:2E:5B:A0,9003000,,2016/6/8 17:10,2016/6/8 17:10,530,245,,2016/7/5 11:11
000041b8f1,王五,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,9007000,,2016/6/8 17:11,2016/6/8 17:11,626,6886,,2016/7/5 11:11
000041b8f1,周八,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16500000,,2016/6/8 17:11,2016/6/8 17:11,2532,646,,2016/7/5 11:11
000041966a,李四,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16501000,,2016/6/8 17:11,2016/6/8 17:11,690,454,,2016/7/5 11:11
000041966a,李四,FC:1A:11:5C:58:34,38:FF:36:2E:5B:A0,16501000,,2016/6/8 17:11,2016/6/8 17:11,690,454,,2016/7/5 11:11code

結果數據:排序

周八,人人貸:1
孫七,支付寶:1
趙六,途牛機票:1
王五,快錢:1|天弘基金:1
李四,紅嶺創投:2|攜程機票:1|攜程酒店:1|途牛機票:1
張三,途牛酒店:5|攜程機票:3支付寶


代碼片斷:get

cxRDD0.map {
  lines =>
    val line = lines.split(",")//逗號分隔數據
    //想辦法將數據拼成(數據,1)的映射,而且這個地方的數據要相同,能夠理解取爲用戶,APPID,而後當成K,寫個數字1當成V,這裏使用的字典關聯去取的數據
    (s"""${line((data_location.getOrElse("USR_NBR", "").toInt))},${buss_location.getOrElse(line((data_location.getOrElse("BUS_ID", "").toInt)), "").split(",", -1)(0)}""", 1)
}.reduceByKey(_ + _).map {//分組
  lines =>
    //將分組後的數據,以用戶爲K,其餘爲V拼成映射,便於後續分組
    (s"${lines._1.split(",")(0)}", s"${lines._1.split(",")(1)},${lines._2}")
}.groupByKey().map {//分組
  case (k, v) =>
    //對APPID數量 V 進行排序
    val app = v.map {
      x =>
        val a = x.split(",")
        //拆分APPID 與 數量,這裏傳遞給下面的類型爲映射
        (a(0), a(1))
        //使用sortWith對映射的第二位數字進行排序,須要轉換成INT,由於傳遞過來都是字符
    }.toSeq.sortWith(_._2.toInt > _._2.toInt).map {
      app =>
        //格式化輸出
        //V:V
        s"${app._1}:${app._2}"
    }
    //格式化輸出
    //K,V
    //K,V1|V2......
    s"$k,${app.mkString("|")}"
}.foreach(println)
相關文章
相關標籤/搜索