```java
// ...
String cacheFilePath = "/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000";
DistributedCache.addCacheFile(new Path(cacheFilePath).toUri(), job.getConfiguration());
// ...
```
```java
// ...
// Get the cached files from the current job
Path[] paths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
for (Path path : paths) {
    if (path.toString().contains("cmc_unitparameter")) {
        // ...
    }
}
```
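```
MR1 Path: hdfs://host:fs_port/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000
MR1 Path: hdfs://host:fs_port/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000
MR2 Path: /data4/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000006/part-m-00000
MR2 Path: /data17/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000002/part-m-00000
MR2 Path: /data23/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000005/part-m-00000
```

Comparing the two, it should be clear why the distributed cache seems to "stop working" under MR2. In MR1 the path still contains the source directory name `cmc_unitparameter`, but in MR2 it is a local path under the container directory that ends only in the file name `part-m-00000`, so the `contains("cmc_unitparameter")` check never matches.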
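To make the failure concrete, here is a small sketch that re-runs the mapper's `contains("cmc_unitparameter")` filter against one MR1 path and one MR2 path from the log. It uses plain strings instead of Hadoop `Path` objects, and `CacheLookupDemo` / `findByContains` are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: replays the mapper's substring filter on plain path strings.
class CacheLookupDemo {
    // Returns every cached path whose full string contains the marker,
    // mirroring path.toString().contains("cmc_unitparameter") in the mapper.
    static List<String> findByContains(List<String> cachedPaths, String marker) {
        List<String> hits = new ArrayList<>();
        for (String p : cachedPaths) {
            if (p.contains(marker)) {
                hits.add(p);
            }
        }
        return hits;
    }
}
```

Fed the MR1 path, the filter returns it; fed the MR2 path, it returns an empty list, so the body of the `if` never runs and the cached file is silently never loaded.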
The fix is not hard.
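Strictly speaking, the code above was already sloppy in the MR1 era: it walks the entire distributed cache every time. We should use a small trick instead: `createSymlink`.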
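```java
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

// ...
String cacheFilePath = "/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000";
Path inPath = new Path(cacheFilePath);
// The name after "#" is the link to the file above. You may choose it freely,
// but different files must not share the same link name.
String inPathLink = inPath.toUri().toString() + "#" + "DIYFileName";
DistributedCache.addCacheFile(new URI(inPathLink), job.getConfiguration());
// MR1 only creates the symlink when this is enabled; under MR2 (YARN) it is a deprecated no-op
DistributedCache.createSymlink(job.getConfiguration());
// ...
```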
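The `#` suffix is an ordinary URI fragment, and Hadoop takes that fragment as the symlink name. A minimal sketch using only `java.net.URI` shows how the driver's string splits into path and link name, plus a check for the rule in the code comment that link names must not repeat. `CacheLinkUri` and its methods are hypothetical helpers, not Hadoop API:

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: builds and checks "file#link" cache URIs.
class CacheLinkUri {
    // Appends "#linkName" to the file URI string, as the driver code does.
    // URI.create throws an unchecked IllegalArgumentException on malformed input.
    static URI withLink(String fileUri, String linkName) {
        return URI.create(fileUri + "#" + linkName);
    }

    // True if every URI carries a fragment (link name) and no two share one.
    static boolean linkNamesUnique(List<URI> uris) {
        Set<String> seen = new HashSet<>();
        for (URI u : uris) {
            String link = u.getFragment();
            if (link == null || !seen.add(link)) {
                return false;
            }
        }
        return true;
    }
}
```

For the driver's path, `getPath()` returns `/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000` and `getFragment()` returns `DIYFileName`; running a check like `linkNamesUnique` over all cache URIs before submitting the job catches a duplicated link name early.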
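Once the symlink is added, the last component of each path is the link name you chose (`DIYFileName` above; in this run the two links were named `cmcs_paracontrolvalues` and `cmc_unitparameter`):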
```
/data4/yarn/local/usercache/root/appcache/application_1394073762364_1966/container_1394073762364_1966_01_000005/cmcs_paracontrolvalues
/data4/yarn/local/usercache/root/appcache/application_1394073762364_1966/container_1394073762364_1966_01_000005/cmc_unitparameter
```
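If you do still iterate over the localized cache paths, the listing suggests matching on the last path component rather than on a substring of the whole path. A minimal stdlib-only sketch, assuming the paths arrive as plain strings with `/` separators (not Hadoop `Path` objects); `CacheLinkLookup.findByLinkName` is a hypothetical name:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: picks a localized cache file by its link name.
class CacheLinkLookup {
    // Returns the first path whose last '/'-separated component equals linkName
    // exactly, so "cmc_unitparameter" cannot accidentally match a longer name.
    static Optional<String> findByLinkName(List<String> cachedPaths, String linkName) {
        for (String p : cachedPaths) {
            String last = p.substring(p.lastIndexOf('/') + 1);
            if (last.equals(linkName)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }
}
```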
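With the link in place, the task no longer needs to walk the cache at all; it can open the file directly by its link name:

```java
// The symlink sits in the task's working directory, so the link name works as a relative path
BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("DIYFileName")));
```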
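A slightly fuller version of that read, as a sketch: it closes the reader with try-with-resources (the snippet above never closes it) and collects the lines into a list. The UTF-8 charset and the `CacheFileReader.readLines` name are assumptions for illustration; in a real mapper such a helper would typically be called once from `setup()` with the link name, e.g. `readLines("DIYFileName")`.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: loads a symlinked cache file into memory.
class CacheFileReader {
    // Reads every line of the file at linkName. Inside a task, a bare name such as
    // "DIYFileName" resolves against the working directory where the symlink lives.
    // Assumes the file is UTF-8 encoded.
    static List<String> readLines(String linkName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(linkName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```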