Integrating Tachyon with HDFS and Spark

Tachyon 0.7.1 pseudo-distributed cluster installation and testing:
http://blog.csdn.net/stark_summer/article/details/48321605
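According to the official documentation, Spark 1.4.x is compatible with Tachyon 0.6.4, while the latest Tachyon release, 0.7.1, is compatible with Spark 1.5.x. The versions used here are Spark 1.4.1 and Tachyon 0.7.1.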
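The setup above pairs Spark 1.4.1 with Tachyon 0.7.1. The small sketch below makes the mismatch explicit by encoding the two pairings quoted from the documentation. `VersionCheck`, `knownPairs` and `isCompatible` are hypothetical names. The table holds only the two pairs stated above, so it is not a full compatibility matrix.

```scala
// Hypothetical helper: encodes ONLY the two Spark/Tachyon pairings quoted
// from the official docs above; it is not an exhaustive compatibility matrix.
object VersionCheck {
  // Spark "major.minor" line -> Tachyon version the docs pair it with
  val knownPairs: Map[String, String] = Map(
    "1.4" -> "0.6.4",
    "1.5" -> "0.7.1"
  )

  // Reduce a full version such as "1.4.1" to its "major.minor" line
  def minorLine(version: String): String =
    version.split('.').take(2).mkString(".")

  // Some(true/false) if the Spark line is in the table, None if it is unknown
  def isCompatible(spark: String, tachyon: String): Option[Boolean] =
    knownPairs.get(minorLine(spark)).map(_ == tachyon)

  def main(args: Array[String]): Unit = {
    // The combination used in this article: Spark 1.4.1 with Tachyon 0.7.1
    println(isCompatible("1.4.1", "0.7.1")) // Some(false)
    println(isCompatible("1.5.0", "0.7.1")) // Some(true)
  }
}
```

Returning `Option` keeps "not in the table" (`None`) distinct from "known to be incompatible" (`Some(false)`).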

Integrating Tachyon with HDFS

Edit tachyon-env.sh:

export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020
# inside the TACHYON_JAVA_OPTS block:
-Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
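It can help to see what these two settings resolve to before restarting Tachyon. The sketch below mirrors the shell expansion of `$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data` and uses `java.net.URI` to pull apart the under-file-system address. `UnderFsConfig` and `dataFolder` are hypothetical names used only for illustration.

```scala
import java.net.URI

// Hypothetical sketch: reproduces the expansion of
//   -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
// so the resulting values can be eyeballed before restarting Tachyon.
object UnderFsConfig {
  // Append the data-folder suffix, tolerating a trailing slash on the address
  def dataFolder(underFsAddress: String): String =
    underFsAddress.stripSuffix("/") + "/tmp/tachyon/data"

  def main(args: Array[String]): Unit = {
    // hdfs://master:8020 is the HDFS address used throughout this article
    val underFs = "hdfs://master:8020"
    val uri = new URI(underFs)
    println(s"scheme=${uri.getScheme} host=${uri.getHost} port=${uri.getPort}")
    println(dataFolder(underFs)) // hdfs://master:8020/tmp/tachyon/data
  }
}
```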

Upload the files to HDFS:

hadoop fs -put /home/cluster/data/test/bank/ /data/spark/

hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv

Read /data/spark/bank/bank-full.csv through Tachyon:

val bankFullFile = sc.textFile("tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv")
2015-09-11 20:08:20,136 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(177384) called with curMem=630803, maxMem=257918238
2015-09-11 20:08:20,137 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3 stored as values in memory (estimated size 173.2 KB, free 245.2 MB)
2015-09-11 20:08:20,154 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(17665) called with curMem=808187, maxMem=257918238
2015-09-11 20:08:20,155 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_3_piece0 stored as bytes in memory (estimated size 17.3 KB, free 245.2 MB)
2015-09-11 20:08:20,156 INFO  [sparkDriver-akka.actor.default-dispatcher-2] storage.BlockManagerInfo (Logging.scala:logInfo(59)) - Added broadcast_3_piece0 in memory on localhost:41040 (size: 17.3 KB, free: 245.9 MB)
2015-09-11 20:08:20,157 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 3 from textFile at <console>:21
bankFullFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[7] at textFile at <console>:21
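The path passed to `sc.textFile` above ends in `bank-full.csv/bank-full.csv`, while the HDFS path is just `/data/spark/bank/bank-full.csv`. The following sketch splits that `tachyon://` URI into its parts with `java.net.URI`. It shows that dropping the last segment gives the HDFS file path. `TachyonUriParts` and `parentPath` are hypothetical names.

```scala
import java.net.URI

// Hypothetical sketch: decompose the tachyon:// URI used with sc.textFile.
object TachyonUriParts {
  // Everything before the last '/', i.e. the enclosing directory's path
  def parentPath(uri: URI): String = {
    val p = uri.getPath
    p.substring(0, p.lastIndexOf('/'))
  }

  def main(args: Array[String]): Unit = {
    val uri = new URI("tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv")
    println(uri.getScheme)    // tachyon
    println(uri.getHost)      // master
    println(uri.getPort)      // 19998
    println(uri.getPath)      // /data/spark/bank/bank-full.csv/bank-full.csv
    // In HDFS this parent path is a plain file, not a directory
    println(parentPath(uri))  // /data/spark/bank/bank-full.csv
  }
}
```

The later `tfs ls` output lists `bank-full.csv` as a directory containing a same-named file. This is why the two namespaces do not line up one to one here.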

Run count:

bankFullFile.count()
But it reports the following errors:
2015-09-11 21:34:31,494 WARN  [Executor task launch worker-6]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-6]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,489 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,495 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing
2015-09-11 21:34:31,496 WARN  [Executor task launch worker-7]  (RemoteBlockInStream.java:retrieveByteBufferFromRemoteMachine(320)) - Read nothing

The error looks rather strange. Does anyone know what causes it? Tell me why?

Yet in the Tachyon file system I can see the following:

./bin/tachyon tfs ls /data/spark/bank/bank-full.csv/
4502.29 KB09-11-2015 20:09:02:078  Not In Memory  /data/spark/bank/bank-full.csv/bank-full.csv

whereas in HDFS, bank-full.csv is listed as:

hadoop fs -ls /data/spark/bank/
Found 3 items
-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv
-rw-r--r--   3 wangyue supergroup       3864 2015-09-11 20:02 /data/spark/bank/bank-names.txt
-rw-r--r--   3 wangyue supergroup     461474 2015-09-11 20:02 /data/spark/bank/bank.csv
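The two listings can be compared numerically. Tachyon reports `4502.29 KB` and HDFS reports `4610348` bytes. The sketch below reads the size column from one `hadoop fs -ls` line and converts it to KB, assuming 1 KB = 1024 bytes. `SizeCheck`, `parseLsLine` and `toKB` are hypothetical names. The parser also assumes paths contain no spaces.

```scala
import scala.math.BigDecimal.RoundingMode

// Hypothetical helper: extract path and byte size from one `hadoop fs -ls`
// line and convert bytes to KB (1 KB = 1024 bytes) with two decimals.
object SizeCheck {
  // Columns: permissions, replication, owner, group, size, date, time, path
  // (assumes the path itself contains no whitespace)
  def parseLsLine(line: String): (String, Long) = {
    val cols = line.trim.split("\\s+")
    (cols(7), cols(4).toLong)
  }

  // BigDecimal avoids locale-dependent formatting and float rounding surprises
  def toKB(bytes: Long): BigDecimal =
    (BigDecimal(bytes) / 1024).setScale(2, RoundingMode.HALF_UP)

  def main(args: Array[String]): Unit = {
    val line = "-rw-r--r--   3 wangyue supergroup    4610348 2015-09-11 20:02 /data/spark/bank/bank-full.csv"
    val (path, bytes) = parseLsLine(line)
    println(s"$path: $bytes bytes = ${toKB(bytes)} KB")
  }
}
```

Since 4610348 / 1024 = 4502.29296875, the sizes agree. This suggests the Tachyon entry refers to the same file despite the extra path component.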

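In fact, Tachyon itself registered bank-full.csv in its own file system, at tachyon://master:19998/data/spark/bank/bank-full.csv/bank-full.csv (the listing above marks it Not In Memory, so it has not actually been cached yet).
This comes from the setting export TACHYON_UNDERFS_ADDRESS=hdfs://master:8020 in Tachyon's conf/tachyon-env.sh, which lets tachyon://localhost:19998 reach files at the corresponding HDFS paths.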

All right, then I'll first read the file from HDFS and then save it to Tachyon:

scala> val bankfullfile =  sc.textFile("/data/spark/bank/bank-full.csv")
scala> bankfullfile.count
res0: Long = 45212

scala> bankfullfile.saveAsTextFile("tachyon://master:19998/data/spark/bank/newbankfullfile")

Unfinished; to be continued.
