Alluxio 1.8.1
Official site: http://www.alluxio.org/
Open Source Memory Speed Virtual Distributed Storage
Alluxio, formerly Tachyon, enables any application to interact with any data from any storage system at memory speed.
In other words, Alluxio is an open-source virtual distributed storage system with memory-speed access. It was formerly called Tachyon, and it lets applications access data in any storage system as if it were in memory.
Alluxio unifies data access to different systems, and seamlessly bridges computation frameworks and underlying storage.
Decouple compute and storage without any loss in performance.
That is, compute and storage are separated with no loss in performance.
Alluxio can be divided into three components: masters, workers, and clients. A typical setup consists of a single leading master, multiple standby masters, and multiple workers. The master and worker processes constitute the Alluxio servers, which are the components a system administrator would maintain. The clients are used to communicate with the Alluxio servers by applications such as Spark or MapReduce jobs, Alluxio command-line, or the FUSE layer.
Alluxio servers consist of masters and workers. When there are multiple masters, only one of them is the leading master; the others are standby masters.
The Alluxio master service can be deployed as one leading master and several standby masters for fault tolerance. When the leading master goes down, a standby master is elected to become the new leading master.
Only one master process can be the leading master in an Alluxio cluster. The leading master is responsible for managing the global metadata of the system. This includes file system metadata (e.g. the file system inode tree), block metadata (e.g. block locations), and worker capacity metadata (free and used space). Alluxio clients interact with the leading master to read or modify this metadata. All workers periodically send heartbeat information to the leading master to maintain their participation in the cluster. The leading master does not initiate communication with other components; it only responds to requests via RPC services. The leading master records all file system transactions to a distributed persistent storage to allow for recovery of master state information; the set of records is referred to as the journal.
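In short, an Alluxio cluster has exactly one leading master, which manages all metadata: file system metadata, block metadata, and worker metadata. Workers send heartbeats to the leading master periodically, and the leading master records every file system operation in the journal.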
Standby masters read journals written by the leading master to keep their own copies of the master state up-to-date. They also write journal checkpoints for faster recovery in the future. They do not process any requests from other Alluxio components.
Standby masters keep reading the leading master's journal so that their state stays in sync.
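The journal idea can be pictured as replaying an append-only log in order. Below is a toy bash (4+) sketch, not Alluxio's real journal format or code: the one-line entry format and the operation names create, delete, and rename are hypothetical, and only the set of paths is tracked.

```bash
# Toy model of a standby master replaying the journal. NOT Alluxio's
# journal format: each line is a hypothetical entry
# "<op> <path> [<new_path>]" with op in create/delete/rename.
# Requires bash 4+ for associative arrays.

# replay_journal FILE: apply entries in order, then print the resulting
# paths, one per line, sorted.
replay_journal() {
  local -A inodes=()   # path -> 1; stand-in for the inode tree
  local op path new_path
  while read -r op path new_path; do
    case $op in
      create) inodes[$path]=1 ;;
      delete) unset 'inodes[$path]' ;;
      rename) unset 'inodes[$path]'; inodes[$new_path]=1 ;;
      '') ;;                                   # skip blank lines
      *) echo "unknown op: $op" >&2; return 1 ;;
    esac
  done < "$1"
  if ((${#inodes[@]} > 0)); then
    printf '%s\n' "${!inodes[@]}" | sort
  fi
}
```

Replaying the same entries always yields the same state, which is why a standby that has read the whole journal can take over as leader.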
Alluxio workers are responsible for managing user-configurable local resources allocated to Alluxio (e.g. memory, SSDs, HDDs). Alluxio workers store data as blocks and serve client requests that read or write data by reading or creating new blocks within their local resources. Workers are only responsible for managing blocks; the actual mapping from files to blocks is only stored by the master.
Workers manage storage resources such as memory and SSDs, store data as blocks, and serve client read and write requests; the actual mapping from files to blocks is kept on the master.
Because RAM usually offers limited capacity, blocks in a worker can be evicted when space is full. Workers employ eviction policies to decide which data to keep in the Alluxio space.
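To illustrate what an eviction policy does, here is a toy least-recently-used (LRU) policy in bash. Assumptions: capacity is counted in blocks rather than bytes, there is a single tier, and the names CAPACITY, LRU, and access_block are hypothetical; this is not the worker's actual evictor.

```bash
# Toy LRU eviction for a worker with room for CAPACITY blocks.
# Hypothetical model: capacity is a block count, not bytes.
CAPACITY=3
LRU=()   # cached block IDs, least recently used first

# access_block ID: mark ID as most recently used; if the worker is full
# and ID is not cached, evict the least recently used block and print it.
access_block() {
  local id=$1 b
  local -a kept=()
  for b in "${LRU[@]}"; do            # remove ID if it is already cached
    [[ $b == "$id" ]] || kept+=("$b")
  done
  LRU=("${kept[@]}")
  if ((${#LRU[@]} >= CAPACITY)); then # full: evict from the front
    echo "evict ${LRU[0]}"
    LRU=("${LRU[@]:1}")
  fi
  LRU+=("$id")                        # newest entry goes to the back
}
```

For example, with CAPACITY=3, accessing blocks 1, 2, 3, 1 and then 4 prints "evict 2", because block 1 was used more recently than block 2.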
The Alluxio client provides users a gateway to interact with the Alluxio servers. It initiates communication with the leading master to carry out metadata operations and with workers to read and write data that is stored in Alluxio.
A client first asks the leading master for metadata, then sends read and write requests to the workers.
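The two-step read path (metadata from the leading master, data from a worker) can be sketched as a toy model with bash 4+ associative arrays. The file, block IDs, worker names, and contents are made up for illustration; a real client talks to the servers over RPC through Alluxio's client library.

```bash
# Toy model of the client read path. All names and data are hypothetical.

# "Leading master" metadata: file -> block IDs, block ID -> worker.
declare -A FILE_BLOCKS=( [/LICENSE]="b1 b2" )
declare -A BLOCK_LOCATION=( [b1]=worker1 [b2]=worker2 )
# "Workers": "worker:block" -> block contents.
declare -A WORKER_DATA=( [worker1:b1]="Apache " [worker2:b2]="License" )

# client_read PATH: look up metadata on the master, then read every
# block from the worker that stores it.
client_read() {
  # Step 1 (master): which blocks make up the file?
  local blocks=${FILE_BLOCKS[$1]:-} b worker
  [[ -n $blocks ]] || { echo "no such file: $1" >&2; return 1; }
  for b in $blocks; do
    worker=${BLOCK_LOCATION[$b]}               # still master metadata
    printf '%s' "${WORKER_DATA[$worker:$b]}"   # step 2: read from worker
  done
  echo
}
```

Here client_read /LICENSE prints "Apache License": the block list and locations come from the "master" tables, and the bytes come from the "worker" table.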
$ wget http://downloads.alluxio.org/downloads/files//1.8.1/alluxio-1.8.1-hadoop-2.6-bin.tar.gz
$ tar xvf alluxio-1.8.1-hadoop-2.6-bin.tar.gz
$ cd alluxio-1.8.1-hadoop-2.6
Make sure you can ssh localhost without a password.
For details, see: http://www.javashuo.com/article/p-rndbchju-bd.html
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
$ vi conf/alluxio-site.properties
alluxio.master.hostname=localhost
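Instead of editing the file with vi, the property can also be set from a script. The helper below is a hypothetical convenience function, not part of Alluxio: it replaces an existing uncommented key=value line or appends one. Because the value goes through awk -v, values containing backslashes are not supported.

```bash
# set_property FILE KEY VALUE: set KEY=VALUE in a properties file.
# Replaces any uncommented line starting with "KEY=", otherwise appends.
# Hypothetical helper; commented lines ("# KEY=...") are left alone.
set_property() {
  local file=$1 key=$2 value=$3 tmp rc=0
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" || rc=1   # keep file perms
  rm -f "$tmp"
  return "$rc"
}
```

Usage: set_property conf/alluxio-site.properties alluxio.master.hostname localhost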
$ ./bin/alluxio validateEnv local
$ ./bin/alluxio format
$ ./bin/alluxio-start.sh local SudoMount
If you see the following error:
Formatting RamFS: /mnt/ramdisk (44849277610)
ERROR: mkdir /mnt/ramdisk failed
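you need to grant the user passwordless sudo for these commands (edit the sudoers file as root and replace $user with your user name):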
# visudo -f /etc/sudoers
$user ALL=(ALL) NOPASSWD: /bin/mount * /mnt/ramdisk, /bin/umount * /mnt/ramdisk, /bin/mkdir * /mnt/ramdisk, /bin/chmod * /mnt/ramdisk
$ ./bin/alluxio fs
$ ./bin/alluxio fs ls /
$ ./bin/alluxio fs copyFromLocal LICENSE /LICENSE
$ ./bin/alluxio fs cat /LICENSE
These commands look very similar to the HDFS commands.
$ bin/alluxio fsadmin report
Alluxio cluster summary:
Master Address: localhost/127.0.0.1:19998
Web Port: 19999
Rpc Port: 19998
Started: 01-24-2019 10:28:59:433
Uptime: 0 day(s), 1 hour(s), 24 minute(s), and 42 second(s)
Version: 1.8.1
Safe Mode: false
Zookeeper Enabled: false
Live Workers: 1
Lost Workers: 0
Total Capacity: 10.00GB
Tier: MEM Size: 10.00GB
Used Capacity: 9.36GB
Tier: MEM Size: 9.36GB
Free Capacity: 651.55MB
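The report is plain text, so it is easy to post-process. As a sketch, the function below computes used capacity as a percentage of total capacity from the summary lines. It assumes the "Total Capacity:" and "Used Capacity:" line format shown above and 1024-based units; other versions may format the report differently.

```bash
# used_percent: read `alluxio fsadmin report` output on stdin and print
# used capacity as a percentage of total capacity, one decimal place.
# Handles B/KB/MB/GB/TB suffixes, assuming 1024-based multiples.
used_percent() {
  awk '
    function bytes(s,   n, u) {
      n = s + 0                       # numeric prefix, e.g. 9.36
      u = s; sub(/^[0-9.]+/, "", u)   # unit suffix, e.g. GB
      if (u == "KB") n *= 1024
      else if (u == "MB") n *= 1024^2
      else if (u == "GB") n *= 1024^3
      else if (u == "TB") n *= 1024^4
      return n
    }
    $1 == "Total" && $2 == "Capacity:" { total = bytes($3) }
    $1 == "Used"  && $2 == "Capacity:" { used  = bytes($3) }
    END { if (total > 0) printf "%.1f\n", 100 * used / total }
  '
}
```

Piping the report above through it (./bin/alluxio fsadmin report | used_percent) prints 93.6.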
$ curl http://$master:19999/metrics/json
1 Default configuration
$ cat conf/alluxio-site.properties
alluxio.underfs.address=${alluxio.work.dir}/underFSStorage
2 Command example
$ ls ./underFSStorage/
$ ./bin/alluxio fs persist /LICENSE
$ ls ./underFSStorage
LICENSE
With the default configuration, Alluxio uses the local file system as its under file storage (UFS). The default path for the UFS is ./underFSStorage.
Alluxio is currently writing data only into Alluxio space, not to the UFS. Configure Alluxio to persist the file from Alluxio space to the UFS by using the persist command.
By default, Alluxio uses the local file system as its UFS, and a file is persisted to the UFS only after you run the persist command.
1 Configuration (HDFS as the UFS)
$ cat conf/alluxio-site.properties
alluxio.underfs.address=hdfs://<NAMENODE>:<PORT>/alluxio/data
If you want to accelerate all data on HDFS while keeping the paths unchanged, you can set this to the HDFS root directory.
2 Hadoop configuration
1) Symbolic links
$ ln -s $HADOOP_CONF_DIR/core-site.xml conf/core-site.xml
$ ln -s $HADOOP_CONF_DIR/hdfs-site.xml conf/hdfs-site.xml
Copy or make symbolic links from hdfs-site.xml and core-site.xml from your Hadoop installation into ${ALLUXIO_HOME}/conf
2) Point to the files directly
alluxio.underfs.hdfs.configuration=/path/to/hdfs/conf/core-site.xml:/path/to/hdfs/conf/hdfs-site.xml
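Option 1) can be wrapped in a small helper that fails early if either Hadoop file is missing. The function name link_hadoop_conf is hypothetical; it only uses ln -sf, so re-running it simply replaces the existing links.

```bash
# link_hadoop_conf SRC_DIR DST_DIR: symlink core-site.xml and hdfs-site.xml
# from the Hadoop conf dir (e.g. $HADOOP_CONF_DIR) into the Alluxio conf
# dir. Checks both files first so nothing is linked on a partial setup.
link_hadoop_conf() {
  local src=$1 dst=$2 f
  for f in core-site.xml hdfs-site.xml; do
    [[ -f $src/$f ]] || { echo "missing $src/$f" >&2; return 1; }
  done
  for f in core-site.xml hdfs-site.xml; do
    ln -sf "$src/$f" "$dst/$f"   # -f replaces an existing link
  done
}
```

Usage: link_hadoop_conf "$HADOOP_CONF_DIR" conf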
3 Commands
$ bin/alluxio fs ls /
You can now see all the directories on HDFS (assuming the UFS address points to the HDFS root).
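4 File mapping
At this point, accessing
alluxio://$alluxio_server:19998/test.log
reads the file in the underlying storage at
hdfs://$namenode_server/alluxio/data/test.log
Note: this form requires an explicit $alluxio_server and port, which is a single point of failure; the HA deployment described later solves this.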
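The mapping in step 4 is essentially prefix substitution: the path after the Alluxio authority is appended to alluxio.underfs.address. The helper below is a string-level sketch of that rule only; it assumes the UFS is mounted at the Alluxio root and ignores nested mounts.

```bash
# alluxio_to_ufs URI UFS_ROOT: print the UFS location that an Alluxio
# path maps to when alluxio.underfs.address=UFS_ROOT is mounted at "/".
# Accepts "alluxio://authority/path", "alluxio://authority", or "/path".
alluxio_to_ufs() {
  local uri=$1 root=${2%/} path
  case $uri in
    alluxio://*/*) path=/${uri#alluxio://*/} ;;  # drop scheme + authority
    alluxio://*)   path=/ ;;                     # no path: Alluxio root
    /*)            path=$uri ;;
    *)             echo "unsupported: $uri" >&2; return 1 ;;
  esac
  printf '%s%s\n' "$root" "$path"
}
```

For example, alluxio_to_ufs alluxio://$alluxio_server:19998/test.log hdfs://$namenode_server/alluxio/data prints hdfs://$namenode_server/alluxio/data/test.log.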
1 Preparation for Spark (choose one of the two):
1) Configuration, in spark-defaults.conf:
spark.driver.extraClassPath /<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar
spark.executor.extraClassPath /<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar
This Alluxio client jar file can be found at /<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar
2) Copy the jar
$ cp client/alluxio-1.8.1-client.jar $SPARK_HOME/jars/
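Option 1) can be scripted as follows. The helper name and its skip-if-present behavior are assumptions of this sketch: it appends the two extraClassPath lines to spark-defaults.conf only when the key is not already there, and it does not merge with an existing extraClassPath value.

```bash
# add_alluxio_to_spark CONF_FILE JAR: append driver/executor
# extraClassPath entries pointing at the Alluxio client jar, skipping any
# key that already has a line. Hypothetical helper; no value merging.
add_alluxio_to_spark() {
  local conf=$1 jar=$2 key
  touch "$conf" || return 1
  for key in spark.driver.extraClassPath spark.executor.extraClassPath; do
    if ! grep -q "^$key[[:space:]]" "$conf"; then
      printf '%s %s\n' "$key" "$jar" >> "$conf"
    fi
  done
}
```

Usage: add_alluxio_to_spark "$SPARK_HOME/conf/spark-defaults.conf" /<PATH_TO_ALLUXIO>/client/alluxio-1.8.1-client.jar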
2 Access
$ spark-shell
scala> val s = sc.textFile("alluxio://localhost:19998/derby.log")
s: org.apache.spark.rdd.RDD[String] = alluxio://localhost:19998/derby.log MapPartitionsRDD[1] at textFile at <console>:24
scala> s.foreach(println)
----------------------------------------------------------------
Thu Jan 10 11:05:45 CST 2019:
Reference: http://www.alluxio.org/docs/1.8/en/compute/Spark.html
For Hive, copy the jar:
$ cp client/alluxio-1.8.1-client.jar $HIVE_HOME/lib/
$ cp client/alluxio-1.8.1-client.jar $HADOOP_HOME/share/hadoop/common/lib/
Then restart the metastore and hiveserver2.
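HA cluster deployment, i.e. multiple workers + multiple masters + ZooKeeper
1 Make sure the servers in the cluster can reach each other over SSH
Same as above.
2 Configuration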
$ cat conf/alluxio-site.properties
#alluxio.master.hostname=<MASTER_HOSTNAME>
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=<ZOOKEEPER_ADDRESS>
alluxio.master.journal.folder=hdfs://$namenode_server/alluxio/journal/
alluxio.worker.memory.size=20GB
Sync the configuration to all servers in the cluster.
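Before syncing, it can help to sanity-check the HA settings. The journal must live on storage every master can reach, because a standby that becomes leader has to read the journal written by the previous leader. The following hypothetical check covers only what this walkthrough sets: ZooKeeper enabled, a non-empty ZooKeeper address, and a journal folder on shared storage (it accepts only hdfs:// because that is what is used above).

```bash
# read_prop KEY FILE: print the value of the last uncommented KEY=... line.
read_prop() {
  awk -F= -v k="$1" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$2"
}

# check_ha_conf FILE: hypothetical sanity check of the HA settings above.
# Returns non-zero and explains on stderr if any check fails.
check_ha_conf() {
  local file=$1 rc=0 journal
  [[ $(read_prop alluxio.zookeeper.enabled "$file") == true ]] ||
    { echo "alluxio.zookeeper.enabled is not true" >&2; rc=1; }
  [[ -n $(read_prop alluxio.zookeeper.address "$file") ]] ||
    { echo "alluxio.zookeeper.address is empty" >&2; rc=1; }
  journal=$(read_prop alluxio.master.journal.folder "$file")
  [[ $journal == hdfs://* ]] ||      # shared storage, as in this setup
    { echo "journal folder is not on HDFS: '$journal'" >&2; rc=1; }
  return "$rc"
}
```

Usage: check_ha_conf conf/alluxio-site.properties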
3 Configure masters and workers
$ cat conf/masters
$master1
$master2
$ cat conf/workers
$worker1
$worker2
$worker3
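The masters and workers files also give a convenient host list for the "sync the configuration to all servers" step. The sketch below dedupes the hosts (a machine can be both a master and a worker) and pushes conf/ to each with rsync. It assumes passwordless SSH, rsync installed on every host, and the same Alluxio path on every server; both function names are hypothetical.

```bash
# cluster_hosts CONF_DIR: print each distinct host from CONF_DIR/masters
# and CONF_DIR/workers in first-seen order, ignoring blanks and comments.
cluster_hosts() {
  cat "$1/masters" "$1/workers" |
    sed -e 's/#.*//' -e 's/[[:space:]]//g' |
    awk 'NF && !seen[$0]++'
}

# sync_conf CONF_DIR: copy CONF_DIR to the same absolute path on every
# host. Assumes passwordless SSH and identical install paths.
sync_conf() {
  local host dir
  dir=$(cd "$1" && pwd) || return 1
  for host in $(cluster_hosts "$1"); do
    rsync -a "$dir/" "$host:$dir/"
  done
}
```

Usage: sync_conf conf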
4 Start
$ ./bin/alluxio-start.sh all SudoMount
5 Access
alluxio://zkHost1:2181;zkHost2:2181;zkHost3:2181/path
If the client is started with the following JVM system properties:
-Dalluxio.zookeeper.address=zkHost1:2181,zkHost2:2181,zkHost3:2181 -Dalluxio.zookeeper.enabled=true
then paths can be accessed directly as:
alluxio:///path
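The ZooKeeper form of the URI is the alluxio.zookeeper.address list with commas replaced by semicolons. A tiny hypothetical helper:

```bash
# zk_alluxio_uri ZK_ADDRESS PATH: build "alluxio://zk1:2181;zk2:2181/path"
# from a comma-separated ZooKeeper address list.
zk_alluxio_uri() {
  local path=$2
  [[ $path == /* ]] || path=/$path   # ensure a leading slash
  printf 'alluxio://%s%s\n' "${1//,/;}" "$path"
}
```

Because ; is a command separator in the shell, quote such URIs whenever you pass them to a command, for example "alluxio://zkHost1:2181;zkHost2:2181;zkHost3:2181/path".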
6 Interoperating with HDFS
Copy the jar:
$ cp client/alluxio-1.8.1-client.jar $HADOOP_HOME/share/hadoop/common/lib/
Add the following properties to $HADOOP_CONF_DIR/core-site.xml:
alluxio.zookeeper.enabled
alluxio.zookeeper.address
as well as
<property>
<name>fs.alluxio.impl</name>
<value>alluxio.hadoop.FileSystem</value>
</property>
Then Alluxio can be accessed through the HDFS client:
$ hadoop fs -ls alluxio:///directory
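The three core-site.xml entries can be generated rather than typed. The helper below (hypothetical) prints them as property elements for pasting inside the configuration element; the ZooKeeper address is passed in the same comma-separated format used above.

```bash
# alluxio_core_site_props ZK_ADDRESS: print the fs.alluxio.impl and
# ZooKeeper <property> entries for core-site.xml.
alluxio_core_site_props() {
  local -a props=(
    fs.alluxio.impl alluxio.hadoop.FileSystem
    alluxio.zookeeper.enabled true
    alluxio.zookeeper.address "$1"
  )
  local i
  for ((i = 0; i < ${#props[@]}; i += 2)); do   # name/value pairs
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
      "${props[i]}" "${props[i+1]}"
  done
}
```

Usage: alluxio_core_site_props zkHost1:2181,zkHost2:2181,zkHost3:2181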
Reference: http://www.alluxio.org/docs/1.8/en/deploy/Running-Alluxio-On-a-Cluster.html
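Alluxio supports many other deployment modes. One of them is Alluxio on YARN, which makes it very easy for users already running Spark on YARN to accelerate Spark with Alluxio.
For details, see: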
http://www.alluxio.org/docs/1.8/en/deploy/Running-Alluxio-On-Yarn.html