Original link: http://xiguada.org/carbondata_compile/
What is CarbonData?
CarbonData is a fully indexed columnar and Hadoop native data-store for processing heavy analytical workloads and detailed queries on big data. In customer benchmarks, CarbonData has proven to manage petabytes of data running on extraordinarily low-cost hardware, answering queries around 10 times faster than the current open source solutions (column-oriented SQL-on-Hadoop data-stores).
Compiling and Installing
I wanted to try it out quickly, but the official site doesn't offer any pre-built packages, so I had no choice but to compile it myself.
Installation takes three steps (you also need JDK 7 or JDK 8, and Maven 3.3 or later):
- Download Spark 1.5.0 or a newer release.
- Download and install Apache Thrift 0.9.3, and make sure it is on the system PATH.
- Download the Apache CarbonData code and build it.
1 Spark can be downloaded directly; after unpacking it, add its bin directory to PATH so that spark-submit can be run.
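Step 1 can be sketched as follows. The mirror URL, Spark version, and Hadoop build are assumptions — pick any Spark release at or above 1.5.0 that matches your environment:

```shell
# Download and unpack a Spark binary release (URL/version are assumptions)
wget https://archive.apache.org/dist/spark/spark-1.6.2/spark-1.6.2-bin-hadoop2.6.tgz
tar -xzf spark-1.6.2-bin-hadoop2.6.tgz

# Put spark-submit on the PATH for this shell session
export PATH="$PWD/spark-1.6.2-bin-hadoop2.6/bin:$PATH"
spark-submit --version
```

For a permanent setup, put the export line in your shell profile instead.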
2 Before installing Thrift you need its build dependencies; on my Ubuntu VM the command to install them is:
sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev
Then build and install from the Thrift source directory:
./configure
make
sudo make install
3 Build CarbonData:
mvn -DskipTests -Pspark-1.6 -Dspark.version=1.6.2 clean package
4 Go into the bin directory and edit the carbon-spark-sql script, changing /bin/spark-submit to spark-submit, so that the copy found on PATH is used.
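The manual edit in step 4 can also be done with a one-liner; the relative script path is an assumption and should be adjusted to where your CarbonData checkout's bin directory lives:

```shell
# Replace the hard-coded /bin/spark-submit with the spark-submit found on PATH
# (run from the CarbonData checkout root; path is an assumption)
sed -i 's|/bin/spark-submit|spark-submit|' bin/carbon-spark-sql
```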
5 Generate a sample.csv file:
cd carbondata
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
6 Run:
./carbon-spark-sql
spark-sql> create table if not exists test_table (id string, name string, city string, age int) STORED BY 'carbondata';
spark-sql> load data inpath '../sample.csv' into table test_table;
spark-sql> select city, avg(age), sum(age) from test_table group by city;
Output:
shenzhen 29.0 58
wuhan 35.0 35
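The numbers follow directly from the sample data: shenzhen has ages 31 and 27 (average 29.0, sum 58) and wuhan only 35. As a sanity check, the same aggregation can be reproduced with awk over the sample.csv created in step 5:

```shell
# Cross-check the group-by aggregation by hand (assumes sample.csv from step 5)
awk -F, 'NR > 1 { sum[$3] += $4; cnt[$3]++ }
         END { for (c in sum) printf "%s %.1f %d\n", c, sum[c] / cnt[c], sum[c] }' sample.csv | sort
# → shenzhen 29.0 58
# → wuhan 35.0 35
```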
So far this looks exactly like running plain Spark SQL. What is CarbonData actually doing in the middle, and what benefits does it bring? I'll dig into that in a follow-up post.