Spark SQL CLI configuration and usage

If you want to use the Spark SQL CLI to read Hive tables directly for analysis, only a few simple setup steps are needed.

1. Copy hive-site.xml into the Spark conf directory

$ cp /usr/local/hive/conf/hive-site.xml /usr/local/spark-1.5.1/conf/
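
For reference, the part of hive-site.xml that matters here is the metastore connection, which is also why the MySQL driver is needed in the next step. A minimal sketch of the relevant properties, assuming a MySQL-backed metastore (the host, user, and password below are placeholders, not values from this setup):

<configuration>
  <!-- JDBC connection to the metastore database (placeholder host) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive</value>
  </property>
  <!-- MySQL JDBC driver class, provided by the jar added in step 2 -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>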

2. Configure the Spark classpath to add the MySQL driver

$ vim conf/spark-env.sh
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_LOCAL_DIRS/lib/mysql-connector-java-5.1.21.jar
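
Note that SPARK_CLASSPATH is marked deprecated in Spark 1.x; an equivalent approach is to pass the jar when launching the CLI with --driver-class-path (the path below is a placeholder for wherever your connector jar actually lives):

$ bin/spark-sql --master yarn-client --driver-class-path /path/to/mysql-connector-java-5.1.21.jar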

3. Start the Hive metastore

In my test, the Spark SQL CLI started successfully even without this step.

$nohup hive --service metastore > metastore.log 2>&1 &
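
To check that the metastore service actually came up, you can look for its Thrift listener; a generic check, assuming the default metastore port 9083:

$ netstat -nltp | grep 9083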

4. Start the spark-sql CLI

$ bin/spark-sql --master yarn-client
SET spark.sql.hive.version=1.2.1
SET spark.sql.hive.version=1.2.1
spark-sql>


That completes the startup.
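
A quick sanity check at the prompt is to list databases; seeing your Hive databases confirms the CLI is reading the shared metastore rather than a fresh local one:

spark-sql> show databases;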


Now run the same SQL in both Hive and spark-sql and compare the results.

Hive:

hive> select count(cookie) from dmp_data where age='21001';
.
.
.
Total MapReduce CPU Time Spent: 36 seconds 490 msec
OK
2839776
Time taken: 42.092 seconds, Fetched: 1 row(s)
hive> 

Took 42 seconds.

Spark SQL CLI:

$ bin/spark-sql --master yarn-client
spark-sql>select count(cookie) from dmp_data where age='21001';
.
.
.
15/12/28 14:11:55 INFO scheduler.DAGScheduler: ResultStage 3 (processCmd at CliDriver.java:376) finished in 2.402 s
15/12/28 14:11:55 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at CliDriver.java:376, took 22.894938 s
2839776
Time taken: 23.917 seconds, Fetched 1 row(s)

Took 23 seconds, almost half the time.

For tasks whose logic is not very complex, you can get them done directly with SQL.
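
Since spark-sql also accepts the Hive-CLI-style -e and -f options, such tasks are easy to script non-interactively (result.txt and my_query.sql below are hypothetical names):

$ bin/spark-sql --master yarn-client -e "select count(cookie) from dmp_data where age='21001'" > result.txt
$ bin/spark-sql --master yarn-client -f my_query.sql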
