Spark SQL provides JDBC connectivity, which is useful for connecting business intelligence (BI) tools to a Spark cluster and for sharing a cluster across multiple users. The JDBC server runs as a standalone Spark driver program that can be shared by multiple clients. Any client can cache tables in memory, query them, and so on, and the cluster resources and cached data will be shared among all of them.
Spark SQL's JDBC server corresponds to HiveServer2 in Hive. It is also known as the "Thrift server" since it uses the Thrift communication protocol. Note that the JDBC server requires Spark to be built with Hive support.
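Because the Thrift server speaks the HiveServer2 protocol, any JDBC client can query it through the standard Hive JDBC driver, not just Beeline. Below is a minimal sketch of a Java client, assuming hive-jdbc and its dependencies are on the classpath, the server is reachable on hadoop04:10000 as in the Beeline session later in this section, and mytable is the table created further below.

// Minimal sketch: querying the Spark SQL Thrift server through the Hive JDBC driver.
// Assumes hive-jdbc is on the classpath and the server listens on hadoop04:10000;
// mytable is the table created later in this section.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkSqlJdbcClient {
    public static void main(String[] args) throws Exception {
        // The Spark SQL Thrift server speaks the HiveServer2 protocol,
        // so the standard Hive JDBC driver is used.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hadoop04:10000", "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT name, addr, status FROM mytable WHERE addr = 'gz'")) {
            while (rs.next()) {
                System.out.println(rs.getString("name") + "\t"
                        + rs.getString("addr") + "\t"
                        + rs.getString("status"));
            }
        }
    }
}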
Cluster environment: CDH 5.3.0
The specific JAR versions are as follows:
Spark version: 1.2.0-cdh5.3.0
Hive version: 0.13.1-cdh5.3.0
Hadoop version: 2.5.0-cdh5.3.0
# Link the Hive configuration so Spark SQL can see the Hive metastore
cd /etc/spark/conf
ln -s /etc/hive/conf/hive-site.xml hive-site.xml
# Make the Spark log directory writable
cd /opt/cloudera/parcels/CDH/lib/spark/
chmod -R 777 logs/
# Start the Thrift (JDBC) server on YARN, listening on port 10008
cd /opt/cloudera/parcels/CDH/lib/spark/sbin
./start-thriftserver.sh --master yarn --hiveconf hive.server2.thrift.port=10008
cd /opt/cloudera/parcels/CDH/lib/spark/bin
beeline -u jdbc:hive2://hadoop04:10000

[root@hadoop04 bin]# beeline -u jdbc:hive2://hadoop04:10000
scan complete in 2ms
Connecting to jdbc:hive2://hadoop04:10000
Connected to: Spark SQL (version 1.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.3.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.3.0 by Apache Hive
0: jdbc:hive2://hadoop04:10000>
Within the Beeline client, you can use standard HiveQL commands to create, list, and query tables. You can find the full details of HiveQL in the Hive Language Manual, but here we show a few common operations.
CREATE TABLE IF NOT EXISTS mytable (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

create table mytable (name string, addr string, status string) row format delimited fields terminated by '#';

-- Load a local file
load data local inpath '/external/tmp/data.txt' into table mytable;

-- Load a file from HDFS
load data inpath 'hdfs://ju51nn/external/tmp/data.txt' into table mytable;

describe mytable;

explain select * from mytable where name = '張三';

select * from mytable where name = '張三';

cache table mytable;

select count(*) total, count(distinct addr) num1, count(distinct status) num2 from mytable where addr = 'gz';

uncache table mytable;
Contents of the data file loaded above (/external/tmp/data.txt):
張三#廣州#學生
李四#貴州#教師
王五#武漢#講師
趙六#成都#學生
lisa#廣州#學生
lily#gz#studene
Spark SQL also supports a simple shell you can use as a single process: spark-sql.
It is mainly intended for local development; in a shared cluster environment, use the JDBC server instead.
cd /opt/cloudera/parcels/CDH/lib/spark/bin
./spark-sql