1. Maven dependencies used
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>2.4.0</version>
</dependency>
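Note: do not swap the order of the two dependencies above. Otherwise the code is compiled against Scala 2.10 (because Maven loads the dependency jars in the order they appear in the pom), which causes the following problem: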
15/05/26 21:33:24 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at stream.tan14.cn.streamTest$.main(streamTest.scala:25)
    at stream.tan14.cn.streamTest.main(streamTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
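A NoSuchMethodError on scala.Predef$.ArrowAssoc typically means the Scala version the code was compiled against differs from the scala-library on the runtime classpath. As a rough diagnostic sketch (the object name ScalaVersionCheck is made up for illustration and is not part of the original project), the Scala version that is actually loaded can be printed like this:

```scala
object ScalaVersionCheck {
  // Version of the scala-library jar that is actually loaded at runtime.
  def runtimeVersion: String = scala.util.Properties.versionNumberString

  def main(args: Array[String]): Unit = {
    println(s"Scala runtime version: $runtimeVersion")
  }
}
```

If this prints a 2.10.x version while the Spark artifacts end in _2.11 (or the other way round), the dependency order in the pom is a likely place to look.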
2. Spark and Elasticsearch integration: query interface
1) References:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl.html
https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/spark.html#spark-installation
2) Interface code:
val query = """{ "query": { "bool": { "must": [{ "range":{ "updatetime": { "gte": "" } } }] } } }"""
// The query above filters the data on the Elasticsearch side. Without it, filtering only with Spark DataFrame operations can hurt performance badly!
val options = Map("es.nodes" -> ES_URL, "es.port" -> ES_PORT, "es.query" -> query)
ctx.read.format("org.elasticsearch.spark.sql").options(options).load("index/type").registerTempTable("test")
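The "gte" value in the query above is left empty. As a minimal sketch of how the lower bound might be filled in at runtime (EsQueryBuilder and rangeQuery are hypothetical names, and the date used below is only a placeholder), the same bool/must/range query can be produced by a small helper:

```scala
object EsQueryBuilder {
  // Builds a bool/must/range query with the same structure as the one above.
  // The field name and lower bound are supplied by the caller.
  // Note: values are inserted as-is, with no JSON escaping.
  def rangeQuery(field: String, gte: String): String =
    s"""{ "query": { "bool": { "must": [{ "range": { "$field": { "gte": "$gte" } } }] } } }"""
}
```

For example, EsQueryBuilder.rangeQuery("updatetime", "2016-01-01") returns a string that can be passed as the "es.query" option in place of query. Once the temp table is registered, it can be queried with ordinary Spark SQL, for instance ctx.sql("SELECT * FROM test").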