[spark@master ~]$ spark-shell --master yarn-client --jars /app/soft/hive/lib/mysql-connector-java-5.1.44-bin.jar

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@432a6a69

scala> val res = sqlContext.sql("select * from lb")
res: org.apache.spark.sql.DataFrame = [cookieid: string, createtime: string ... 1 more field]

scala> res.show()
+--------+----------+---+
|cookieid|createtime| pv|
+--------+----------+---+
| cookie1|2015-11-11|  1|
| cookie1|2015-11-12|  4|
| cookie1|2015-11-13|  5|
| cookie1|2015-11-14|  4|
| cookie2|2015-11-11|  7|
| cookie2|2015-11-12|  3|
| cookie2|2015-11-13|  8|
| cookie2|2015-11-14|  2|
+--------+----------+---+
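The deprecation warning above comes from constructing a SQLContext by hand; in Spark 2.x the shell already exposes a SparkSession named `spark` (the next session below uses it via `spark.sparkContext`). A minimal sketch of the same query without the deprecated constructor, assuming the Hive table `lb` from the session above:

```scala
// Sketch: the same Hive query through the SparkSession that spark-shell
// provides as `spark` in Spark 2.x, instead of the deprecated SQLContext.
val res = spark.sql("select * from lb")
res.show()

// The same data through the DataFrame API rather than raw SQL.
spark.table("lb")
  .orderBy("cookieid", "createtime")
  .show()
```

`spark.sql` returns a DataFrame just like `sqlContext.sql`, so the rest of the session is unchanged either way.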
Creating a table from Spark
scala> val path = "hdfs://master:9000/data/Romeo_and_Juliet.txt"
path: String = hdfs://master:9000/data/Romeo_and_Juliet.txt

scala> val df2 = spark.sparkContext.textFile(path).flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).toDF("word","count")
df2: org.apache.spark.sql.DataFrame = [word: string, count: int]

scala> df2.write.mode("overwrite").saveAsTable("badou.test_a")
18/01/28 08:15:10 WARN metastore.HiveMetaStore: Location: hdfs://master:9000/user/hive/warehouse/badou.db/test_a specified for non-external table:test_a

--------------------

hive> use badou;
hive> show tables;
hive> select * from test_a order by count desc limit 10;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1516801273097_0045, Tracking URL = http://master:8088/proxy/application_1516801273097_0045/
Kill Command = /app/soft/hadoop/bin/hadoop job -kill job_1516801273097_0045
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-01-28 09:08:22,144 Stage-1 map = 0%, reduce = 0%
2018-01-28 09:08:29,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.37 sec
2018-01-28 09:08:37,987 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.18 sec
MapReduce Total cumulative CPU time: 3 seconds 180 msec
Ended Job = job_1516801273097_0045
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  Cumulative CPU: 3.18 sec  HDFS Read: 54970  HDFS Write: 69  SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 180 msec
OK
	4132
the	614
I	531
and	462
to	449
a	392
of	364
my	313
is	290
in	282
Time taken: 28.159 seconds, Fetched: 10 row(s)
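For reference, the same flow can be packaged as a standalone application instead of a spark-shell session. The sketch below is an assumption-laden rewrite, not the original author's code: the path, database, and table names are taken from the transcript above, and `enableHiveSupport()` is required so that `saveAsTable` writes into the `badou` database. The first row of the Hive output appears to have an empty word (splitting on single spaces leaves empty tokens wherever the text has consecutive spaces), so the sketch also filters empty tokens out.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: word count from HDFS written to a Hive table, as a standalone app.
// Names and paths are assumptions copied from the spark-shell session above.
object WordCountToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountToHive")
      .enableHiveSupport()            // needed for saveAsTable into the Hive metastore
      .getOrCreate()
    import spark.implicits._

    val path = "hdfs://master:9000/data/Romeo_and_Juliet.txt"
    val df = spark.sparkContext.textFile(path)
      .flatMap(_.split(" "))
      .filter(_.nonEmpty)             // drop empty tokens from consecutive spaces
      .map((_, 1))
      .reduceByKey(_ + _)
      .toDF("word", "count")

    df.write.mode("overwrite").saveAsTable("badou.test_a")

    // Verify the result from Spark instead of the Hive CLI.
    spark.sql("select * from badou.test_a order by count desc limit 10").show()

    spark.stop()
  }
}
```

When submitted with spark-submit, `enableHiveSupport()` picks up hive-site.xml from the classpath, so the table is created in the same warehouse location the WARN line above reports (hdfs://master:9000/user/hive/warehouse/badou.db/test_a).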