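scala-2.9.3: a programming language. Download: http://www.scala-lang.org/download/ Note that the prebuilt Spark 1.4.0 bundles its own Scala 2.10.4 runtime (visible in the spark-shell banner later in this post), which differs from the 2.9.3 installed here.
spark-1.4.0: this must be a prebuilt Spark. If you downloaded the source package, you have to rebuild it for your environment with SBT or Maven before you can use it.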
Prebuilt Spark download: http://spark.apache.org/downloads.html
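    # Extract scala-2.9.3.tgz
    tar -zxvf scala-2.9.3.tgz
    # Configure SCALA_HOME
    vi /etc/profile
    # Add the following environment variables
    export SCALA_HOME=/home/apps/scala-2.9.3
    export PATH=.:$SCALA_HOME/bin:$PATH
    # Reload the profile so the change takes effect in the current shell
    source /etc/profile
    # Test whether Scala was installed successfully by simply running:
    scala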
    # Extract spark-1.4.0.tgz
    tar -zxvf spark-1.4.0.tgz
    # Configure SPARK_HOME
    vi /etc/profile
    # Add the following environment variables
    export SPARK_HOME=/home/apps/spark-1.4.0
    export PATH=.:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
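After editing /etc/profile it can be worth confirming that the variables actually took effect in the current shell. The following is a minimal sketch and not part of the original setup: the helper name `check_home` is hypothetical, and it only checks that a variable is set and that the directory it points to contains a bin subdirectory.

```shell
#!/bin/sh
# check_home NAME (hypothetical helper): report whether the environment
# variable called NAME is set and points at a directory containing bin/.
check_home() {
    name=$1
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value/bin" ]; then
        echo "$name=$value has no bin directory"
        return 1
    fi
    echo "$name OK: $value"
}
```

For example, after running `source /etc/profile`, call `check_home SCALA_HOME` and `check_home SPARK_HOME`.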
    # In $SPARK_HOME/conf, make a copy of spark-env.sh.template and slaves.template
    cp spark-env.sh.template spark-env.sh
    cp slaves.template slaves
    # slaves: this file lists the worker-node hosts; just add the worker hostnames, one per line
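Since the slaves file is just one worker hostname per line, it can also be generated rather than edited by hand. This is a small sketch under assumptions: the function name `write_slaves` and the hostnames in the usage comment are placeholders, not values from the original setup.

```shell
#!/bin/sh
# write_slaves FILE HOST... (hypothetical helper): overwrite FILE with
# one worker hostname per line, the format the slaves file expects.
write_slaves() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Usage (hostnames are placeholders for your own worker nodes):
#   write_slaves "$SPARK_HOME/conf/slaves" hadoop.slave1 hadoop.slave2
```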
Append the following lines to the end of spark-env.sh:
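    # JDK install path
    export JAVA_HOME=/root/app/jdk
    # Scala install path
    export SCALA_HOME=/root/app/scala-2.9.3
    # IP address of the master node
    export SPARK_MASTER_IP=192.168.1.200
    # Amount of memory allocated to each worker
    export SPARK_WORKER_MEMORY=200m
    # Hadoop configuration directory
    export HADOOP_CONF_DIR=/root/app/hadoop/etc/hadoop
    # Number of CPU cores allocated to each worker
    export SPARK_WORKER_CORES=1
    # Number of worker instances per node; 1 is usually enough
    export SPARK_WORKER_INSTANCES=1
    # JVM options. Since Spark 1.0 there is a spark-defaults.conf default configuration file, and these settings normally go there instead
    export SPARK_JAVA_OPTS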
spark-defaults.conf also accepts the following configuration parameters:
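    spark.master            // spark://hostname:7077 (7077 is the default master port; 8080 is the master web UI)
    spark.local.dir         // Spark working directory (used for shuffle data)
    spark.executor.memory   // Spark 1.0 dropped the SPARK_MEM parameter; use this one instead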
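To make the file format concrete, here is a hedged sketch that writes a small spark-defaults.conf with the three properties above. The function name `write_defaults` is hypothetical. The master address reuses the 192.168.1.200 IP from spark-env.sh with the default port 7077, while the `/tmp/spark` directory and the `200m` executor memory are assumed values chosen only for illustration.

```shell
#!/bin/sh
# write_defaults FILE (hypothetical helper): write a sample
# spark-defaults.conf. Each line is "property<whitespace>value".
write_defaults() {
    cat > "$1" <<'EOF'
# Standalone master URL (7077 is the default master port)
spark.master            spark://192.168.1.200:7077
# Scratch directory for shuffle data (assumed path)
spark.local.dir         /tmp/spark
# Memory per executor (assumed value, matches SPARK_WORKER_MEMORY above)
spark.executor.memory   200m
EOF
}

# Usage:
#   write_defaults "$SPARK_HOME/conf/spark-defaults.conf"
```

Keeping the executor memory at or below the worker memory matters here, because a worker cannot launch an executor that asks for more memory than the worker has been given.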
Startup order on the master node:
1. Start HDFS first (./sbin/start-dfs.sh)
2. Start the Spark master (./sbin/start-master.sh)
3. Start the Spark workers (./sbin/start-slaves.sh)
4. Check the running processes with jps:
   Master node: NameNode, SecondaryNameNode, Master
   Worker nodes: DataNode, Worker
5. Start spark-shell
    15/06/21 21:23:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/06/21 21:23:47 INFO spark.SecurityManager: Changing view acls to: root
    15/06/21 21:23:47 INFO spark.SecurityManager: Changing modify acls to: root
    15/06/21 21:23:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    15/06/21 21:23:47 INFO spark.HttpServer: Starting HTTP Server
    15/06/21 21:23:47 INFO server.Server: jetty-8.y.z-SNAPSHOT
    15/06/21 21:23:47 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38651
    15/06/21 21:23:47 INFO util.Utils: Successfully started service 'HTTP class server' on port 38651.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
          /_/
    Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_65)
    Type in expressions to have them evaluated.
    Type :help for more information.
    15/06/21 21:23:54 INFO spark.SparkContext: Running Spark version 1.4.0
    15/06/21 21:23:54 INFO spark.SecurityManager: Changing view acls to: root
    15/06/21 21:23:54 INFO spark.SecurityManager: Changing modify acls to: root
    15/06/21 21:23:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    15/06/21 21:23:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
    15/06/21 21:23:56 INFO Remoting: Starting remoting
    15/06/21 21:23:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.200:57658]
    15/06/21 21:23:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 57658.
    15/06/21 21:23:58 INFO spark.SparkEnv: Registering MapOutputTracker
    15/06/21 21:23:58 INFO spark.SparkEnv: Registering BlockManagerMaster
    15/06/21 21:23:58 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-4f1badf6-1e92-47ca-98a2-6d82f4882f15/blockmgr-530e4335-9e59-45d4-b9fb-6014089f5a00
    15/06/21 21:23:58 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
    15/06/21 21:23:59 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-4f1badf6-1e92-47ca-98a2-6d82f4882f15/httpd-4b2cca3c-e8d4-4ab3-9c3d-38ec579ec873
    15/06/21 21:23:59 INFO spark.HttpServer: Starting HTTP Server
    15/06/21 21:23:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
    15/06/21 21:23:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51899
    15/06/21 21:23:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 51899.
    15/06/21 21:23:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
    15/06/21 21:23:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
    15/06/21 21:23:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
    15/06/21 21:23:59 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
    15/06/21 21:23:59 INFO ui.SparkUI: Started SparkUI at http://192.168.1.200:4040
    15/06/21 21:24:00 INFO executor.Executor: Starting executor ID driver on host localhost
    15/06/21 21:24:00 INFO executor.Executor: Using REPL class URI: http://192.168.1.200:38651
    15/06/21 21:24:01 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59385.
    15/06/21 21:24:01 INFO netty.NettyBlockTransferService: Server created on 59385
    15/06/21 21:24:01 INFO storage.BlockManagerMaster: Trying to register BlockManager
    15/06/21 21:24:01 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:59385 with 267.3 MB RAM, BlockManagerId(driver, localhost, 59385)
    15/06/21 21:24:01 INFO storage.BlockManagerMaster: Registered BlockManager
    15/06/21 21:24:02 INFO repl.SparkILoop: Created spark context..
    Spark context available as sc.
    15/06/21 21:24:03 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
    15/06/21 21:24:04 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    15/06/21 21:24:04 INFO metastore.ObjectStore: ObjectStore, initialize called
    15/06/21 21:24:04 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    15/06/21 21:24:04 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    15/06/21 21:24:05 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    15/06/21 21:24:07 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    15/06/21 21:24:14 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    15/06/21 21:24:14 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
    15/06/21 21:24:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/06/21 21:24:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/06/21 21:24:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/06/21 21:24:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/06/21 21:24:19 INFO metastore.ObjectStore: Initialized ObjectStore
    15/06/21 21:24:20 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
    15/06/21 21:24:24 INFO metastore.HiveMetaStore: Added admin role in metastore
    15/06/21 21:24:24 INFO metastore.HiveMetaStore: Added public role in metastore
    15/06/21 21:24:24 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
    15/06/21 21:24:25 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
    15/06/21 21:24:25 INFO repl.SparkILoop: Created sql context (with Hive support)..
    SQL context available as sqlContext.
6. Test with the word-count example. Before starting spark-shell, upload a file to HDFS.
7. Code:
    // Read the input file from HDFS as an RDD of lines
    val file = sc.textFile("hdfs://hadoop.master:9000/data/intput/wordcount.data")
    // Split each line into words, pair each word with 1, then sum the counts per word
    val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    // Bring the results back to the driver
    count.collect()
    // Write the results to HDFS (the method is saveAsTextFile, not textAsFile)
    count.saveAsTextFile("hdfs://hadoop.master:9000/data/output")
To understand the code above you need to learn Scala.
Print the result directly: hadoop dfs -cat /data/output/p*
    (im,1)
    (are,1)
    (yes,1)
    (hi,2)
    (do,1)
    (no,3)
    (to,1)
    (lll,1)
    (,3)
    (hello,3)
    (xiaoming,1)
    (ga,1)
    (world,1)
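For readers who do not know Scala yet, the same counting logic can be sketched on a local file with standard shell tools. This is only an illustration of the idea, not the Spark job itself, and the function name `wordcount` is hypothetical. The inner loop plays the role of flatMap (one word at a time), the increment plays the role of map plus reduceByKey (adding 1 per occurrence of each word), and the output uses the same (word,count) shape.

```shell
#!/bin/sh
# wordcount FILE (hypothetical helper): print "(word,count)" for every
# whitespace-separated word in FILE, sorted by line for stable output.
wordcount() {
    awk '
        # Visit each word on the line and add 1 to its running total.
        { for (i = 1; i <= NF; i++) count[$i]++ }
        # After the last line, emit one (word,count) pair per distinct word.
        END { for (w in count) printf "(%s,%d)\n", w, count[w] }
    ' "$1" | LC_ALL=C sort
}

# Usage:
#   wordcount wordcount.data
```

One difference from the Spark version: awk splits on runs of whitespace, so it never produces the empty-string key that shows up as (,3) in the Spark output above, which comes from splitting on a single space.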