Installing Anaconda3 on Windows to Run Spark

1. Preparation

1.1 Required software:
Anaconda3-5.0.0-Windows-x86_64
hadoop-2.7.4
jdk1.8+
spark-2.2.0-bin-hadoop2.7

1.2 Downloading the software
Anaconda official download page: https://www.continuum.io/downloads
The current release is Python 3.6, and the default download is also Python 3.6. A Baidu Netdisk mirror is available at http://pan.baidu.com/s/1jIePjPc (password: robu). Of course, you can also download the latest Anaconda3 from the official site and switch it to Python 3.6 yourself as needed.
[screenshot]

Hadoop official download page: http://hadoop.apache.org/releases.html
[screenshot]

Spark official download page: http://spark.apache.org/downloads.html
[screenshot]

JDK official download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html

2. Installing the software

Installing Anaconda is straightforward; it is basically a matter of clicking Next. To avoid unnecessary trouble later, keep the default installation path. The steps are:
Double-click the installer to start the setup program.
[screenshot]

Click I Agree to continue.
[screenshot]

Click Next to continue.
[screenshot]
If the system has only one user, keep the default (first) option; if there are multiple users who all need Anaconda, choose the second option.

[screenshot]
To avoid trouble later on, installing to the default path is recommended; the installation takes roughly 1.8 GB of disk space.

[screenshot]
Installation takes a while; just wait for it to finish.

[screenshot]
At this point the installation is complete. You can uncheck "Learn more about Anaconda Cloud" and "Learn more about Anaconda Support", then click "Finish".

Install jdk1.8+ to its default path as well; hadoop-2.7.4 and spark-2.2.0-bin-hadoop2.7 can be unpacked to any drive.

3. Configuring environment variables on Windows (Hadoop/Spark/Java)

Java environment variables:
[screenshot]

Hadoop environment variables:
[screenshot]

Spark environment variables:
[screenshot]

PATH setting:
[screenshot]

After the steps above, simply keep clicking "OK" and the environment variables are configured.
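
For reference, the variables typically end up looking something like the following; the paths below are only examples and must be changed to wherever you actually installed each package:

JAVA_HOME   = C:\Program Files\Java\jdk1.8.0_144
HADOOP_HOME = D:\hadoop-2.7.4
SPARK_HOME  = D:\spark-2.2.0-bin-hadoop2.7
Path        = %Path%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin;%SPARK_HOME%\bin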

4. Starting Spark

Before starting Spark, you need to place the winutils.exe file in the bin directory of hadoop-2.7.4; otherwise it will fail with an error like the following:

E:\>spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/05 21:34:43 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
  at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
  at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
  at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
  at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2327)
  at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:365)
  at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
  at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:991)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:92)
  at $line3.$read$$iw$$iw.<init>(<console>:15)
  at $line3.$read$$iw.<init>(<console>:42)
  at $line3.$read.<init>(<console>:44)
  at $line3.$read$.<init>(<console>:48)
  at $line3.$read$.<clinit>(<console>)
  at $line3.$eval$.$print$lzycompute(<console>:7)
  at $line3.$eval$.$print(<console>:6)
  at $line3.$eval.$print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
  at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
  at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
  at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
  at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
  at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
  at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
  at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
  at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
  at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
  at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
  at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
  at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:105)
  at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
  at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
  at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
  at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
  at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
  at org.apache.spark.repl.Main$.doMain(Main.scala:69)
  at org.apache.spark.repl.Main$.main(Main.scala:52)
  at org.apache.spark.repl.Main.main(Main.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
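
The usual fix is to provide a winutils.exe built for Hadoop 2.7.x. A minimal sketch, assuming Hadoop was unpacked to D:\hadoop-2.7.4 (the path is an example; adjust it to your own layout). The binary is not shipped in the Apache tarball and has to be obtained separately, for instance from one of the community winutils repositories on GitHub:

D:\hadoop-2.7.4\
    bin\
        winutils.exe      <- downloaded separately and copied here

HADOOP_HOME = D:\hadoop-2.7.4
Path        = %Path%;%HADOOP_HOME%\bin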

Alternatively, instead of installing Hadoop you can just install a winutils.exe on its own, but the corresponding environment variable must still be configured, as in the sketch above. Next, find "Jupyter Notebook" in the Start menu and double-click it to run.
這裏寫圖片描述

After it starts, you will see something like the following:
[screenshot]

Your browser will then open a page like this:
[screenshot]

Then, at the top right of the page, click "New" and create a "Python 3" notebook, as shown:
[screenshot]

Then enter the following code to start Spark:

import os
import sys

# Locate the Spark installation via the SPARK_HOME environment variable
spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')

# Put Spark's Python bindings and the bundled py4j archive on the module search path
# (the py4j version must match the file shipped under %SPARK_HOME%\python\lib)
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))
comm = os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip')
print('start spark....', comm)

# Run Spark's interactive shell bootstrap, which creates the `spark` session and `sc` context
exec(open(os.path.join(spark_home, 'python/pyspark/shell.py')).read())

As shown:
[screenshot]

At this point, Spark has started successfully.
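
To confirm the session works, you can run a quick sanity check in a new cell; in Spark 2.2, shell.py exposes the SparkSession as spark and the SparkContext as sc. The check below is just an illustrative example, not part of the original walkthrough:

print(spark.range(10).count())               # expected: 10
print(sc.parallelize(range(100)).count())    # expected: 100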

5. Errors at startup and how to fix them

1. The error looks like this:

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
  ... 47 elided
Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
  ... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
  at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
  at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
  at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
  at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
  ... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
  ... 71 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
  at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
  ... 76 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
  ... 84 more
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 85 more

The error message indicates that the temporary directory /tmp/hive is not writable:

The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------

So we need to update the permissions on E:\tmp\hive (I ran the spark-shell command from drive E:; if you run it from another drive, use that drive letter + \tmp\hive instead). Run the following command:

E:\>C:\winutils\bin\winutils.exe chmod 777 E:\tmp\hive

Run spark-shell again and Spark starts successfully. The Spark UI is now available at http://localhost:4040.
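
To double-check that the new permissions took effect, winutils can also list the directory (assuming your winutils build includes the ls subcommand; the paths and drive letter are examples):

C:\winutils\bin\winutils.exe ls E:\tmp\hive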

Reference posts for this fix:
https://yq.aliyun.com/articles/96424?t=t1
http://www.cnblogs.com/czm1032851561/p/5751722.html
