SparkR-Install


Published: 2017-03-30 23:05:18


1. Download R

https://cran.r-project.org/src/base/R-3/


1.2 Configure the environment variables:

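The configuration amounts to putting R's bin directory on the PATH. A minimal sketch from a Windows command prompt (the install path is an assumption, adjust to your R version and location; open a new console afterwards for setx changes to take effect):

```shell
rem Add R's bin directory to the user PATH (example path; adjust to your install)
setx PATH "%PATH%;C:\Program Files\R\R-3.3.2\bin\x64"
```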

1.3 Test the installation:


 

2. Download Rtools33

https://cran.r-project.org/bin/windows/Rtools/


2.1 Configure the environment variables

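Again as a sketch, assuming the default install location C:\Rtools (depending on the Rtools version, its compiler toolchain directories may also need to be on PATH):

```shell
rem Put the Rtools binaries on PATH (assumed default location)
setx PATH "%PATH%;C:\Rtools\bin"
```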

2.2 Test:


3. Install RStudio

    https://www.rstudio.com/products/rstudio/download/ (just keep clicking Next through the installer)


4. Install the JDK and set environment variables

4.1 Configure the environment variables:

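A minimal sketch of the JDK variables, assuming a JDK 8 installed at the path shown (adjust to your actual install):

```shell
rem JAVA_HOME should point at the JDK install root (example path)
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_121"
rem Put the JDK tools (java, javac) on PATH
setx PATH "%PATH%;C:\Program Files\Java\jdk1.8.0_121\bin"
```

In a new console, `java -version` should then report the installed JDK.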

4.2 Test:


5. Download the Spark distribution

  5.1 URL: http://spark.apache.org/downloads.html


 

     5.2 Unpack it to the desired directory on your local disk

 


6. Install Spark and set environment variables

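As with the JDK, a sketch of the Spark variables (the extract location below is an assumption, use wherever you unpacked the distribution):

```shell
rem SPARK_HOME points at the unpacked Spark directory (example path)
setx SPARK_HOME "D:\spark-2.0.0-bin-hadoop2.7"
rem Put spark-submit, sparkR and the other launchers on PATH
setx PATH "%PATH%;D:\spark-2.0.0-bin-hadoop2.7\bin"
```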

7. Test SparkR

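The test amounts to launching the interactive SparkR shell from a new command prompt:

```shell
rem Start the SparkR shell (requires SPARK_HOME\bin on PATH)
sparkR
```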

  Note: if you see the warning WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable, you need to install a local Hadoop library.

8. Download and install Hadoop

  http://hadoop.apache.org/releases.html


 

9. Set the Hadoop environment variables

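A sketch of the Hadoop variables (the path is an assumption; note that on Windows a winutils.exe matching your Hadoop version is commonly also needed in %HADOOP_HOME%\bin):

```shell
rem HADOOP_HOME points at the unpacked Hadoop directory (example path)
setx HADOOP_HOME "D:\hadoop-2.7.3"
rem Expose the Hadoop binaries
setx PATH "%PATH%;D:\hadoop-2.7.3\bin"
```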

10. Re-test SparkR

   10.1 If the following message appears during the test, change INFO to WARN in the log4j file located under \spark\conf


    10.2 Edit the log4j file in conf:

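The edit can be sketched as follows: create log4j.properties from the shipped template, then lower the root logging level from INFO to WARN:

```shell
rem In %SPARK_HOME%\conf, create log4j.properties from the template
copy "%SPARK_HOME%\conf\log4j.properties.template" "%SPARK_HOME%\conf\log4j.properties"
rem Then open log4j.properties in an editor and change
rem   log4j.rootCategory=INFO, console
rem to
rem   log4j.rootCategory=WARN, console
```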

     10.3 Re-run SparkR


11. Run SparkR code

    Spark 2.0 added SparkR SQL for running SQL queries. Among the bundled R examples:

    dataframe covers data frame operations

    data-manipulation covers data transformation

    ml covers machine learning


   11.1 Use Ctrl+Alt+left mouse button to open a console in this folder


  11.2 Execute spark-submit xxx.R

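For example, to run one of the R examples bundled with Spark (substitute your own script as needed):

```shell
rem Submit an R script to Spark from the bundled examples folder
spark-submit %SPARK_HOME%\examples\src\main\r\dataframe.R
```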

12. Install the SparkR package

    12.1 Copy the SparkR folder from R/lib under the Spark installation directory into ..\R-3.3.2\library. Note: copy the entire SparkR folder, not each individual file inside it.

    Source folder:


     Destination folder:

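On the command line the copy can be sketched as (both paths are assumptions, adjust to your installs):

```shell
rem Copy the whole SparkR package folder, not just its contents
xcopy /E /I "%SPARK_HOME%\R\lib\SparkR" "C:\Program Files\R\R-3.3.2\library\SparkR"
```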

 

     12.2 Open dataframe.R in RStudio and run the code, executing it line by line with Ctrl+Enter

The SparkR dataframe.R source code is as follows:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-DataFrame-example")
sqlContext <- sparkRSQL.init(sc)

# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))

# Convert local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)

# Print its schema
printSchema(df)
# root
#  |-- name: string (nullable = true)
#  |-- age: double (nullable = true)

# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- read.json(sqlContext, path)
printSchema(peopleDF)

# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")

# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)

# Print the teenagers in our dataset 
print(teenagersLocalDF)

# Stop the SparkContext now
sparkR.stop()

13. RStudio run results


 

END~
