SparkR-Install


Published: 2017-03-30 23:05:18


1. Download R

https://cran.r-project.org/src/base/R-3/


1.2 Configure the environment variables:

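The configuration amounts to putting R's bin directory on the PATH. A minimal sketch from a Windows command prompt (the install path is an assumption, adjust to your R version and location; open a new console afterwards for setx changes to take effect):

```shell
rem Add R's bin directory to the user PATH (example path; adjust to your install)
setx PATH "%PATH%;C:\Program Files\R\R-3.3.2\bin\x64"
```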

1.3 Test the installation:


 

2. Download Rtools33

https://cran.r-project.org/bin/windows/Rtools/


2.1 Configure the environment variables

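Again as a sketch, assuming the default install location C:\Rtools (depending on the Rtools version, its compiler toolchain directories may also need to be on PATH):

```shell
rem Put the Rtools binaries on PATH (assumed default location)
setx PATH "%PATH%;C:\Rtools\bin"
```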

2.2 Test:


3. Install RStudio

    https://www.rstudio.com/products/rstudio/download/ (just keep clicking Next through the installer)


4. Install the JDK and set environment variables

4.1 Configure the environment variables:

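A minimal sketch of the JDK variables, assuming a JDK 8 installed at the path shown (adjust to your actual install):

```shell
rem JAVA_HOME should point at the JDK install root (example path)
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_121"
rem Put the JDK tools (java, javac) on PATH
setx PATH "%PATH%;C:\Program Files\Java\jdk1.8.0_121\bin"
```

In a new console, `java -version` should then report the installed JDK.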

4.2 Test:


5. Download the Spark distribution

  5.1 URL: http://spark.apache.org/downloads.html


 

     5.2 Unpack it to the desired directory on your local disk

 


6. Install Spark and set environment variables

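As with the JDK, a sketch of the Spark variables (the extract location below is an assumption, use wherever you unpacked the distribution):

```shell
rem SPARK_HOME points at the unpacked Spark directory (example path)
setx SPARK_HOME "D:\spark-2.0.0-bin-hadoop2.7"
rem Put spark-submit, sparkR and the other launchers on PATH
setx PATH "%PATH%;D:\spark-2.0.0-bin-hadoop2.7\bin"
```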

7. Test SparkR

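The test amounts to launching the interactive SparkR shell from a new command prompt:

```shell
rem Start the SparkR shell (requires SPARK_HOME\bin on PATH)
sparkR
```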

  Note: if you see the warning WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable, you need to install a local Hadoop library.

8. Download and install Hadoop

  http://hadoop.apache.org/releases.html


 

9. Set the Hadoop environment variables

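A sketch of the Hadoop variables (the path is an assumption; note that on Windows a winutils.exe matching your Hadoop version is commonly also needed in %HADOOP_HOME%\bin):

```shell
rem HADOOP_HOME points at the unpacked Hadoop directory (example path)
setx HADOOP_HOME "D:\hadoop-2.7.3"
rem Expose the Hadoop binaries
setx PATH "%PATH%;D:\hadoop-2.7.3\bin"
```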

10. Re-test SparkR

   10.1 If the following message appears during the test, change INFO to WARN in the log4j file located under \spark\conf


    10.2 Edit the log4j file in conf:

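The edit can be sketched as follows: create log4j.properties from the shipped template, then lower the root logging level from INFO to WARN:

```shell
rem In %SPARK_HOME%\conf, create log4j.properties from the template
copy "%SPARK_HOME%\conf\log4j.properties.template" "%SPARK_HOME%\conf\log4j.properties"
rem Then open log4j.properties in an editor and change
rem   log4j.rootCategory=INFO, console
rem to
rem   log4j.rootCategory=WARN, console
```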

     10.3 Re-run SparkR


11. Run SparkR code

    Spark 2.0 added SparkR SQL for running SQL queries. Among the bundled R examples:

    dataframe covers data frame operations

    data-manipulation covers data transformation

    ml covers machine learning


   11.1 Use Ctrl+Alt+left mouse button to open a console in this folder


  11.2 Execute spark-submit xxx.R

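For example, to run one of the R examples bundled with Spark (substitute your own script as needed):

```shell
rem Submit an R script to Spark from the bundled examples folder
spark-submit %SPARK_HOME%\examples\src\main\r\dataframe.R
```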

12. Install the SparkR package

    12.1 Copy the SparkR folder from R/lib under the Spark installation directory into ..\R-3.3.2\library. Note: copy the entire SparkR folder, not each individual file inside it.

    Source folder:


     Destination folder:

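On the command line the copy can be sketched as (both paths are assumptions, adjust to your installs):

```shell
rem Copy the whole SparkR package folder, not just its contents
xcopy /E /I "%SPARK_HOME%\R\lib\SparkR" "C:\Program Files\R\R-3.3.2\library\SparkR"
```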

 

     12.2 Open dataframe.R in RStudio and run the code, executing it line by line with Ctrl+Enter

The SparkR dataframe.R source code is as follows:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-DataFrame-example")
sqlContext <- sparkRSQL.init(sc)

# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))

# Convert local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)

# Print its schema
printSchema(df)
# root
#  |-- name: string (nullable = true)
#  |-- age: double (nullable = true)

# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- read.json(sqlContext, path)
printSchema(peopleDF)

# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")

# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)

# Print the teenagers in our dataset 
print(teenagersLocalDF)

# Stop the SparkContext now
sparkR.stop()

13. RStudio run results


 

END~
