https://cran.r-project.org/src/base/R-3/
1.2 Environment variable configuration:
1.3 Test the installation:
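Step 1.2 can be sketched as follows for a Unix-style shell (on Windows you would instead add the same directory under System Properties > Environment Variables). The install path is a hypothetical example; use wherever you actually installed R:

```shell
# Hypothetical R install location -- substitute your own.
R_HOME="/c/Program Files/R/R-3.3.2"
# Put R's bin directory on PATH so `R` and `Rscript` resolve from any console.
PATH="$R_HOME/bin:$PATH"
export R_HOME PATH
# Step 1.3: open a new console and run `R --version` to confirm it works.
```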
https://cran.r-project.org/bin/windows/Rtools/
2.1 Configure the environment variables
2.2 Test:
https://www.rstudio.com/products/rstudio/download/ (just click Next through the installer)
4.1 Environment variable configuration:
4.2 Test:
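The test steps so far (1.3, 2.2, 4.2) all boil down to confirming each tool resolves on PATH. A minimal sketch; checking for `make` as a proxy for Rtools is an assumption about your Rtools setup, since Rtools has no single namesake binary:

```shell
# Print found/missing for each tool installed so far.
for tool in java R Rscript make; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```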
5.1 URL: http://spark.apache.org/downloads.html
5.2 Extract it to the appropriate directory on your local disk
Note: if you see the warning WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable, you need to install the native Hadoop library
http://hadoop.apache.org/releases.html
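Once Hadoop is extracted, Spark finds it through the HADOOP_HOME variable. A hedged sketch; the install path and version are made-up stand-ins for wherever you unpacked the release downloaded above:

```shell
# Hypothetical Hadoop extraction directory -- substitute your own.
HADOOP_HOME="/c/hadoop-2.7.3"
# Hadoop's bin directory must also be on PATH.
PATH="$HADOOP_HOME/bin:$PATH"
export HADOOP_HOME PATH
```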
10.1 If the test output is flooded with INFO log messages, change INFO to WARN in the log4j file, located under \spark\conf
10.2 Modify the log4j file in conf:
10.3 Re-run SparkR
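Concretely, step 10.2 means copying conf\log4j.properties.template to conf\log4j.properties and changing the root logging level:

```properties
# Before: log4j.rootCategory=INFO, console
log4j.rootCategory=WARN, console
```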
在Spark2.0中增長了RSparkSql進行Sql查詢
dataframe爲數據框操做
data-manipulation爲數據轉化
ml爲機器學習
11.1 Use Ctrl+Alt+left mouse click to open a console in this folder
11.2 Run spark-submit xxx.R
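Step 11.2 can be sketched as below. Here dataframe.R is a stand-in for whichever script you want to run, and the sketch assumes spark-submit (spark-submit.cmd on Windows) is on PATH:

```shell
# The script to run -- substitute any SparkR .R file.
SCRIPT="dataframe.R"
# Guarded so the sketch degrades gracefully when Spark is not installed yet.
if command -v spark-submit >/dev/null 2>&1; then
  spark-submit "$SCRIPT"
else
  echo "spark-submit not found on PATH -- revisit the environment variable steps"
fi
```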
12.1 Copy the SparkR folder from R/lib under the Spark installation directory into ..\R-3.3.2\library. Note: copy the entire SparkR folder, not each file inside it.
Source folder:
Destination folder:
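Step 12.1 with POSIX cp (on Windows you would copy the folder in Explorer instead). Both paths are illustrative stand-ins, created here so the sketch is self-contained; substitute your real Spark and R directories:

```shell
# Stand-in directories for the Spark install and the R library.
SPARK_HOME="/tmp/spark-demo"
R_LIBRARY="/tmp/R-demo/library"
mkdir -p "$SPARK_HOME/R/lib/SparkR" "$R_LIBRARY"
# Copy the entire SparkR folder itself, not just its contents.
cp -R "$SPARK_HOME/R/lib/SparkR" "$R_LIBRARY/"
ls "$R_LIBRARY"   # the library should now contain a SparkR folder
```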
12.2 Open the SparkR file dataframe.R in RStudio and run the code line by line with Ctrl+Enter
The SparkR dataframe.R source code is as follows:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-DataFrame-example")
sqlContext <- sparkRSQL.init(sc)

# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))

# Convert local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)

# Print its schema
printSchema(df)
# root
#  |-- name: string (nullable = true)
#  |-- age: double (nullable = true)

# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- read.json(sqlContext, path)
printSchema(peopleDF)

# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")

# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)

# Print the teenagers in our dataset
print(teenagersLocalDF)

# Stop the SparkContext now
sparkR.stop()
END~