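Note: before closing R, be sure to save the workspace so that you can pick up where you left off. The results of earlier console commands and the related variables are then restored the next time the workspace is loaded.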
1 Accessing variables in a data frame
Tip: after the read.table command, run names to see which variables are available.
    > names(Squid)
    [1] "Sample"   "Year"     "Month"    "Location" "Sex"      "GSI"
1.1 The str function
The str function shows the type of every variable in a data frame:
    > str(Squid)
    'data.frame': 2644 obs. of 6 variables:
     $ Sample  : int 1 2 3 4 5 6 7 8 9 10 ...
     $ Year    : int 1 1 1 1 1 1 1 1 1 1 ...
     $ Month   : int 1 1 1 1 1 1 1 1 1 2 ...
     $ Location: int 1 3 1 1 1 1 1 3 3 1 ...
     $ Sex     : int 2 2 2 2 2 2 2 2 2 2 ...
     $ GSI     : num 10.44 9.83 9.74 9.31 8.99 ...
Sample, Year, Month, Location, and Sex are integer variables.
GSI is a numeric variable.
GSI lives inside the data frame Squid, so typing GSI at the R console does not display it:
    > GSI
    Error: object 'GSI' not found
1.2 The data argument in functions: the best way to access variables in a data frame
    > M1 <- lm(GSI ~ factor(Location) + factor(Year), data = Squid)
    > M1

    Call:
    lm(formula = GSI ~ factor(Location) + factor(Year), data = Squid)

    Coefficients:
          (Intercept)  factor(Location)2  factor(Location)3  factor(Location)4
               1.3939            -2.2178            -0.1417             0.3138
        factor(Year)2      factor(Year)3      factor(Year)4
               1.3548             0.9564             1.2270
lm is the function for linear regression; data = Squid tells it to look up the variables in the data frame Squid.
The data = argument does not work with every function, for example:
    > mean(GSI, data = Squid)
    Error in mean(GSI, data = Squid) : object 'GSI' not found
1.3 The $ sign
Another way to access a variable is Squid$GSI:
    > Squid$GSI
     [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
     [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
    [17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
    [25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745
or, selecting the sixth column by position:
    > Squid[,6]
     [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
     [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
    [17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
    [25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745
Now the mean can be computed with mean:
    > mean(Squid$GSI)
    [1] 2.187034
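The access styles above can be compared side by side. The following is only a minimal sketch: the data frame toy is a hypothetical miniature stand-in for Squid with invented values, and the [[ ]] and with() forms are additions that the text itself does not use.

```r
# Hypothetical miniature stand-in for Squid (values invented for illustration)
toy <- data.frame(
  Sample = 1:4,
  Sex    = c(2, 2, 1, 1),
  GSI    = c(10.4, 9.8, 5.3, 4.3)
)

by_name     <- toy$GSI               # column selected by name with $
by_position <- toy[, 3]              # column selected by position
by_string   <- toy[["GSI"]]          # column selected by a name given as a string
via_with    <- with(toy, mean(GSI))  # expression evaluated inside the data frame

mean(by_name)
```

All three selections return the same numeric vector, and with() is one way to use a function such as mean that has no data argument.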
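1.4 The attach function
attach adds a data frame to R's search path, after which the GSI data can be displayed simply by typing GSI:
    > attach(Squid)
    > GSI
     [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
     [9]  7.2156  6.8372  6.3882  6.3672  6.2998  6.0726  5.8395  5.8070
    [17]  5.7774  5.7757  5.6484  5.6141  5.6017  5.5510  5.3110  5.2970
    [25]  5.2253  5.1667  5.1405  5.1292  5.0782  5.0612  5.0097  4.9745
Functions can now use the variable directly:
    > boxplot(GSI)
(Hmm, I can't make sense of this plot.)
When using attach, take care to keep variable names unique: if a variable has the same name as a built-in R function or an existing variable, one masks the other and the results become confusing.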
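Summary of attach:
(1) To avoid duplicate copies of the variables on the search path, do not run attach(Squid) more than once.
(2) When using attach, make sure the variable names are unique.
(3) When working with several datasets, one at a time, use detach to remove a dataset from R's search path.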
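The attach and detach cycle summarized above can be sketched as follows. The data frame toy is hypothetical, and with() is shown only as one possible alternative that leaves the search path untouched; it is not part of the original notes.

```r
toy <- data.frame(GSI = c(10.4, 9.8, 5.3))   # hypothetical data

attach(toy)                        # toy is now on the search path
on_path <- "toy" %in% search()     # TRUE while toy is attached
attached_mean <- mean(GSI)         # GSI is found through the search path
detach(toy)                        # remove toy from the search path again

still_on_path <- "toy" %in% search()

# with() gives the same result without modifying the search path
with_mean <- with(toy, mean(GSI))
```

Checking search() before and after is a simple way to confirm that a data frame has not been attached twice.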
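2 Accessing subsets of the data
First run the detach(Squid) command!
Look at the values of Sex in Squid:
    > Squid$Sex
     [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
    [36] 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1
Show the unique values:
    > unique(Squid$Sex)
    [1] 2 1
Here 1 denotes male and 2 denotes female.
    > Sel <- Squid$Sex == 1
    > SquidM <- Squid[Sel,]
    > SquidM
       Sample Year Month Location Sex    GSI
    24     24    1     5        1   1 5.2970
    48     48    1     5        3   1 4.2968
    58     58    1     6        1   1 3.5008
    60     60    1     6        1   1 3.2487
    61     61    1     6        1   1 3.2304
The command Sel <- Squid$Sex == 1 creates a vector of the same length as Sex: its value is TRUE where Sex equals 1 and FALSE otherwise. Such a variable is called a Boolean (logical) variable and can be used to select rows.
The command SquidM <- Squid[Sel,] selects the rows of Squid for which Sel is TRUE and stores them in SquidM. Because rows are being selected, Sel goes before the comma inside the square brackets.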
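The row selection described above can be tried on a small example. This is a sketch under the assumption of a hypothetical data frame toy with invented values; only the Sel pattern itself comes from the text.

```r
# Hypothetical data: Sex coded 1 = male, 2 = female, as in the text
toy <- data.frame(
  Sample = 1:5,
  Sex    = c(2, 1, 2, 1, 2),
  GSI    = c(10.4, 5.3, 9.8, 4.3, 9.7)
)

Sel <- toy$Sex == 1      # logical vector with one element per row of toy
Sel

males <- toy[Sel, ]      # rows where Sel is TRUE, all columns kept
nrow(males)
```

Leaving the position after the comma empty keeps every column; putting a condition there instead would select columns.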
Get the female data:
    > SquidF <- Squid[Squid$Sex == 2,]
    > SquidF
      Sample Year Month Location Sex     GSI
    1      1    1     1        1   2 10.4432
    2      2    1     1        3   2  9.8331
    3      3    1     1        1   2  9.7356
    4      4    1     1        1   2  9.3107
    5      5    1     1        1   2  8.9926
The next few commands need little explanation:
    unique(Squid$Location)
    Squid123 <- Squid[Squid$Location == 1 | Squid$Location == 2 | Squid$Location == 3,]
    Squid123 <- Squid[Squid$Location != 4,]
    Squid123 <- Squid[Squid$Location < 4,]
    Squid123 <- Squid[Squid$Location <= 3,]
    Squid123 <- Squid[Squid$Location >= 1 & Squid$Location <= 3,]
Each of the Squid123 assignments selects the rows where Location is 1, 2, or 3:
    > unique(Squid$Location)
    [1] 1 3 4 2
    > Squid123 <- Squid[Squid$Location == 1 | Squid$Location == 2 | Squid$Location == 3,]
    > Squid123
      Sample Year Month Location Sex     GSI
    1      1    1     1        1   2 10.4432
    2      2    1     1        3   2  9.8331
    3      3    1     1        1   2  9.7356
    4      4    1     1        1   2  9.3107
    5      5    1     1        1   2  8.9926
    6      6    1     1        1   2  8.7707
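That these conditions are interchangeable can be checked directly. The sketch below uses a hypothetical data frame toy whose Location values are limited to 1 through 4, as in the text; the %in% form is an addition not used in the original.

```r
# Hypothetical data with Location values 1 to 4
toy <- data.frame(Sample = 1:6, Location = c(1, 3, 4, 2, 1, 4))

a  <- toy[toy$Location == 1 | toy$Location == 2 | toy$Location == 3, ]
b  <- toy[toy$Location != 4, ]
c3 <- toy[toy$Location <= 3, ]
d  <- toy[toy$Location %in% c(1, 2, 3), ]   # %in% tests membership in a set

identical(a, b) && identical(a, c3) && identical(a, d)
```

The equivalence holds only because 1 to 4 are the only values present; with other Location values the conditions would select different rows.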
Get the rows for males at Location 1:
    > SquidM.1 <- Squid[Squid$Sex == 1 & Squid$Location == 1,]
    > SquidM.1
       Sample Year Month Location Sex    GSI
    24     24    1     5        1   1 5.2970
    58     58    1     6        1   1 3.5008
    60     60    1     6        1   1 3.2487
Get the data for males at Location 1 or 2:
    > SquidM.12 <- Squid[Squid$Sex == 1 & (Squid$Location == 1 | Squid$Location == 2),]
    > SquidM.12
       Sample Year Month Location Sex    GSI
    24     24    1     5        1   1 5.2970
    58     58    1     6        1   1 3.5008
    60     60    1     6        1   1 3.2487
The following command, by contrast, goes wrong:
    > SquidM1 <- SquidM[Squid$Location == 1,]
    > SquidM1
         Sample Year Month Location Sex    GSI
    24       24    1     5        1   1 5.2970
    58       58    1     6        1   1 3.5008
    ..........
    ..........
    NA       NA   NA    NA       NA  NA     NA
    NA.1     NA   NA    NA       NA  NA     NA
    NA.2     NA   NA    NA       NA  NA     NA
    NA.3     NA   NA    NA       NA  NA     NA
    NA.4     NA   NA    NA       NA  NA     NA
    ..........
Cause:
SquidM, created earlier, holds only the male data, so its number of rows differs from the length of the Boolean vector Squid$Location == 1. TRUE values at positions beyond the last row of SquidM have no row to select, which produces the NA rows shown above.
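The mismatch and one way to avoid it can be reproduced on a small example. This is a sketch with a hypothetical data frame toy; the fix shown, building the condition from the same data frame that is being indexed, is a suggestion rather than something stated in the original notes.

```r
# Hypothetical data
toy <- data.frame(
  Sample   = 1:6,
  Sex      = c(2, 1, 2, 1, 1, 2),
  Location = c(1, 1, 3, 2, 1, 1)
)
males <- toy[toy$Sex == 1, ]            # 3 rows

# Mistake: the condition has 6 elements, but males has only 3 rows
wrong <- males[toy$Location == 1, ]

# Fix: build the condition from the data frame that is being indexed
right <- males[males$Location == 1, ]

wrong
right
```

Besides the NA rows, the mistaken version can also pick up real rows that do not satisfy the intended condition, because the TRUE positions refer to rows of a different data frame.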
2.1 Sorting the data
    > Ord1 <- order(Squid$Month)
    > Squid[Ord1,]
      Sample Year Month Location Sex     GSI
    1      1    1     1        1   2 10.4432
    2      2    1     1        3   2  9.8331
    3      3    1     1        1   2  9.7356
    4      4    1     1        1   2  9.3107
This sorts the rows by month.
The same ordering can also be applied to a single variable:
    > Squid$GSI[Ord1]
     [1] 10.4432  9.8331  9.7356  9.3107  8.9926  8.7707  8.2576  7.4045
     [9]  7.2156  6.3882  6.0726  5.7757  1.2610  1.1997  0.8373  0.6716
    [17]  0.5758  0.5518  0.4921  0.4808  0.3828  0.3289  0.2758  0.2506
    [25]  0.2092  0.1792  0.1661  0.1618  0.1543  0.1541  0.1490  0.1379
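What order returns is easiest to see on a few rows. The sketch below assumes a hypothetical data frame toy; the second call, which adds GSI as a tie-breaking key, goes slightly beyond what the notes show.

```r
# Hypothetical data
toy <- data.frame(
  Month = c(3, 1, 2, 1),
  GSI   = c(4.3, 10.4, 5.3, 9.8)
)

ord <- order(toy$Month)         # row positions that put Month in ascending order
ord

sorted       <- toy[ord, ]      # the whole data frame, reordered
gsi_by_month <- toy$GSI[ord]    # a single variable, reordered the same way

# Further arguments break ties: GSI ascending within each month
ord2 <- order(toy$Month, toy$GSI)
ord2
```

order returns positions rather than sorted values, which is why the same vector can reorder either the data frame or any one of its columns.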
3 Combining two datasets with a common identifier
    > setwd("E:/R/R-beginer-guide/data/RBook")
    > Sql1 <- read.table(file = "squid1.txt", header = TRUE)
    > Sql2 <- read.table(file = "squid2.txt", header = TRUE)
    > SquidMerged <- merge(Sql1, Sql2, by = "Sample")
    > SquidMerged
      Sample     GSI YEAR MONTH Location Sex
    1      1 10.4432    1     1        1   2
    2      2  9.8331    1     1        3   2
    3      3  9.7356    1     1        1   2
    4      5  8.9926    1     1        1   2
    5      6  8.7707    1     1        1   2
    6      7  8.2576    1     1        1   2
merge takes the two data frames Sql1 and Sql2 as arguments and joins them using the variable Sample as the common identifier. merge also has an all option, which is FALSE by default: a Sample value that is missing from either Sql1 or Sql2 is dropped from the result. With all = TRUE such rows are kept, which may produce NA values.
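    > Sql11 <- read.table(file = "squid1.txt", header = TRUE)
    > Sql21 <- read.table(file = "squid2.txt", header = TRUE)
    > SquidMerged1 <- merge(Sql11, Sql21, by = "Sample")
    > SquidMerged1
Hmm, no NA values seem to appear here. Note, though, that this call leaves all at its default of FALSE, so it simply repeats the merge above; unmatched rows with NA values can only appear when all = TRUE is passed. The earlier output skips Sample 4, which suggests that at least one of the two files lacks that row.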
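The effect of the all option can be seen on two small data frames. This is a sketch only: left and right are hypothetical stand-ins with invented values and are not the contents of squid1.txt and squid2.txt.

```r
# Hypothetical stand-ins for the two files; Sample 3 and Sample 4 each
# occur in only one of them
left  <- data.frame(Sample = c(1, 2, 3), GSI = c(10.4, 9.8, 9.7))
right <- data.frame(Sample = c(1, 2, 4), Sex = c(2, 2, 1))

inner <- merge(left, right, by = "Sample")               # all = FALSE (default)
outer <- merge(left, right, by = "Sample", all = TRUE)   # keep unmatched rows

inner
outer
```

With the default, only Samples 1 and 2 survive; with all = TRUE, Samples 3 and 4 are kept and the columns they lack are filled with NA.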
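4 Exporting data
write.table writes data to an ASCII file:
    > write.table(SquidM, file = "MaleSquid_wujiahua.txt", sep = " ", quote = FALSE, append = FALSE, na = "NA")
The working directory now contains a file named MaleSquid_wujiahua.txt.
Opening it shows:
    Sample Year Month Location Sex GSI
    24 24 1 5 1 1 5.297
    48 48 1 5 3 1 4.2968
    58 58 1 6 1 1 3.5008
    60 60 1 6 1 1 3.2487
    61 61 1 6 1 1 3.2304
Notes:
The first argument of write.table is the data to export and the second is the name of the output file. sep = " " means the values are separated by spaces, quote = FALSE drops the quotation marks around character strings, and na = "NA" means missing values are written as NA. append = TRUE would add the data to the end of an existing file instead of overwriting it.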
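A round trip through a file shows what these arguments produce. The sketch below assumes a hypothetical data frame toy and writes to a temporary file created with tempfile, an addition chosen here so that nothing in the working directory is overwritten.

```r
# Hypothetical data with one missing value
toy <- data.frame(Sample = c(24, 48), Sex = c(1, 1), GSI = c(5.297, NA))

out_file <- tempfile(fileext = ".txt")   # temporary path
write.table(toy, file = out_file, sep = " ", quote = FALSE,
            append = FALSE, na = "NA")

file_lines <- readLines(out_file)        # the raw text that was written
file_lines

back <- read.table(out_file, header = TRUE)   # read the file back in
back
```

The header line has one field fewer than the data lines because row names are written as well, which matches the file shown above; read.table recognizes that layout and uses the first column as row names.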
5 Recoding categorical variables
    > str(Squid)
    'data.frame': 2644 obs. of 6 variables:
     $ Sample  : int 1 2 3 4 5 6 7 8 9 10 ...
     $ Year    : int 1 1 1 1 1 1 1 1 1 1 ...
     $ Month   : int 1 1 1 1 1 1 1 1 1 2 ...
     $ Location: int 1 3 1 1 1 1 1 3 3 1 ...
     $ Sex     : int 2 2 2 2 2 2 2 2 2 2 ...
     $ GSI     : num 10.44 9.83 9.74 9.31 8.99 ...
Sex and Location take a fixed set of values, so they are categorical variables.
It is common to create new variables in the data frame from such categorical variables:
    > Squid$fLocation <- factor(Squid$Location)
    > Squid$fSex <- factor(Squid$Sex)
    > Squid$fLocation
     [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1
    > Squid$fSex
     [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2
    [36] 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 1
    .....................
    [71] 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 1 1 1
    .....................
    .....................
    .....................
    [2591] 1 1 1 1 1 1 1 1 1 2 1 1 2 1 1 2 1 1 1 2 1 2 1 1 2 1 1 2 1 2 2 1 1 1 1
    [2626] 1 1 1 1 1 2 1 1 1 2 1 2 1 2 1 2 1 1 1
    Levels: 1 2
fLocation and fSex are nominal variables; the leading f marks them as factors.
The levels 1 and 2 can be relabeled:
    > Squid$fSex <- factor(Squid$Sex, levels = c(1,2), labels = c("M","F"))
    > Squid$fSex
     [1] F F F F F F F F F F F F F F F F F F F F F F F M F F F F F F F F F F F
    [36] F F F F F F F F F F F F M F F F F F F F F F M F M M M M F M M M M M M
    ..................
    ..................
    ..................
    [2556] F M M M M F F M M M M M M M F M M M M M M F M M F M M M F M M F M M M
    [2591] M M M M M M M M M F M M F M M F M M M F M F M M F M M F M F F M M M M
    [2626] M M M M M F M M M F M F M F M F M M M
    Levels: M F
Every 1 is now replaced by M and every 2 by F.
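The relabeling can be checked on a short vector. This is a sketch assuming a hypothetical vector sex that uses the same coding as the text (1 for male, 2 for female); the call to table is an addition for counting the labels.

```r
# Hypothetical codes: 1 = male, 2 = female
sex <- c(2, 2, 1, 2, 1)

f_default <- factor(sex)                                          # levels "1", "2"
f_labeled <- factor(sex, levels = c(1, 2), labels = c("M", "F"))  # relabeled

levels(f_default)
levels(f_labeled)
table(f_labeled)     # number of observations per label
```

The labels are matched to the levels by position, so the order in labels has to follow the order given in levels.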
Using the recoded factor variables:
    > boxplot(GSI ~ fSex, data = Squid)
    > M1 <- lm(GSI ~ fSex + fLocation, data = Squid)
    > M1

    Call:
    lm(formula = GSI ~ fSex + fLocation, data = Squid)

    Coefficients:
    (Intercept)        fSexF   fLocation2   fLocation3   fLocation4
         1.3593       2.0248      -1.8552      -0.1425       0.5876

    > summary(M1)

    Call:
    lm(formula = GSI ~ fSex + fLocation, data = Squid)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.4137 -1.3195 -0.1593  1.2039 11.2159

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  1.35926    0.07068  19.230   <2e-16 ***
    fSexF        2.02481    0.09427  21.479   <2e-16 ***
    fLocation2  -1.85525    0.20027  -9.264   <2e-16 ***
    fLocation3  -0.14248    0.12657  -1.126   0.2604
    fLocation4   0.58756    0.34934   1.682   0.0927 .
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.415 on 2639 degrees of freedom
    Multiple R-squared: 0.1759, Adjusted R-squared: 0.1746
    F-statistic: 140.8 on 4 and 2639 DF, p-value: < 2.2e-16
    > M2 <- lm(GSI ~ factor(Sex) + factor(Location), data = Squid)
    > summary(M2)

    Call:
    lm(formula = GSI ~ factor(Sex) + factor(Location), data = Squid)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.4137 -1.3195 -0.1593  1.2039 11.2159

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)        1.35926    0.07068  19.230   <2e-16 ***
    factor(Sex)2       2.02481    0.09427  21.479   <2e-16 ***
    factor(Location)2 -1.85525    0.20027  -9.264   <2e-16 ***
    factor(Location)3 -0.14248    0.12657  -1.126   0.2604
    factor(Location)4  0.58756    0.34934   1.682   0.0927 .
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.415 on 2639 degrees of freedom
    Multiple R-squared: 0.1759, Adjusted R-squared: 0.1746
    F-statistic: 140.8 on 4 and 2639 DF, p-value: < 2.2e-16
The estimated parameters are identical, but the second form takes up more screen space, which is said to become a serious nuisance with two-way and three-way interactions.
    > Squid$fLocation
     [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1
    ........
    [2626] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    Levels: 1 2 3 4
The order of the levels can be changed:
    > Squid$fLocation <- factor(Squid$Location, levels = c(2,3,1,4))
    > Squid$fLocation
     [1] 1 3 1 1 1 1 1 3 3 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    [36] 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 1 3 1 1 1 1 1 1 1 1
    [71] 1 1 1 1 1 3 1 1 3 1 1 3 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 3 3 3 1 3 1
    ...
    [...] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    Levels: 2 3 1 4
    > boxplot(GSI ~ fLocation, data = Squid)
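What changing the level order does, and does not do, can be sketched on a short vector. The vector loc below is hypothetical; the observation that the first level serves as the baseline in lm and that boxplot draws groups in level order is general R behavior rather than something stated in the original notes.

```r
# Hypothetical location codes
loc <- c(1, 3, 1, 2, 4)

f_sorted  <- factor(loc)                           # levels 1 2 3 4
f_ordered <- factor(loc, levels = c(2, 3, 1, 4))   # levels 2 3 1 4

levels(f_ordered)
as.character(f_ordered)   # the label of each observation is unchanged
as.integer(f_ordered)     # the internal codes follow the new level order
```

Only the bookkeeping changes: each observation keeps its label, while the position of that label among the levels determines plotting order and the reference category.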
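Note:
Once fSex has been defined as a factor with the default labels 1 and 2, selecting rows with Squid$Sex == 1 and with Squid$fSex == "1" has the same effect.
In the second form the 1 should be written in double quotes, because the levels of a factor are character labels.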
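The comparison in this note can be sketched as follows, assuming fSex is created with a plain factor() call so that its labels are "1" and "2". The data frame toy is hypothetical, and the final lines show how the condition has to change once the labels are M and F.

```r
# Hypothetical data
toy <- data.frame(Sex = c(2, 1, 2, 1))
toy$fSex <- factor(toy$Sex)                  # default labels "1" and "2"

by_number <- toy[toy$Sex == 1, ]             # compare the numeric column
by_factor <- toy[toy$fSex == "1", ]          # compare the factor with a quoted label
identical(by_number, by_factor)

# After relabeling, the condition must use the new label instead
toy$fSex <- factor(toy$Sex, levels = c(1, 2), labels = c("M", "F"))
by_label <- toy[toy$fSex == "M", ]
```

Writing the label as a string makes it explicit that the comparison is against a factor level, not against a number.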
After new variables have been defined, they can also be inspected with str:
    > Squid$fSex <- factor(Squid$Sex, labels = c("M","F"))
    > Squid$fLocation <- factor(Squid$Location)
    > str(Squid)
    'data.frame': 2644 obs. of 8 variables:
     $ Sample   : int 1 2 3 4 5 6 7 8 9 10 ...
     $ Year     : int 1 1 1 1 1 1 1 1 1 1 ...
     $ Month    : int 1 1 1 1 1 1 1 1 1 2 ...
     $ Location : int 1 3 1 1 1 1 1 3 3 1 ...
     $ Sex      : int 2 2 2 2 2 2 2 2 2 2 ...
     $ GSI      : num 10.44 9.83 9.74 9.31 8.99 ...
     $ fLocation: Factor w/ 4 levels "1","2","3","4": 1 3 1 1 1 1 1 3 3 1 ...
     $ fSex     : Factor w/ 2 levels "M","F": 2 2 2 2 2 2 2 2 2 2 ...
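Chapter 3 summary:
    Function      Purpose                                      Example
    write.table   Writes a variable to an ASCII file           write.table(Squid, file = "test.txt")
    order         Determines the sort order of the data        order(x)
    merge         Merges two data frames                       merge(a, b, by = "ID")
    str           Shows the internal structure of an object    str(Squid)
    factor        Defines a variable as a factor               factor(x)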