R Language: Variables and Data Types

Date: 2019-11-19

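Data Types in R

R has many data types, but typing is dynamic: a variable is never declared with a type. Instead, a variable is assigned an R object, and the type of that object becomes the type of the variable. The most commonly used R objects are:

Vectors
Lists
Matrices
Arrays
Data frames
Factors
The simplest of these objects are the atomic vectors, which come in six types: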

# Atomic vector of type character.
print("abc")  # character

# Atomic vector of type double.
print(12.5)  # numeric

# Atomic vector of type integer.
print(63L)  # integer

# Atomic vector of type logical.
print(TRUE)  # logical

# Atomic vector of type complex.
print(2+3i)  # complex

# Atomic vector of type raw.
print(charToRaw("hello"))  # raw
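
The labels in the comments above can be checked directly. The following is a minimal sketch using only base R; the variable names are illustrative. It shows that class() reports "numeric" for a double while typeof() reports the storage type "double".

```r
# Collect one value of each atomic type in a list so each keeps its own type.
values <- list("abc", 12.5, 63L, TRUE, 2+3i, charToRaw("hello"))

# class() gives the high-level class, typeof() the internal storage type.
classes <- sapply(values, class)
types <- sapply(values, typeof)

print(classes)
print(types)
```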

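Vectors (vector)

The vector is the simplest type and is created with c().
As the example below shows, if any element is a character string, the non-character values are coerced to character.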

# The logical and numeric values are converted to characters.
s <- c('apple','red',5,TRUE)
print(s)

In fact, a run of consecutive elements can be written with the colon operator, for example:

v <- 6.6:12.6
print(v)
w <- 3.8:11.4

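Here v is the vector that runs from 6.6 to 12.6 in steps of one, and w runs from 3.8 in steps of one up to 10.8, the last value that does not exceed 11.4. Vectors can also be created with the seq() function: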

# Create vector with elements from 5 to 9 incrementing by 0.4.
print(seq(5, 9, by = 0.4))
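
To make the behaviour of the colon operator and seq() concrete, here is a small sketch. The names s1 and s2 are illustrative, and the length.out argument is shown as an alternative to by that the original does not mention.

```r
# The colon operator steps by 1 from the start and never passes the end value.
v <- 6.6:12.6
w <- 3.8:11.4
print(length(v))  # number of elements from 6.6 up to 12.6
print(w)          # the last element is 10.8, not 11.4

# seq() takes either an explicit step (by) or a target length (length.out).
s1 <- seq(5, 9, by = 0.4)
s2 <- seq(0, 1, length.out = 5)
print(s1)
print(s2)
```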

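Vector elements are accessed with [ ], using positions, names, or a logical condition as the index. Vectors also support element-wise arithmetic. The operands should normally have the same length; if they do not, R recycles the shorter vector and warns when the longer length is not a multiple of the shorter one.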
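
The indexing and arithmetic rules can be sketched as follows. The vectors days, x, and y are made-up examples, not data from the original article.

```r
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
x <- c(3, 8, 4, 5, 0, 11)
y <- c(4, 11, 0, 8, 1, 2)

by_position <- days[c(2, 3, 6)]   # positions; R indexes from 1
by_exclusion <- days[-c(1, 7)]    # negative positions drop elements
by_condition <- x[x > 4]          # a logical condition as the index

# Element-wise arithmetic on two vectors of equal length.
total <- x + y

# With unequal lengths the shorter vector is recycled: (10, 20, 10, 20, 10, 20).
recycled <- x + c(10, 20)

print(by_position)
print(by_exclusion)
print(by_condition)
print(total)
print(recycled)
```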

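Lists (list)

A list is created with the list() function and can hold almost any data type, including vectors, matrices, and other lists. Each element of a list can be given a name.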

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Show the list.
print(list_data)

Accessing list elements

List elements can be accessed either by position or by name.

# Access the first element of the list.
print(list_data[1])

# Access the list element using the name of the element.
print(list_data$A_Matrix)
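
One detail worth seeing in code is the difference between single and double brackets. The sketch below rebuilds the same list so that it runs on its own; the result names (sub_list, months, and so on) are illustrative.

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

sub_list <- list_data[1]                    # single brackets: a list of length 1
months <- list_data[[1]]                    # double brackets: the element itself
same_months <- list_data[["1st Quarter"]]   # names with spaces need [[ ]] and quotes
m <- list_data$A_Matrix                     # $ works for syntactically valid names

print(class(sub_list))
print(class(months))
print(dim(m))
```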

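Manipulating lists

List elements are changed by direct assignment. Lists can be merged with merged.list <- c(list1, list2), and a list can be flattened into a vector with unlist().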

# Example lists, assumed here so that the code runs.
list1 <- list(1:5)
list2 <- list(10:14)
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
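
The assignment and merge operations described above might look like this in practice. This is a sketch with made-up lists (numbers and days); only base R functions are used.

```r
numbers <- list(1, 2, 3)
days <- list("Sun", "Mon", "Tue")

numbers[[4]] <- "new element"   # add an element by assigning to a new position
numbers[[2]] <- NULL            # remove an element by assigning NULL
numbers[[1]] <- 100             # update an element in place

# c() concatenates two lists into one.
merged.list <- c(numbers, days)
print(length(merged.list))

# unlist() flattens a list to an atomic vector, which then supports arithmetic.
v1 <- unlist(list(1:5))
v2 <- unlist(list(10:14))
print(v1 + v2)
```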

Matrices (matrix)

The basic syntax for creating a matrix in R is:

matrix(data, nrow, ncol, byrow, dimnames)

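data is the input vector whose elements become the entries of the matrix.
nrow is the number of rows to create.
ncol is the number of columns to create.
byrow is a logical flag. If TRUE, the elements of the input vector are arranged by row; the default is to fill by column.
dimnames is a list holding the names assigned to the rows and columns.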
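
The effect of byrow and dimnames is easiest to see side by side. The following sketch uses illustrative names (row_labels, col_labels, P) that do not come from the original.

```r
# Same data, filled row by row versus column by column.
M_by_row <- matrix(3:14, nrow = 4, byrow = TRUE)
M_by_col <- matrix(3:14, nrow = 4, byrow = FALSE)
print(M_by_row)
print(M_by_col)

# dimnames takes a list: row names first, then column names.
row_labels <- c("row1", "row2", "row3", "row4")
col_labels <- c("col1", "col2", "col3")
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(row_labels, col_labels))
print(P)
```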

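Accessing matrix elements

Matrix elements are accessed by putting the subscripts in square brackets, i.e. $a_{23}=M[2,3]$. Leaving one subscript empty selects an entire row or column, e.g. the third column: $a_{13},a_{23},\cdots,a_{m3}=M[,3]$.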
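
As a quick illustration of these subscripts, here is a sketch on a made-up 3 x 4 matrix M. The drop = FALSE argument is an addition not discussed in the original; it keeps the result a matrix instead of a plain vector.

```r
# Rows are (1 2 3 4), (5 6 7 8), (9 10 11 12).
M <- matrix(1:12, nrow = 3, byrow = TRUE)

a23 <- M[2, 3]       # single element in row 2, column 3
col3 <- M[, 3]       # whole third column, returned as a vector
row2 <- M[2, ]       # whole second row, returned as a vector
col3_matrix <- M[, 3, drop = FALSE]   # keep the 3 x 1 matrix shape

print(a23)
print(col3)
print(row2)
print(dim(col3_matrix))
```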

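Matrix arithmetic

R's arithmetic operators (+, -, *, /) work element by element on matrices, and the result is again a matrix. The matrices involved must have the same dimensions (number of rows and columns). Note that * is element-wise multiplication; the matrix product is written %*%.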
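
A short sketch of these operations follows, using two made-up 2 x 3 matrices A and B. The matrix product with %*% and t() is included for contrast with the element-wise operators.

```r
A <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
B <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

sum_AB <- A + B      # element-wise sum, still 2 x 3
prod_AB <- A * B     # element-wise product, not the matrix product

# The matrix product needs conformable shapes: (2 x 3) %*% (3 x 2) gives 2 x 2.
mat_prod <- A %*% t(B)

print(sum_AB)
print(prod_AB)
print(mat_prod)
```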

Arrays
Arrays are R data objects that can store data in more than two dimensions. The example below creates an array and names its dimensions:

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,column.names, matrix.names))
print(result)

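Array access works like matrix access. The array above has three dimensions, so the square brackets take three subscripts separated by two commas, which can extract one or more elements: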

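# Element in row 1, column 3 of the second matrix.
print(result[1, 3, 2])
# Third row of the second matrix.
print(result[3, , 2])
# Second row of every matrix.
print(result[2, , ])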
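
For a version that runs on its own, the sketch below rebuilds the same values under the illustrative name arr, without dimension names. Note that array() recycles the nine input values to fill all 18 cells, so both 3 x 3 layers are identical.

```r
# Nine values recycled into a 3 x 3 x 2 array; each layer has columns
# (5, 9, 3), (10, 11, 12), (13, 14, 15).
arr <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

one_element <- arr[1, 3, 2]   # row 1, column 3, layer 2
one_row <- arr[3, , 2]        # row 3 of layer 2, a vector of length 3
across_layers <- arr[2, , ]   # row 2 of every layer, a 3 x 2 matrix
first_layer <- arr[, , 1]     # the whole first layer, a 3 x 3 matrix

print(one_element)
print(one_row)
print(across_layers)
print(first_layer)
```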

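Array elements are manipulated by working on parts of the array. For example, fixing the index of one dimension and leaving the other two empty (two commas) extracts a matrix.
The apply() function performs calculations across the elements of an array.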

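apply(X, MARGIN, FUN)

X is an array.
MARGIN gives the dimension or dimensions over which the function is applied (for example 1 for rows, 2 for columns, c(1, 2) for both).
FUN is the function to apply to the selected elements.
This is how calculations inside an array are carried out.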
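
The role of MARGIN can be sketched on a small array. The array arr below is a made-up example with two identical 3 x 3 layers; the result names are illustrative.

```r
arr <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

# MARGIN = 1: one sum per row index, taken over all columns and layers.
row_sums <- apply(arr, 1, sum)

# MARGIN = 3: one sum per layer.
layer_sums <- apply(arr, 3, sum)

# MARGIN = c(1, 2): one sum per (row, column) cell, taken across layers.
cell_sums <- apply(arr, c(1, 2), sum)

print(row_sums)
print(layer_sums)
print(cell_sums)
```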

Factors
In R, nominal and ordinal variables can be represented with factors. The syntax is:

f <- factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)

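levels: the levels of the factor; by default the distinct values of x, sorted;
labels: names for the levels, matched one-to-one with them to make them easier to read; defaults to the levels;
exclude: values to leave out of x when forming the levels; defaults to NA;
ordered: logical; TRUE if the levels have an order, FALSE otherwise;
nmax: an upper bound on the number of levels.
The gl() function generates factor vectors with a regular pattern. Its syntax is: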

gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)

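n: a positive integer, the number of levels;
k: a positive integer, the number of times each level is repeated;
length: a positive integer, the length of the resulting factor; defaults to n*k;
labels: the names of the levels; defaults to 1:n;
ordered: logical, whether the levels are ordered; defaults to FALSE.
The factor() function converts a vector into a factor. Put simply, a factor is a vector whose values are drawn from a finite set of levels and stored internally as integer codes; print shows the values followed by the levels. Character columns of a data frame (data.frame) are also stored as factors when stringsAsFactors = TRUE, which was the default before R 4.0.0.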

v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)
# Result:
 [1] Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston 
[10] Boston  Boston  Boston 
Levels: Tampa Seattle Boston
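
The example above covers gl(); for factor() itself, a minimal sketch might look like the following. The sizes vector and the result names are made up for illustration.

```r
sizes <- c("small", "large", "medium", "small", "large")

# Without explicit levels, the distinct values are sorted alphabetically.
f_nominal <- factor(sizes)
print(levels(f_nominal))

# Explicit levels plus ordered = TRUE give an ordinal factor.
f_ordered <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(f_ordered)

# Internally each value is an integer code pointing into the levels.
codes <- as.integer(f_ordered)
print(codes)

# Ordered factors support comparisons between levels.
is_smaller <- f_ordered[1] < f_ordered[2]
print(is_smaller)
```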

Data Frames

Creating a data frame

# Create data frame
new.address <- data.frame(
   city = c("Lowry", "Charlotte"),
   state = c("CO", "FL"),
   zipcode = c("80230", "33949"),
   stringsAsFactors = FALSE
)

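The str() function shows the structure of a data frame, and summary() returns a statistical summary of its columns. Specific columns and rows can also be extracted: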

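# emp.data is assumed to be an existing data frame with at least five rows
# and four columns, including emp_name and salary.
# Extract specific columns.
result <- data.frame(emp.data$emp_name, emp.data$salary)
print(result)

# Extract the first two rows, keeping all columns.
result <- emp.data[1:2, ]

# Rows and columns can also be selected together.
result <- emp.data[c(3, 5), c(2, 4)]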
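
Since emp.data is never defined in the article, the sketch below builds a hypothetical stand-in with invented names, salaries, and departments, purely so the extraction calls can be run end to end. The column layout (emp_id, emp_name, salary, dept) is an assumption.

```r
# Hypothetical data; every value here is invented for illustration.
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Ann", "Bob", "Cara", "Dev", "Eli"),
  salary = c(620.5, 515.2, 611.0, 729.0, 843.25),
  dept = c("IT", "HR", "IT", "Ops", "HR"),
  stringsAsFactors = FALSE
)

str(emp.data)                    # structure: column names, types, first values
print(summary(emp.data$salary))  # min, quartiles, mean, max of one column

name_salary <- data.frame(emp.data$emp_name, emp.data$salary)
first_two <- emp.data[1:2, ]            # rows 1 and 2, all columns
rows_cols <- emp.data[c(3, 5), c(2, 4)] # rows 3 and 5, columns 2 and 4
print(rows_cols)
```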

To extend a data frame, assign a column vector to a new column name, using $ on the data frame name. Alternatively, rbind() adds rows and cbind() adds columns.
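
These three ways of extending a data frame can be sketched as follows. The data frame df reuses the cities from the earlier example; the extra row and the visited column are hypothetical additions.

```r
df <- data.frame(
  city = c("Lowry", "Charlotte"),
  state = c("CO", "FL"),
  stringsAsFactors = FALSE
)

# Add a column by assigning to a new name with $.
df$zipcode <- c("80230", "33949")

# Add a row with rbind(); the new row must have the same column names.
# This row is a made-up example.
extra <- data.frame(city = "Tampa", state = "FL", zipcode = "33602",
                    stringsAsFactors = FALSE)
taller <- rbind(df, extra)

# Add a column with cbind(); the new column must match the number of rows.
wider <- cbind(df, visited = c(TRUE, FALSE))

print(taller)
print(wider)
```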