We can check whether a variable is a data frame using the class() function.
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> typeof(x)    # data frame is a special case of list
[1] "list"
> class(x)
[1] "data.frame"
In this example, x can be considered a list of 3 components, each of which is a two-element vector. Some useful functions for learning more about a data frame are given below.
> names(x)
[1] "SN"   "Age"  "Name"
> ncol(x)
[1] 3
> nrow(x)
[1] 2
> length(x)    # returns length of the list, same as ncol()
[1] 3
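The same checks can be written in script form. The sketch below is only an illustration: it rebuilds the example data frame with data.frame() (covered next) so that it runs on its own, and it uses the base R functions is.data.frame() and inherits(), which the text above does not mention.

```r
# Rebuild the example data frame so the snippet is self-contained
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

is_df     <- is.data.frame(x)           # TRUE for any data frame
is_df2    <- inherits(x, "data.frame")  # class-based check, same answer here
dims      <- dim(x)                     # number of rows, then number of columns
col_names <- names(x)                   # column names

print(is_df)
print(dims)
print(col_names)
```

dim() gives the row and column counts in one call, which can be handier than calling nrow() and ncol() separately.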
We can create a data frame using the data.frame() function.
For example, the data frame shown above can be created as follows.
> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))
> str(x)    # structure of x
'data.frame':	2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: Factor w/ 2 levels "Dora","John": 2 1
Notice above that the third column, Name, is of type factor instead of a character vector.
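In versions of R before 4.0.0, the data.frame() function converts character vectors into factors by default. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the column stays a character vector.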
To suppress this behavior, we can pass the argument stringsAsFactors = FALSE.
> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)
> str(x)    # now the third column is a character vector
'data.frame':	2 obs. of  3 variables:
 $ SN  : int  1 2
 $ Age : num  21 15
 $ Name: chr  "John" "Dora"
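A minimal sketch of the two settings side by side follows. Passing stringsAsFactors explicitly keeps the result the same whatever the installed R version uses as its default. The names x_chr and x_fct are made up for this illustration.

```r
# State the choice explicitly instead of relying on the default
x_chr <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                    stringsAsFactors = FALSE)
x_fct <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                    stringsAsFactors = TRUE)

class(x_chr$Name)    # character vector
class(x_fct$Name)    # factor
levels(x_fct$Name)   # factor levels are sorted, so "Dora" comes before "John"

# A single column can also be converted afterwards, in either direction
x_chr$Name <- factor(x_chr$Name)
x_fct$Name <- as.character(x_fct$Name)
```

Converting one column afterwards is useful when only some of the text columns are really categorical.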
Many of R's data input functions, such as read.table(), read.csv(), read.delim() and read.fwf(), also read data into a data frame.
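As a small illustration, read.csv() can parse inline text through its text argument, which avoids needing a file on disk. The CSV content and the names csv_text and df below are hypothetical and only mirror the earlier example.

```r
# Inline CSV text stands in for a file on disk (hypothetical data)
csv_text <- "SN,Age,Name
1,21,John
2,15,Dora"

# stringsAsFactors is given explicitly so Name is read as character
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(df)     # Age is read as integer here because the values have no decimal part
class(df)   # the result is a data frame
```

With a real file, the first argument would be the file path instead of text.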
Components of a data frame can be accessed like a list or like a matrix.
We can use the [, [[ or $ operator to access the columns of a data frame.
> x["Name"]
  Name
1 John
2 Dora
> x$Name
[1] "John" "Dora"
> x[["Name"]]
[1] "John" "Dora"
> x[[3]]
[1] "John" "Dora"
Accessing with [[ or $ is similar: both reduce the result to a vector. Indexing with [ differs in that it returns a data frame.
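The difference can be checked directly by looking at the class of each result. This is a sketch using the same example values; the variable names by_single, by_double and by_dollar are made up for the illustration.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

by_single <- x["Name"]     # one-column data frame
by_double <- x[["Name"]]   # plain character vector
by_dollar <- x$Name        # plain character vector

class(by_single)
class(by_double)
identical(by_double, by_dollar)

# `[` can select several columns at once and still returns a data frame
two_cols <- x[c("SN", "Name")]
```

Because [ keeps the data frame structure, it is the operator to reach for when more than one column is needed.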
Data frames can be accessed like a matrix by providing row and column indices.
To illustrate this, we use datasets already available in R. The available datasets can be listed with the command library(help = "datasets").
We will use the trees dataset, which contains Girth, Height and Volume for black cherry trees.
A data frame can be examined using functions like str() and head().
> str(trees)
'data.frame':	31 obs. of  3 variables:
 $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
> head(trees,n=3)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
We can see that trees is a data frame with 31 rows and 3 columns. We also display the first 3 rows of the data frame.
Now we proceed to access the data frame like a matrix.
> trees[2:3,]    # select 2nd and 3rd row
  Girth Height Volume
2   8.6     65   10.3
3   8.8     63   10.2
> trees[trees$Height > 82,]    # selects rows with Height greater than 82
   Girth Height Volume
6   10.8     83   19.7
17  12.9     85   33.8
18  13.3     86   27.4
31  20.6     87   77.0
> trees[10:12,2]
[1] 75 79 76
We can see in the last case that the result is a vector, since we extracted data from a single column.
This behavior can be avoided by passing the argument drop = FALSE as follows.
> trees[10:12,2, drop = FALSE]
   Height
10     75
11     79
12     76
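Row and column selection can also be combined in a single call, and columns can be picked by name rather than by position. The following is a sketch on the same trees dataset; the variable names tall, one_col_vec and one_col_df are made up for the illustration.

```r
# `trees` is available by default through the datasets package
tall <- trees[trees$Height > 82, c("Girth", "Volume")]  # rows by condition, columns by name

one_col_vec <- trees[10:12, "Height"]                   # single column drops to a vector
one_col_df  <- trees[10:12, "Height", drop = FALSE]     # drop = FALSE keeps a data frame

nrow(tall)
is.data.frame(one_col_vec)
is.data.frame(one_col_df)
```

Selecting columns by name tends to be more robust than using positions, since it keeps working if the column order changes.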
Data frames can be modified like we modified matrices through reassignment.
> x
  SN Age Name
1  1  21 John
2  2  15 Dora
> x[1,"Age"] <- 20; x
  SN Age Name
1  1  20 John
2  2  15 Dora
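Reassignment is not limited to a single cell addressed by position. As a sketch, the snippet below also changes a value chosen by a condition and then updates a whole column at once. The replacement value 16 and the increment of 1 are arbitrary choices for the illustration.

```r
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

x[1, "Age"] <- 20                # single cell by row and column, as in the transcript above
x$Age[x$Name == "Dora"] <- 16    # cell chosen by a condition on another column
x$Age <- x$Age + 1               # whole column updated in one step

print(x)
```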
Rows can be added to a data frame using the rbind() function.
> rbind(x,list(1,16,"Paul"))
  SN Age Name
1  1  20 John
2  2  15 Dora
3  1  16 Paul
Similarly, we can add columns using cbind().
> cbind(x,State=c("NY","FL"))
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
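Note that rbind() and cbind() return a new data frame and leave x unchanged, so the result has to be assigned to keep it. A minimal sketch follows, assuming a new row given as a one-row data frame with matching column names. The values SN = 3 and "TX", and the names new_row, y and z, are hypothetical.

```r
x <- data.frame(SN = 1:2, Age = c(20, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)

# A one-row data frame with the same column names as x
new_row <- data.frame(SN = 3L, Age = 16, Name = "Paul", stringsAsFactors = FALSE)

y <- rbind(x, new_row)                       # x plus one row
z <- cbind(y, State = c("NY", "FL", "TX"))   # y plus one column

dim(x)   # x itself is untouched
dim(z)
```

Supplying the new row as a data frame rather than a bare list makes the column matching explicit, since rbind() then matches columns by name.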
Since data frames are implemented as lists, we can also add new columns through simple list-like assignment.
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
> x$State <- c("NY","FL"); x
  SN Age Name State
1  1  20 John    NY
2  2  15 Dora    FL
Data frame columns can be deleted by assigning NULL to them.
> x$State <- NULL
> x
  SN Age Name
1  1  20 John
2  2  15 Dora
Similarly, rows can be deleted through reassignments.
> x <- x[-1,]
> x
  SN Age Name
2  2  15 Dora
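As the output above shows, the remaining row keeps its original label 2. The sketch below, using a hypothetical three-row version of the example, removes rows by condition, renumbers the row labels, and drops several columns by name with setdiff(). This is one possible approach among several.

```r
x <- data.frame(SN = 1:3, Age = c(20, 15, 16), Name = c("John", "Dora", "Paul"),
                stringsAsFactors = FALSE)

x <- x[x$Age >= 16, ]         # keep only rows where Age is at least 16
kept_labels <- rownames(x)    # the original row labels survive the subsetting

rownames(x) <- NULL           # renumber the rows starting from 1
new_labels <- rownames(x)

# Keep every column except the ones listed
x <- x[setdiff(names(x), c("SN", "Age"))]

print(kept_labels)
print(new_labels)
print(names(x))
```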