Two reshaping functions: melt and dcast

reshape

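Reshaping functions are mainly used to change the shape of a data set. The two main ones are melt and dcast: melt converts wide data to long format, and dcast converts long data to wide format.
The melt and dcast functions for data.tables are extensions of the corresponding functions in the reshape2 package.
As of data.table v1.9.6, you no longer need to load reshape2 to use these functions; loading data.table is enough. If you must load reshape2, make sure to load it before loading data.table.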

The melt function

Suppose we have the following data:

library(data.table)
DT=fread("melt_default.csv")
head(DT)
##    family_id age_mother dob_child1 dob_child2 dob_child3
## 1:         1         30 1998-11-26 2000-01-29         NA
## 2:         2         27 1996-06-22         NA         NA
## 3:         3         26 2002-07-11 2004-04-05 2007-09-02
## 4:         4         32 2004-10-10 2009-08-27 2012-07-21
## 5:         5         29 2000-12-05 2005-02-28         NA
str(DT)
## Classes 'data.table' and 'data.frame':   5 obs. of  5 variables:
##  $ family_id : int  1 2 3 4 5
##  $ age_mother: int  30 27 26 32 29
##  $ dob_child1: chr  "1998-11-26" "1996-06-22" "2002-07-11" "2004-10-10" ...
##  $ dob_child2: chr  "2000-01-29" NA "2004-04-05" "2009-08-27" ...
##  $ dob_child3: chr  NA NA "2007-09-02" "2012-07-21" ...
##  - attr(*, ".internal.selfref")=<externalptr>
DT.m1=melt(DT,id.vars = c("family_id","age_mother"),measure.vars = c("dob_child1","dob_child2","dob_child3"))

DT.m1
##     family_id age_mother   variable      value
##  1:         1         30 dob_child1 1998-11-26
##  2:         2         27 dob_child1 1996-06-22
##  3:         3         26 dob_child1 2002-07-11
##  4:         4         32 dob_child1 2004-10-10
##  5:         5         29 dob_child1 2000-12-05
##  6:         1         30 dob_child2 2000-01-29
##  7:         2         27 dob_child2         NA
##  8:         3         26 dob_child2 2004-04-05
##  9:         4         32 dob_child2 2009-08-27
## 10:         5         29 dob_child2 2005-02-28
## 11:         1         30 dob_child3         NA
## 12:         2         27 dob_child3         NA
## 13:         3         26 dob_child3 2007-09-02
## 14:         4         32 dob_child3 2012-07-21
## 15:         5         29 dob_child3         NA
str(DT.m1)
## Classes 'data.table' and 'data.frame':   15 obs. of  4 variables:
##  $ family_id : int  1 2 3 4 5 1 2 3 4 5 ...
##  $ age_mother: int  30 27 26 32 29 30 27 26 32 29 ...
##  $ variable  : Factor w/ 3 levels "dob_child1","dob_child2",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ value     : chr  "1998-11-26" "1996-06-22" "2002-07-11" "2004-10-10" ...
##  - attr(*, ".internal.selfref")=<externalptr>
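  • measure.vars specifies the set of columns to collapse (melt) together.
  • By default, the melted columns are named variable and value, and variable is a factor.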
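
Because melt_default.csv is not included here, the following is a minimal sketch that rebuilds the same five rows inline and melts them. The inline table and the names measure_cols, DT.long and DT.long.narm are assumptions made for illustration. It also shows the na.rm argument of melt, which drops rows whose value is NA.

```r
library(data.table)

# Hypothetical inline copy of the data printed above (replaces the CSV file)
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Pick the columns to collapse by name pattern instead of typing them out
measure_cols <- grep("^dob_child", names(DT), value = TRUE)

# Wide to long: one row per (family, child) pair
DT.long <- melt(DT, id.vars = c("family_id", "age_mother"),
                measure.vars = measure_cols)
dim(DT.long)

# Same melt, but rows with a missing date of birth are dropped
DT.long.narm <- melt(DT, id.vars = c("family_id", "age_mother"),
                     measure.vars = measure_cols, na.rm = TRUE)
nrow(DT.long.narm)
```

With 5 families and 3 measure columns, the long table has 15 rows; removing the 4 missing dates leaves 11.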

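The names of the variable and value columns can be changed in the call with variable.name and value.name. If only one of id.vars and measure.vars is given, the remaining columns are assigned to the missing argument. If neither id.vars nor measure.vars is specified, melt treats every column that is not numeric, integer, or logical as id.vars and issues a warning.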

DT.m1=melt(DT,measure.vars = c("dob_child1","dob_child2","dob_child3"),variable.name = "child",value.name = "dob")
DT.m1
##     family_id age_mother      child        dob
##  1:         1         30 dob_child1 1998-11-26
##  2:         2         27 dob_child1 1996-06-22
##  3:         3         26 dob_child1 2002-07-11
##  4:         4         32 dob_child1 2004-10-10
##  5:         5         29 dob_child1 2000-12-05
##  6:         1         30 dob_child2 2000-01-29
##  7:         2         27 dob_child2         NA
##  8:         3         26 dob_child2 2004-04-05
##  9:         4         32 dob_child2 2009-08-27
## 10:         5         29 dob_child2 2005-02-28
## 11:         1         30 dob_child3         NA
## 12:         2         27 dob_child3         NA
## 13:         3         26 dob_child3 2007-09-02
## 14:         4         32 dob_child3 2012-07-21
## 15:         5         29 dob_child3         NA
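
The defaulting rules for id.vars and measure.vars can be checked with a small sketch. It assumes the same five rows rebuilt inline (the CSV file is not available here); the names DT.both, DT.id_only and DT.guess are hypothetical. The last call omits both arguments, so the character dob_child columns are guessed as id.vars, which is rarely what you want for this data; the expected warning is suppressed only to keep the sketch quiet.

```r
library(data.table)

# Hypothetical inline copy of the example data
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Both arguments given explicitly
DT.both <- melt(DT, id.vars = c("family_id", "age_mother"),
                measure.vars = c("dob_child1", "dob_child2", "dob_child3"))

# Only id.vars given: all other columns become measure.vars
DT.id_only <- melt(DT, id.vars = c("family_id", "age_mother"))
all.equal(DT.both, DT.id_only)

# Neither given: non-numeric columns (the dates, stored as character)
# are used as id.vars, and the two integer columns are melted instead
DT.guess <- suppressWarnings(melt(DT))
names(DT.guess)
```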

The dcast function

dcast reshapes data from long to wide format.

dcast(DT.m1,family_id+age_mother~ child,value.var = "dob")
##    family_id age_mother dob_child1 dob_child2 dob_child3
## 1:         1         30 1998-11-26 2000-01-29         NA
## 2:         2         27 1996-06-22         NA         NA
## 3:         3         26 2002-07-11 2004-04-05 2007-09-02
## 4:         4         32 2004-10-10 2009-08-27 2012-07-21
## 5:         5         29 2000-12-05 2005-02-28         NA
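  • dcast uses a formula interface: the variables on the left-hand side of the formula are the id variables, and those on the right-hand side are the measure variables.
  • value.var names the column whose values fill the cells of the wide format.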
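
As a sanity check on the formula interface, the sketch below melts the data and casts it back, then compares the result with the original table. It assumes the same five rows rebuilt inline; DT.long and DT.wide are hypothetical names. The comparison goes through as.data.frame because dcast sets a key on its result, which the original table does not have.

```r
library(data.table)

# Hypothetical inline copy of the example data
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Wide to long, with custom names for the two new columns
DT.long <- melt(DT, id.vars = c("family_id", "age_mother"),
                variable.name = "child", value.name = "dob")

# Long to wide: id variables on the left of ~, the spread variable on the right
DT.wide <- dcast(DT.long, family_id + age_mother ~ child, value.var = "dob")

# Compare contents only, ignoring the key that dcast adds
all.equal(as.data.frame(DT.wide), as.data.frame(DT))
```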

How can we find the number of children in each family?

dcast(DT.m1,family_id~.,fun.aggregate = function(x)sum(!is.na(x)),value.var = "dob")
##    family_id .
## 1:         1 2
## 2:         2 1
## 3:         3 3
## 4:         4 3
## 5:         5 2
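
The result column above is named ".", which is awkward to work with. One possible follow-up, sketched here under the assumption that the same five rows are rebuilt inline, is to rename it with setnames, or to compute the same counts with a grouped expression instead of dcast. The names counts.cast, counts.by and n_children are hypothetical.

```r
library(data.table)

# Hypothetical inline copy of the example data
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)
DT.m1 <- melt(DT, id.vars = c("family_id", "age_mother"),
              variable.name = "child", value.name = "dob")

# dcast with an aggregation function, then give the "." column a real name
counts.cast <- dcast(DT.m1, family_id ~ .,
                     fun.aggregate = function(x) sum(!is.na(x)),
                     value.var = "dob")
setnames(counts.cast, ".", "n_children")

# Equivalent grouped aggregation without dcast
counts.by <- DT.m1[, .(n_children = sum(!is.na(dob))), by = family_id]

counts.cast$n_children
counts.by$n_children
```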

Reference: Efficient reshaping using data.tables