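Reshaping tools convert data between wide and long layouts. In data.table the two main functions are melt and dcast: melt turns wide data into long data, and dcast turns long data into wide data. Both are extensions of the corresponding functions in the reshape2 package.
From v1.9.6 onward, these functions can be used without loading reshape2; loading data.table is enough. If you do have to load reshape2, make sure it is loaded before data.table.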
Suppose we have the following data:
library(data.table)
# Read the example file and preview the first rows
DT = fread("melt_default.csv")
head(DT)
##    family_id age_mother dob_child1 dob_child2 dob_child3
## 1:         1         30 1998-11-26 2000-01-29         NA
## 2:         2         27 1996-06-22         NA         NA
## 3:         3         26 2002-07-11 2004-04-05 2007-09-02
## 4:         4         32 2004-10-10 2009-08-27 2012-07-21
## 5:         5         29 2000-12-05 2005-02-28         NA
str(DT)
## Classes 'data.table' and 'data.frame': 5 obs. of 5 variables:
##  $ family_id : int 1 2 3 4 5
##  $ age_mother: int 30 27 26 32 29
##  $ dob_child1: chr "1998-11-26" "1996-06-22" "2002-07-11" "2004-10-10" ...
##  $ dob_child2: chr "2000-01-29" NA "2004-04-05" "2009-08-27" ...
##  $ dob_child3: chr NA NA "2007-09-02" "2012-07-21" ...
##  - attr(*, ".internal.selfref")=<externalptr>
# Wide to long: keep the family columns as identifiers, stack the dob_child columns
DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"),
             measure.vars = c("dob_child1", "dob_child2", "dob_child3"))
DT.m1
##     family_id age_mother   variable      value
##  1:         1         30 dob_child1 1998-11-26
##  2:         2         27 dob_child1 1996-06-22
##  3:         3         26 dob_child1 2002-07-11
##  4:         4         32 dob_child1 2004-10-10
##  5:         5         29 dob_child1 2000-12-05
##  6:         1         30 dob_child2 2000-01-29
##  7:         2         27 dob_child2         NA
##  8:         3         26 dob_child2 2004-04-05
##  9:         4         32 dob_child2 2009-08-27
## 10:         5         29 dob_child2 2005-02-28
## 11:         1         30 dob_child3         NA
## 12:         2         27 dob_child3         NA
## 13:         3         26 dob_child3 2007-09-02
## 14:         4         32 dob_child3 2012-07-21
## 15:         5         29 dob_child3         NA
str(DT.m1)
## Classes 'data.table' and 'data.frame': 15 obs. of 4 variables:
##  $ family_id : int 1 2 3 4 5 1 2 3 4 5 ...
##  $ age_mother: int 30 27 26 32 29 30 27 26 32 29 ...
##  $ variable  : Factor w/ 3 levels "dob_child1","dob_child2",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ value     : chr "1998-11-26" "1996-06-22" "2002-07-11" "2004-10-10" ...
##  - attr(*, ".internal.selfref")=<externalptr>
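measure.vars specifies the set of columns to collapse into long format. By default the collapsed column names go into a column called "variable" and their values into a column called "value"; both names can be changed in the call through the variable.name and value.name arguments. If neither id.vars nor measure.vars is specified, melt treats every column that is not numeric, integer, or logical as an id.vars column and issues a warning.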
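To make the guessing rule concrete, here is a minimal sketch that calls melt with neither argument. The table is rebuilt inline from the values printed above (an assumption made so the snippet runs without melt_default.csv), and the name guessed is only illustrative.

```r
library(data.table)

# Rebuild the example table inline (values copied from the printed output;
# the original post reads them from melt_default.csv instead)
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Neither id.vars nor measure.vars is given, so melt() has to guess them.
# The warning is caught and printed as a message so the script keeps running.
guessed <- withCallingHandlers(
  melt(DT),
  warning = function(w) {
    message("melt() warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

names(guessed)  # which columns were kept as identifiers
nrow(guessed)   # how many rows the stacked result has
```

With this table the character dob_child columns should end up as identifiers while family_id and age_mother are stacked, which is rarely what one wants. Passing at least one of id.vars or measure.vars explicitly avoids both the surprise and the warning.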
# Same melt, but with custom names for the "variable" and "value" columns
DT.m1 = melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
             variable.name = "child", value.name = "dob")
DT.m1
##     family_id age_mother      child        dob
##  1:         1         30 dob_child1 1998-11-26
##  2:         2         27 dob_child1 1996-06-22
##  3:         3         26 dob_child1 2002-07-11
##  4:         4         32 dob_child1 2004-10-10
##  5:         5         29 dob_child1 2000-12-05
##  6:         1         30 dob_child2 2000-01-29
##  7:         2         27 dob_child2         NA
##  8:         3         26 dob_child2 2004-04-05
##  9:         4         32 dob_child2 2009-08-27
## 10:         5         29 dob_child2 2005-02-28
## 11:         1         30 dob_child3         NA
## 12:         2         27 dob_child3         NA
## 13:         3         26 dob_child3 2007-09-02
## 14:         4         32 dob_child3 2012-07-21
## 15:         5         29 dob_child3         NA
dcast converts the data from long format back to wide format.
dcast(DT.m1, family_id + age_mother ~ child, value.var = "dob")
##    family_id age_mother dob_child1 dob_child2 dob_child3
## 1:         1         30 1998-11-26 2000-01-29         NA
## 2:         2         27 1996-06-22         NA         NA
## 3:         3         26 2002-07-11 2004-04-05 2007-09-02
## 4:         4         32 2004-10-10 2009-08-27 2012-07-21
## 5:         5         29 2000-12-05 2005-02-28         NA
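As a quick sanity check, the melt and dcast calls used so far can be chained to confirm that the round trip gives back the original columns. This is only a sketch: the table is rebuilt inline from the printed values (an assumption, since melt_default.csv is not available here), and long, wide and same are illustrative names.

```r
library(data.table)

# Rebuild the example table inline (values copied from the printed output)
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

# Wide -> long
long <- melt(DT,
             measure.vars  = c("dob_child1", "dob_child2", "dob_child3"),
             variable.name = "child", value.name = "dob")

# Long -> wide again
wide <- dcast(long, family_id + age_mother ~ child, value.var = "dob")

# Compare column by column; table-level attributes (such as a key set by
# dcast) may differ, so only the column contents are checked here
same <- vapply(names(DT),
               function(col) identical(wide[[col]], DT[[col]]),
               logical(1))
same
```

Comparing columns rather than whole tables is a deliberate choice: the reshaped table can carry extra attributes even when every value matches.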
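dcast uses a formula interface: the variables on the left of ~ identify the rows, and those on the right supply the new columns. value.var names the column whose values fill the wide table. How can we find the number of children in each family?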
dcast(DT.m1, family_id ~ ., fun.aggregate = function(x) sum(!is.na(x)), value.var = "dob")
##    family_id .
## 1:         1 2
## 2:         2 1
## 3:         3 3
## 4:         4 3
## 5:         5 2
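The count above works because fun.aggregate is applied to all dob values that fall into the same cell, and the formula family_id ~ . leaves one cell per family. The sketch below cross-checks that result against a plain grouped aggregation. The inline table and the names long, counts and by_group are assumptions for illustration, not part of the original post.

```r
library(data.table)

# Rebuild the example table inline (values copied from the printed output)
DT <- data.table(
  family_id  = 1:5,
  age_mother = c(30L, 27L, 26L, 32L, 29L),
  dob_child1 = c("1998-11-26", "1996-06-22", "2002-07-11", "2004-10-10", "2000-12-05"),
  dob_child2 = c("2000-01-29", NA, "2004-04-05", "2009-08-27", "2005-02-28"),
  dob_child3 = c(NA, NA, "2007-09-02", "2012-07-21", NA)
)

long <- melt(DT, id.vars = c("family_id", "age_mother"),
             variable.name = "child", value.name = "dob")

# dcast with an aggregating function: one row per family,
# counting the non-missing dates of birth
counts <- dcast(long, family_id ~ .,
                fun.aggregate = function(x) sum(!is.na(x)),
                value.var = "dob")
setnames(counts, ".", "n_children")  # give the "." column a readable name

# The same count written as a grouped aggregation on the long table
by_group <- long[, .(n_children = sum(!is.na(dob))), by = family_id]

counts
by_group
```

Both tables should list 2, 1, 3, 3 and 2 children for families 1 to 5, matching the output above.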
Reference: Efficient reshaping using data.tables