http://www.jianshu.com/p/b7274afff14f?from=timelinehtml
上篇文章中提了一下如何經過ggpubr包爲ggplot
圖添加p-value
以及顯著性標記,本文將詳細介紹。利用數據集ToothGrowth進行演示git
#先加載包 library(ggpubr) #加載數據集ToothGrowth data("ToothGrowth") head(ToothGrowth)
## len supp dose ## 1 4.2 VC 0.5 ## 2 11.5 VC 0.5 ## 3 7.3 VC 0.5 ## 4 5.8 VC 0.5 ## 5 6.4 VC 0.5 ## 6 10.0 VC 0.5
R中經常使用的比較方法主要有下面幾種:github
方法 | R函數 | 描述 |
---|---|---|
T-test | t.test() | 比較兩組(參數) |
Wilcoxon test | wilcox.test() | 比較兩組(非參數) |
ANOVA | aov()或anova() | 比較多組(參數) |
Kruskal-Wallis | kruskal.test() | 比較多組(非參數) |
各類比較方法後續有時間一一講解。markdown
p-value
主要利用ggpubr包中的兩個函數:session
compare_means()
:能夠進行一組或多組間的比較stat_compare_mean()
:自動添加p-value
、顯著性標記到ggplot圖中該函數主要用用法以下:app
compare_means(formula, data, method = "wilcox.test", paired = FALSE, group.by = NULL, ref.group = NULL, ...)
註釋:ide
formula
:形如x~group
,其中x是數值型變量,group
是因子,能夠是一個或者多個data
:數據集method
:比較的方法,默認爲"wilcox.test"
, 其餘可選方法爲:"t.test"
、"anova"
、"kruskal.test"
paired
:是否要進行paired test
(TRUE
or FALSE
)group_by
: 比較時是否要進行分組ref.group
: 是否須要指定參考組主要用法:函數
stat_compare_means(mapping = NULL, comparisons = NULL hide.ns = FALSE, label = NULL, label.x = NULL, label.y = NULL, ...)
註釋:學習
mapping
:由aes()
建立的一套美學映射comparisons
:指定須要進行比較以及添加p-value
、顯著性標記的組hide.ns
:是否要顯示顯著性標記nslabel
:顯著性標記的類型,可選項爲:p.signif
(顯著性標記)、p.format
(顯示p-value
)label.x
、label.y
:顯著性標籤調整compare_means(len~supp, data=ToothGrowth)
結果解釋:測試
.y
:測試中使用的y變量p
:p-value
p.adj
:調整後的p-value
。默認爲p.adjust.method="holm"
p.format
:四捨五入後的p-value
p.signif
:顯著性水平method
:用於統計檢驗的方法 p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", add = "jitter")#添加p-valuep+stat_compare_means()
#使用其餘統計檢驗方法 p+stat_compare_means(method = "t.test")
label.x
、label.y
、hjust
及vjust
來調整aes()
映射來更改:
aes(label=..p.format..)
或aes(lebel=paste0("p=",..p.format..))
:只顯示p-value
,不顯示統計檢驗方法aes(label=..p.signif..)
:僅顯示顯著性水平aes(label=paste0(..method..,"\n", "p=",..p.format..))
:p-value
與顯著性水平分行顯示舉個栗子:
p+stat_compare_means(aes(label=..p.signif..), label.x = 1.5, label.y = 40)
也能夠將標籤指定爲字符向量,不要映射,只需將p.signif兩端的..去掉便可
p+stat_compare_means(label = "p.signif", label.x = 1.5, label.y = 40)
compare_means(len~supp, data=ToothGrowth, paired = TRUE)
ggpaired(ToothGrowth, x="supp", y="len", color = "supp", line.color = "gray", line.size = 0.4, palette = "jco")+ stat_compare_means(paired = TRUE)
Global test
compare_means(len~dose, data=ToothGrowth, method = "anova")
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means()
#使用其餘的方法 ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova")
Pairwise comparisons:若是分組變量中包含兩個以上的水平,那麼會自動進行pairwise test,默認方法爲"wilcox.test"
compare_means(len~dose, data=ToothGrowth)
#能夠指定比較哪些組 my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2")) ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+ stat_compare_means(comparisons=my_comparisons)+ # Add pairwise comparisons p-value stat_compare_means(label.y = 50) # Add global p-value
能夠經過修改參數label.y來更改標籤的位置
ggboxplot(ToothGrowth, x="dose", y="len", color = "dose",palette = "jco")+ stat_compare_means(comparisons=my_comparisons, label.y = c(29, 35, 40))+ # Add pairwise comparisons p-value stat_compare_means(label.y = 45) # Add global p-value
至於經過添加線條來鏈接比較的兩組,這一功能已由包ggsignif實現
##設定參考組 compare_means(len~dose, data=ToothGrowth, ref.group = "0.5", #以dose=0.5組爲參考組 method = "t.test" )
#可視化 ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova", label.y = 40)+ # Add global p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = "0.5") # Pairwise comparison against reference
參考組也能夠設置爲.all.即全部的平均值
compare_means(len~dose, data=ToothGrowth, ref.group = ".all.", method = "t.test")
#可視化 ggboxplot(ToothGrowth, x="dose", y="len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova", label.y = 40)+# Add global p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")#Pairwise comparison against all
接下來利用survminer包中的數據集myeloma來說解一下爲何有時候咱們須要將ref.group設置爲.all.
library(survminer)#沒安裝的先安裝再加載 data("myeloma") head(myeloma)
咱們將根據患者的分組來繪製DEPDC1基因的表達譜,看不一樣組之間是否存在顯著性的差別,咱們能夠在7組之間進行比較,可是這樣的話組間比較的組合就太多了,所以咱們能夠將7組中每一組與所有平均值進行比較,看看DEPDC1基因在不一樣的組中是否過表達仍是低表達。
compare_means(DEPDC1~molecular_group, data = myeloma, ref.group = ".all.", method = "t.test")
#可視化DEPDC1基因表達譜 ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color = "molecular_group", add = "jitter", legend="none")+ rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.")# Pairwise comparison against all
從圖中能夠看出,DEPDC1基因在Proliferation組中顯著性地過表達,而在Hyperdiploid和Low bone disease顯著性地低表達
咱們也能夠將非顯著性標記ns去掉,只須要將參數hide.ns=TRUE
ggboxplot(myeloma, x="molecular_group", y="DEPDC1", color = "molecular_group", add = "jitter", legend="none")+ rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype=2)+# Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.", hide.ns = TRUE)# Pairwise comparison against all
按另外一個變量進行分組以後進行統計檢驗,好比按變量dose進行分組:
compare_means(len~supp, data=ToothGrowth, group.by = "dose")
#可視化 p <- ggboxplot(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", add = "jitter", facet.by = "dose", short.panel.labs = FALSE)#按dose進行分面 #label只繪製 p-valuep+stat_compare_means(label = "p.format")
#label繪製顯著性水平 p+stat_compare_means(label = "p.signif", label.x = 1.5)
#將全部箱線圖繪製在一個panel中 p <- ggboxplot(ToothGrowth, x="dose", y="len", color = "supp", palette = "jco", add = "jitter") p+stat_compare_means(aes(group=supp))
#只顯示p-value p+stat_compare_means(aes(group=supp), label = "p.format")
#顯示顯著性水平 p+stat_compare_means(aes(group=supp), label = "p.signif")
進行paired sample檢驗 compare_means(len~supp, data=ToothGrowth, group.by = "dose", paired = TRUE)
#可視化 p <- ggpaired(ToothGrowth, x="supp", y="len", color = "supp", palette = "jco", line.color="gray", line.size=0.4, facet.by = "dose", short.panel.labs = FALSE)#按dose分面 #只顯示p-value p+stat_compare_means(label = "p.format", paired = TRUE)
#有偏差棒的條形圖,實際上我之前的文章裏有純粹用ggplot2實現 ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se")+ stat_compare_means()+ stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))
#有偏差棒的線圖 ggline(ToothGrowth, x="dose", y="len", add = "mean_se")+ stat_compare_means()+ stat_compare_means(ref.group = "0.5", label = "p.signif", label.y = c(22, 29))
ggbarplot(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp", palette = "jco", position = position_dodge(0.8))+ stat_compare_means(aes(group=supp), label = "p.signif", label.y = 29)
ggline(ToothGrowth, x="dose", y="len", add = "mean_se", color = "supp", palette = "jco")+ stat_compare_means(aes(group=supp), label = "p.signif", label.y = c(16, 25, 29))
sessionInfo() ## R version 3.4.0 (2017-04-21) ## Platform: x86_64-w64-mingw32/x64 (64-bit) ## Running under: Windows 8.1 x64 (build 9600) ## ## Matrix products: default ## ## locale: ## [1] LC_COLLATE=Chinese (Simplified)_China.936 ## [2] LC_CTYPE=Chinese (Simplified)_China.936 ## [3] LC_MONETARY=Chinese (Simplified)_China.936 ## [4] LC_NUMERIC=C ## [5] LC_TIME=Chinese (Simplified)_China.936 ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] survminer_0.4.0 ggpubr_0.1.3 magrittr_1.5 ggplot2_2.2.1 ## ## loaded via a namespace (and not attached): ## [1] Rcpp_0.12.11 compiler_3.4.0 plyr_1.8.4 ## [4] tools_3.4.0 digest_0.6.12 evaluate_0.10 ## [7] tibble_1.3.3 gtable_0.2.0 nlme_3.1-131 ## [10] lattice_0.20-35 rlang_0.1.1 Matrix_1.2-10 ## [13] psych_1.7.5 ggsci_2.4 DBI_0.6-1 ## [16] cmprsk_2.2-7 yaml_2.1.14 parallel_3.4.0 ## [19] gridExtra_2.2.1 dplyr_0.5.0 stringr_1.2.0 ## [22] knitr_1.16 survMisc_0.5.4 rprojroot_1.2 ## [25] grid_3.4.0 data.table_1.10.4 KMsurv_0.1-5 ## [28] R6_2.2.1 km.ci_0.5-2 survival_2.41-3 ## [31] foreign_0.8-68 rmarkdown_1.5 reshape2_1.4.2 ## [34] tidyr_0.6.3 purrr_0.2.2.2 splines_3.4.0 ## [37] backports_1.1.0 scales_0.4.1 htmltools_0.3.6 ## [40] assertthat_0.2.0 mnormt_1.5-5 xtable_1.8-2 ## [43] colorspace_1.3-2 ggsignif_0.2.0 labeling_0.3 ## [46] stringi_1.1.5 lazyeval_0.2.0 munsell_0.4.3 ## [49] broom_0.4.2 zoo_1.8-0