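Question:
We treat the within-group variance as random error. The between-group variance is more complex: it contains both random error and systematic error (the effect of the grouping factor).
A quick recap: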
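The chi-square distribution is the distribution of the sum of squares of several independent standard normal variables; the degrees of freedom (the number of terms in the sum) is its only parameter. (Why does the shape of the chi-square distribution change once the degrees of freedom reach 3? Does it have anything to do with the three-body problem?)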
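A quick simulation can make this definition concrete. The sketch below sums k squared standard normals and compares the result with the theoretical mean k and variance 2k; the seed, the number of draws and the choice k = 3 are arbitrary assumptions made only for illustration. On the question in parentheses: the chi-square density peaks at max(k - 2, 0), so it decreases monotonically for k = 1 and k = 2 and only becomes hump-shaped from k = 3 onward. That follows from the density formula and has nothing to do with the three-body problem.

```r
# Simulate a chi-square variable as a sum of k squared standard normals.
set.seed(42)      # arbitrary seed, for reproducibility only
k <- 3            # degrees of freedom (illustrative choice)
n_sim <- 100000   # number of simulated draws (illustrative choice)

z <- matrix(rnorm(n_sim * k), ncol = k)  # each row holds k standard normals
chisq_sim <- rowSums(z^2)                # sum of squares per row

# Theory says: mean = k, variance = 2k
sim_mean <- mean(chisq_sim)
sim_var <- var(chisq_sim)
print(c(mean = sim_mean, var = sim_var))

# Locate the peak of the density numerically for k = 3..6;
# theory says the mode is k - 2.
numeric_mode <- sapply(3:6, function(df) {
  optimize(dchisq, interval = c(0, 20), df = df, maximum = TRUE)$maximum
})
print(round(numeric_mode, 2))
```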
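The F distribution is the distribution of the ratio of two independent chi-square variables, each divided by its own degrees of freedom. Its only parameters are those two degrees of freedom.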
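The same kind of check works for the F distribution. This is a minimal sketch, assuming degrees of freedom 2 and 27 (picked arbitrarily here) and an arbitrary seed: it builds the ratio from two independent chi-square samples and compares simulated quantiles with `qf`.

```r
# Build an F variable from two independent chi-square variables.
set.seed(42)       # arbitrary seed
d1 <- 2            # numerator degrees of freedom (illustrative)
d2 <- 27           # denominator degrees of freedom (illustrative)
n_sim <- 100000

u <- rchisq(n_sim, df = d1)
v <- rchisq(n_sim, df = d2)
f_sim <- (u / d1) / (v / d2)   # each chi-square is divided by its own df

# Compare simulated quantiles with the theoretical F quantiles
probs <- c(0.5, 0.9, 0.95)
sim_q <- quantile(f_sim, probs)
theo_q <- qf(probs, df1 = d1, df2 = d2)
print(rbind(simulated = sim_q, theoretical = theo_q))
```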
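ANOVA assumes that the random error is normally distributed. So if we assume there is no difference between groups (all group means are equal), the ratio of the between-group mean square to the within-group mean square naturally follows an F distribution.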
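Let's review the t distribution while we're at it! The "t" comes from Student's t ("Student" was the pen name of William Sealy Gosset), and the well-known t-SNE is also based on the t distribution. The t distribution has basically the same shape as the normal distribution, only with heavier tails; once its only parameter, the degrees of freedom, exceeds about 30, it is very close to the normal distribution. In the denominator, the z statistic uses the population standard deviation, while the t statistic uses the sample standard deviation. For a one-sample t-test the degrees of freedom are the sample size minus one (n - 1). By the central limit theorem, the larger the sample size, the closer the sampling distribution of the mean is to a normal distribution. [There is a video on YouTube that explains this very clearly.]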
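To see how quickly the t distribution approaches the normal, one can compare two-sided 95% critical values across degrees of freedom. The sketch below also computes a one-sample t statistic by hand on made-up data (the sample of size 10 and the seed are assumptions for illustration) and checks that `t.test` reports n - 1 degrees of freedom.

```r
# Critical values of t shrink toward the normal value as df grows.
dfs <- c(2, 5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)
print(data.frame(df = dfs,
                 t_crit = round(t_crit, 3),
                 gap_to_z = round(t_crit - z_crit, 3)))

# One-sample t statistic by hand, on simulated (made-up) data.
set.seed(1)                          # arbitrary seed
x <- rnorm(10, mean = 0.5, sd = 1)   # hypothetical sample
n <- length(x)
t_manual <- (mean(x) - 0) / (sd(x) / sqrt(n))  # sample sd in the denominator

tt <- t.test(x, mu = 0)
t_builtin <- unname(tt$statistic)
df_builtin <- unname(tt$parameter)   # n - 1 for the one-sample test
print(c(t_manual = t_manual, t_builtin = t_builtin, df = df_builtin))
```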
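To compare two groups (small samples), use a t-test; to compare three or more groups, use ANOVA. Note: by default we mean one-way ANOVA, where the groups are defined by a single factor, for example case vs. control (or several groups A, B, C, D). Two-way ANOVA has two grouping factors, for example case or control and male or female.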
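The difference between the two designs shows up directly in the model formula. As a rough sketch, the example below uses R's built-in `ToothGrowth` dataset as a stand-in (it is not the data discussed in this post): `dose` alone gives a one-way ANOVA, while `supp * dose` gives a two-way ANOVA with an interaction term.

```r
# One-way vs two-way ANOVA on the built-in ToothGrowth data (stand-in example).
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a grouping factor, not a number

one_way <- aov(len ~ dose, data = tg)          # one grouping factor
two_way <- aov(len ~ supp * dose, data = tg)   # two factors plus interaction

# Row names of the ANOVA tables list the terms that were tested
one_way_terms <- trimws(rownames(summary(one_way)[[1]]))
two_way_terms <- trimws(rownames(summary(two_way)[[1]]))
print(one_way_terms)
print(two_way_terms)
summary(two_way)
```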
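The core idea of ANOVA: test whether the between-group variance is significantly larger than the within-group variance, using the F statistic. The F distribution has two parameters, namely its two degrees of freedom.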
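The F statistic can be computed by hand to see where the two degrees of freedom come from. The sketch below uses R's built-in `PlantGrowth` dataset; the variable names are my own. The between-group sum of squares has k - 1 degrees of freedom, the within-group sum of squares has n - k, and the ratio of the two mean squares should match what `aov` reports.

```r
# One-way ANOVA F statistic by hand on PlantGrowth.
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)   # number of groups
n <- length(y)    # total number of observations

grand_mean <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within <- sum((y - ave(y, g))^2)   # ave() returns each observation's group mean

df_between <- k - 1
df_within <- n - k

f_manual <- (ss_between / df_between) / (ss_within / df_within)
p_manual <- pf(f_manual, df_between, df_within, lower.tail = FALSE)

# Compare with the built-in result
aov_table <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]
f_aov <- aov_table[["F value"]][1]
p_aov <- aov_table[["Pr(>F)"]][1]
print(c(f_manual = f_manual, f_aov = f_aov, p_manual = p_manual, p_aov = p_aov))
```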
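ANOVA gives a single overall significance result, i.e. whether the between-group variation is significantly larger than the within-group variation. If it is significant, pairwise comparisons are needed as a follow-up.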
my_data <- PlantGrowth

# Show a random sample
set.seed(1234)
dplyr::sample_n(my_data, 10)

# Show the levels
levels(my_data$group)
my_data$group <- ordered(my_data$group, levels = c("ctrl", "trt1", "trt2"))

# Summary statistics by group
library(dplyr)
group_by(my_data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE)
  )

# Box plots
# ++++++++++++++++++++
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "weight",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")

# Mean plots
# ++++++++++++++++++++
# Plot weight by group
# Add error bars: mean_se
# (other values include: mean_sd, mean_ci, median_iqr, ....)
ggline(my_data, x = "group", y = "weight",
       add = c("mean_se", "jitter"),
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")

# Box plot (base graphics)
boxplot(weight ~ group, data = my_data,
        xlab = "Treatment", ylab = "Weight",
        frame = FALSE, col = c("#00AFBB", "#E7B800", "#FC4E07"))

# plotmeans
library("gplots")
plotmeans(weight ~ group, data = my_data, frame = FALSE,
          xlab = "Treatment", ylab = "Weight",
          main = "Mean Plot with 95% CI")

# Compute the analysis of variance
res.aov <- aov(weight ~ group, data = my_data)

# Summary of the analysis
summary(res.aov)

# In one-way ANOVA test, a significant p-value indicates that some of the group means are different,
# but we don't know which pairs of groups are different.
TukeyHSD(res.aov)
Tukey HSD
General linear hypothesis tests
Alternative: pairwise t-tests with BH adjustment
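As a sketch of this alternative, base R's `pairwise.t.test` runs all pairwise t-tests and adjusts the p-values with the Benjamini-Hochberg method. The example uses the same built-in `PlantGrowth` data; the manual `p.adjust` step is only there to show what the adjustment does to the raw p-values.

```r
# Pairwise t-tests with Benjamini-Hochberg (BH) adjustment on PlantGrowth.
data(PlantGrowth)

pw_bh <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                         p.adjust.method = "BH")
print(pw_bh)

# The same result, built from unadjusted p-values plus p.adjust()
pw_raw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                          p.adjust.method = "none")
raw_p <- pw_raw$p.value[!is.na(pw_raw$p.value)]   # the three pairwise p-values
adj_manual <- p.adjust(raw_p, method = "BH")
adj_builtin <- pw_bh$p.value[!is.na(pw_bh$p.value)]
print(rbind(raw = raw_p, BH = adj_manual))
```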
Test validity (check the ANOVA assumptions)
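One possible way to check the assumptions, using only base R functions, is sketched below: a Shapiro-Wilk test on the model residuals for normality and Bartlett's test for equal variances across groups. The choice of these particular tests is mine; if the assumptions look doubtful, Welch's one-way test or the Kruskal-Wallis test are common fallbacks.

```r
# Assumption checks for one-way ANOVA on PlantGrowth (base R only).
data(PlantGrowth)
fit <- aov(weight ~ group, data = PlantGrowth)

# Normality of residuals
sw <- shapiro.test(residuals(fit))

# Homogeneity of variances across groups
bt <- bartlett.test(weight ~ group, data = PlantGrowth)

# Fallbacks when the assumptions are questionable
welch <- oneway.test(weight ~ group, data = PlantGrowth)  # no equal-variance assumption
kw <- kruskal.test(weight ~ group, data = PlantGrowth)    # rank-based, non-parametric

p_values <- c(shapiro = sw$p.value, bartlett = bt$p.value,
              welch = welch$p.value, kruskal = kw$p.value)
print(round(p_values, 4))

# Visual checks: residuals vs fitted, and a normal Q-Q plot
plot(fit, which = 1)
plot(fit, which = 2)
```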
One-Way vs Two-Way ANOVA: Differences, Assumptions and Hypotheses