Analysis of Variance (ANOVA) | Principles | R Code | Advanced | One-Way and Two-Way

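Questions:

  • Why does the between-group sum of squares plus the within-group sum of squares always equal the total sum of squares, and how can this be understood mathematically? The slides give a proof: it only takes introducing one intermediate term, the group mean.
  • How are analysis of variance, analysis of covariance, and regression analysis related?
  • What is the F distribution? It originates with Fisher's work, and without understanding the F distribution it is not possible to truly understand ANOVA.
  • Analysis of variance means analyzing where the variance comes from!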
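The decomposition asked about in the first question can be sketched as follows. The notation is an assumption introduced here for illustration: k groups, n_i observations in group i, observation x_{ij}, group mean \bar{x}_i, and grand mean \bar{x}. Strictly speaking the identity holds for sums of squares rather than for variances.

```latex
\begin{aligned}
SS_T &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2
      = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[\left(x_{ij}-\bar{x}_i\right)+\left(\bar{x}_i-\bar{x}\right)\right]^2 \\
     &= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2}_{SS_W}
      + \underbrace{\sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{x}\right)^2}_{SS_B}
      + 2\sum_{i=1}^{k}\left(\bar{x}_i-\bar{x}\right)\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right) \\
     &= SS_W + SS_B
\end{aligned}
```

The cross term vanishes because the deviations from a group's own mean sum to zero within each group, that is, \sum_{j}(x_{ij}-\bar{x}_i)=0 for every i.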

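We treat the within-group variance as random error. The between-group variation is more complex: it contains both random error and systematic error (the effect of the grouping factor).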

Slides: Analysis of Variance, One-Way ANOVA

Recap:

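The chi-squared distribution is the distribution of the sum of squares of several independent standard normal variables; the degrees of freedom, the number of variables summed, is its only parameter. (Why does the shape change once the degrees of freedom reach 3? The density is monotonically decreasing for 1 or 2 degrees of freedom and has a peak at k − 2 for k ≥ 3 degrees of freedom. This follows from the density formula and has nothing to do with the three-body problem.)

The F distribution is the distribution of the ratio of two independent chi-squared variables, each divided by its own degrees of freedom. Its only parameters are those two degrees of freedom.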
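The two definitions above can be checked with a small simulation. This is a minimal sketch: the seed, the number of draws, and the degrees of freedom (2 and 27) are arbitrary choices made here for illustration.

```r
set.seed(42)
n_sim <- 100000   # number of simulated draws (arbitrary)
d1 <- 2           # numerator degrees of freedom (arbitrary)
d2 <- 27          # denominator degrees of freedom (arbitrary)

# Chi-squared with d degrees of freedom: sum of d squared standard normals
chisq1 <- rowSums(matrix(rnorm(n_sim * d1), ncol = d1)^2)
chisq2 <- rowSums(matrix(rnorm(n_sim * d2), ncol = d2)^2)

# F: ratio of two independent chi-squared variables, each divided by its df
f_sim <- (chisq1 / d1) / (chisq2 / d2)

# Compare the simulated and theoretical 95th percentiles
q_sim <- unname(quantile(f_sim, 0.95))
q_theory <- qf(0.95, df1 = d1, df2 = d2)
c(simulated = q_sim, theoretical = q_theory)
```

The two percentiles should agree up to simulation noise, and the mean of `chisq1` should be close to `d1`, since a chi-squared variable has mean equal to its degrees of freedom.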

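ANOVA assumes that the random errors are normally distributed. Under the null hypothesis that the group means do not differ, the between-group and within-group mean squares both estimate the same error variance, so their ratio naturally follows an F distribution.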

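Let's review the t distribution as well. The "t" comes from "Student's t" (Student was the pseudonym of William Sealy Gosset), and the well-known t-SNE is also based on the t distribution. The t distribution has roughly the same bell shape as the normal distribution, with heavier tails; once its only parameter, the degrees of freedom, exceeds about 30, it is close to the normal distribution. The z statistic is computed with the population standard deviation, whereas the t statistic is computed with the sample standard deviation. For a one-sample test, the degrees of freedom are the sample size minus one (n − 1). By the central limit theorem, the larger the sample size, the closer the sampling distribution of the mean is to a normal distribution. [There is a YouTube video that explains this very clearly.]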
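The convergence of the t distribution to the normal distribution can be seen by comparing critical values. A minimal sketch; the list of degrees of freedom is an arbitrary choice made here for illustration.

```r
# Two-sided 5% critical values of t for increasing degrees of freedom
dfs <- c(2, 5, 10, 30, 100)
t_crit <- qt(0.975, df = dfs)

# The corresponding critical value of the standard normal distribution
z_crit <- qnorm(0.975)

data.frame(df = dfs, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```

The t critical values shrink toward the normal value as the degrees of freedom grow, which is why the two distributions are often treated as interchangeable beyond roughly 30 degrees of freedom.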

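Principles

To compare two groups (small samples), use a t-test; to compare three or more groups, use ANOVA. Note: by default we mean one-way ANOVA, in which the groups are defined by a single factor, for example case versus control (or several groups A, B, C, D). Two-way ANOVA has two grouping factors, for example case or control and male or female.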
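For contrast with the one-way case, a two-way ANOVA might look like the sketch below. It uses the built-in `ToothGrowth` dataset, which is not part of the original notes and is chosen here only because it ships with R and has two grouping factors (`supp` and `dose`).

```r
# ToothGrowth: tooth length (len) by supplement type (supp) and dose
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a grouping factor, not a number

# Two-way ANOVA with both main effects and their interaction
res.aov2 <- aov(len ~ supp * dose, data = tg)
summary(res.aov2)
```

The summary table has one row per source of variation: `supp`, `dose`, the `supp:dose` interaction, and the residuals. This design is balanced (10 observations per cell), so the order of the terms in the formula does not change the sums of squares.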

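The core idea of ANOVA: test whether the between-group variance is significantly larger than the within-group variance using the F statistic, the ratio of the between-group mean square to the within-group mean square. The F distribution has two parameters, its two degrees of freedom; in one-way ANOVA with k groups and N observations they are k − 1 and N − k.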
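The F statistic can be computed by hand to make the idea concrete. This is a minimal sketch on the built-in `PlantGrowth` dataset; the variable names (`ss_between`, `ss_within`, and so on) are introduced here for illustration.

```r
dat <- PlantGrowth
k <- nlevels(dat$group)   # number of groups
N <- nrow(dat)            # total number of observations

grand_mean  <- mean(dat$weight)
group_means <- tapply(dat$weight, dat$group, mean)
group_n     <- tapply(dat$weight, dat$group, length)

# Between-group, within-group, and total sums of squares
ss_between <- sum(group_n * (group_means - grand_mean)^2)
ss_within  <- sum((dat$weight - ave(dat$weight, dat$group))^2)
ss_total   <- sum((dat$weight - grand_mean)^2)

# F = between-group mean square / within-group mean square
f_stat  <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

The result should match the F value and p-value reported by `summary(aov(weight ~ group, data = PlantGrowth))`, and `ss_between + ss_within` should equal `ss_total`.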

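ANOVA gives a single overall significance result, that is, whether the between-group variation is significantly larger than the within-group variation. If it is significant, pairwise comparisons are needed to find which groups differ.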

 

R Example

One-Way ANOVA Test in R

my_data <- PlantGrowth

# Show a random sample
set.seed(1234)
dplyr::sample_n(my_data, 10)

# Show the levels
levels(my_data$group)

my_data$group <- ordered(my_data$group,
                         levels = c("ctrl", "trt1", "trt2"))

library(dplyr)
group_by(my_data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE)
  )

# Box plots
# ++++++++++++++++++++
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "weight", 
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")

# Mean plots
# ++++++++++++++++++++
# Plot weight by group
# Add error bars: mean_se
# (other values include: mean_sd, mean_ci, median_iqr, ....)
ggline(my_data, x = "group", y = "weight", 
       add = c("mean_se", "jitter"), 
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")

# Box plot (base R graphics)
boxplot(weight ~ group, data = my_data,
        xlab = "Treatment", ylab = "Weight",
        frame = FALSE, col = c("#00AFBB", "#E7B800", "#FC4E07"))
# plotmeans
library("gplots")
plotmeans(weight ~ group, data = my_data, frame = FALSE,
          xlab = "Treatment", ylab = "Weight",
          main="Mean Plot with 95% CI") 
          
# Compute the analysis of variance
res.aov <- aov(weight ~ group, data = my_data)
# Summary of the analysis
summary(res.aov)
# In a one-way ANOVA, a significant p-value indicates that at least some group means differ,
# but not which pairs of groups differ; Tukey's HSD test compares all pairs.
TukeyHSD(res.aov)

 

Advanced

Tukey HSD

General linear hypothesis tests

Alternative: pairwise t-tests with BH adjustment
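A minimal sketch of the pairwise t-test alternative, using the base R function `pairwise.t.test` on the built-in `PlantGrowth` dataset with the Benjamini-Hochberg correction:

```r
# Pairwise t-tests between all group pairs, with BH-adjusted p-values
pw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "BH")
pw

# The adjusted p-values are stored as a lower-triangular matrix
pw$p.value
```

Each cell of `pw$p.value` is the adjusted p-value for one pair of groups. By default the function uses a standard deviation pooled across all groups, which matches the equal-variance assumption of ANOVA.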

Testing the validity of the ANOVA assumptions
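The two main assumptions, equal variances across groups and normally distributed residuals, can be checked as sketched below. Bartlett's test is used here as an assumption of convenience because it is part of base R; Levene's test (for example `car::leveneTest`) is a common alternative that is less sensitive to departures from normality.

```r
res.aov <- aov(weight ~ group, data = PlantGrowth)

# Homogeneity of variance across groups (Bartlett's test, base R)
bt <- bartlett.test(weight ~ group, data = PlantGrowth)
bt

# Normality of the residuals (Shapiro-Wilk test)
sw <- shapiro.test(residuals(res.aov))
sw

# Diagnostic plots: residuals vs fitted values, and normal Q-Q plot
plot(res.aov, which = 1)
plot(res.aov, which = 2)
```

A large p-value in either test means there is no evidence against the corresponding assumption. If the assumptions look doubtful, `oneway.test` (Welch's correction, no equal-variance assumption) or `kruskal.test` (non-parametric) are possible fallbacks in base R.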

One-Way vs Two-Way ANOVA: Differences, Assumptions and Hypotheses
