Analysis of Variance (ANOVA) | Principles | R Code | Advanced Topics | One-Way and Two-Way

Date: 2020-05-04


Questions:

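Why does the between-group sum of squares plus the within-group sum of squares always equal the total sum of squares? How can this be understood mathematically? The PPT has a proof; introducing one intermediate term (the group mean) is enough.
How are ANOVA, analysis of covariance (ANCOVA), and regression analysis related?
What is the F distribution? It goes back to Fisher's work, and without understanding the F distribution you cannot truly understand ANOVA.
ANOVA is, as the name says, about analyzing where the variance comes from!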

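We treat the within-group variance as random error. The between-group variance is more complex: it contains both random error and systematic error (the effect of the grouping factor).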
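The decomposition asked about in the first question can be written out as follows. This is a sketch in notation that is assumed here, not taken from the PPT: $y_{ij}$ is observation $j$ in group $i$, $n_i$ is the size of group $i$, $\bar{y}_i$ is the mean of group $i$, and $\bar{y}$ is the grand mean. The intermediate term is $\bar{y}_i$, added and subtracted inside the deviation.

```latex
\begin{aligned}
y_{ij} - \bar{y} &= (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y}) \\[4pt]
\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
   + \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2
   + 2\sum_{i=1}^{k} (\bar{y}_i - \bar{y}) \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i) \\[4pt]
  &= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}_{SS_{\text{within}}}
   + \underbrace{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}_{SS_{\text{between}}}
\end{aligned}
```

The cross term vanishes because the deviations from a group's own mean sum to zero within each group, $\sum_{j}(y_{ij} - \bar{y}_i) = 0$. Strictly speaking it is the sums of squares that add up, not the variances, since each variance is a sum of squares divided by a different number of degrees of freedom.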

PPT: Analysis of Variance, One-Way ANOVA

A quick recap:

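The chi-square distribution is the distribution of the sum of squares of several independent standard normal variables; the degrees of freedom is its only parameter. (Why does the shape of the chi-square distribution change once the degrees of freedom reach 3? Is it related to the three-body problem? It is not: with 1 or 2 degrees of freedom the density decreases monotonically, while from 3 on it has an interior peak at df − 2.)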
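The definition above can be checked with a small simulation. The following is a minimal sketch in base R; the seed, the sample size, and the evaluation grid are arbitrary choices made for illustration.

```r
set.seed(1)
k <- 3      # degrees of freedom
n <- 1e5    # number of simulated draws

# Each row holds k independent standard normal draws; square and sum them
z <- matrix(rnorm(n * k), nrow = n, ncol = k)
x <- rowSums(z^2)

# The simulated mean should be close to the theoretical mean, which is k
c(simulated_mean = mean(x), theoretical_mean = k)

# Simulated quantiles next to the quantiles of the chi-square distribution
rbind(simulated   = quantile(x, c(0.5, 0.95)),
      theoretical = qchisq(c(0.5, 0.95), df = k))

# Location of the density maximum on a grid, for df = 1..4
grid  <- seq(0.01, 10, by = 0.01)
modes <- sapply(1:4, function(df) grid[which.max(dchisq(grid, df))])
modes
```

For df = 1 and df = 2 the maximum sits at the left edge of the grid, which reflects a monotonically decreasing density; for df = 3 and df = 4 it moves to an interior point.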

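The F distribution is the distribution of the ratio of two independent chi-square variables, each divided by its own degrees of freedom; its only parameters are those two degrees of freedom.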
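As a rough illustration, an F variable can be built from two chi-square draws and compared with R's own F quantiles. The degrees of freedom (3 and 20), the seed, and the sample size below are assumptions chosen only for the demonstration.

```r
set.seed(2)
d1 <- 3     # numerator degrees of freedom
d2 <- 20    # denominator degrees of freedom
n  <- 1e5   # number of simulated draws

# Ratio of two independent chi-square variables, each scaled by its df
f_sim <- (rchisq(n, df = d1) / d1) / (rchisq(n, df = d2) / d2)

# Simulated quantiles next to the theoretical F quantiles
probs <- c(0.5, 0.9, 0.95)
rbind(simulated   = quantile(f_sim, probs),
      theoretical = qf(probs, df1 = d1, df2 = d2))
```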

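ANOVA assumes that the random errors are normally distributed. Under the null hypothesis that the group means do not differ, the ratio of the between-group mean square to the within-group mean square then follows an F distribution.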

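Let's review the t distribution as well. The "t" comes from "Student", the pseudonym under which William Sealy Gosset published it, and the well-known t-SNE is also built on the t distribution. The t distribution has nearly the same bell shape as the normal distribution, with heavier tails; once its only parameter, the degrees of freedom, exceeds about 30, it is very close to the normal distribution. The z statistic uses the population standard deviation in its denominator, whereas the t statistic uses the sample standard deviation. For a one-sample t statistic the degrees of freedom are the sample size minus one (n − 1). By the central limit theorem, the larger the sample size, the closer the sampling distribution of the mean is to a normal distribution. [There is a YouTube video that explains this very clearly.]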
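The convergence of the t distribution toward the normal can be seen from its critical values. This short sketch compares the two-sided 5% critical value of t with that of the standard normal; the particular degrees of freedom listed are arbitrary.

```r
dfs    <- c(2, 5, 10, 30, 100, Inf)
t_crit <- qt(0.975, df = dfs)   # upper 2.5% point of the t distribution
z_crit <- qnorm(0.975)          # upper 2.5% point of the standard normal

# The gap to the normal critical value shrinks as df grows
data.frame(df = dfs,
           t_crit = round(t_crit, 3),
           gap_to_normal = round(t_crit - z_crit, 3))
```

With infinite degrees of freedom, `qt` returns the normal quantile itself.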

Principles

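To compare two groups (small samples), use a t-test; to compare three or more groups, use ANOVA. Note: by default we mean one-way ANOVA, in which the groups are defined by a single factor, for example case versus control (or several groups A, B, C, D). Two-way ANOVA has two grouping factors, for example case or control and male or female.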
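For the two-way case, a minimal sketch could look like the following. It uses R's built-in `ToothGrowth` data as a stand-in, which is an assumption made here for illustration: `supp` and `dose` play the roles of the two grouping factors (in place of case/control and male/female).

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a categorical factor

# Two factors plus their interaction
fit2 <- aov(len ~ supp * dose, data = tg)
tab2 <- summary(fit2)[[1]]   # the ANOVA table as a data frame
tab2
```

The table has one row per main effect, one for the interaction, and one for the residuals; writing `supp + dose` instead of `supp * dose` would fit the additive model without the interaction term.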

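The core idea of ANOVA: test whether the between-group variance is significantly larger than the within-group variance, using the F statistic. The F distribution has two parameters, namely its two degrees of freedom (k − 1 and N − k for k groups and N observations in total).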
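To make the F statistic concrete, here is a sketch that computes it by hand on the built-in `PlantGrowth` data and compares it with the value reported by `aov`. The variable names are illustrative only.

```r
y <- PlantGrowth$weight
g <- PlantGrowth$group
k <- nlevels(g)    # number of groups
N <- length(y)     # total number of observations

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Sums of squares: between groups, within groups, and total
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - ave(y, g))^2)   # ave() returns each value's group mean
ss_total   <- sum((y - grand_mean)^2)
all.equal(ss_total, ss_between + ss_within)

# F is the ratio of the two mean squares
f_stat  <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# The same F value taken from aov()
f_aov <- summary(aov(weight ~ group, data = PlantGrowth))[[1]][["F value"]][1]
c(manual = f_stat, aov = f_aov, p_value = p_value)
```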

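ANOVA gives a single overall significance result, namely whether there is a significant difference among the group means. If it is significant, pairwise comparisons are still needed to find out which groups differ.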
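One possible way to run the pairwise comparisons is `pairwise.t.test` from base R, sketched here on the built-in `PlantGrowth` data. The Bonferroni adjustment is an assumption made for this example; other methods accepted by `p.adjust` can be substituted.

```r
# Pairwise t-tests with a pooled standard deviation,
# p-values adjusted for multiple comparisons
pw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values (rows vs. columns)
pw$p.value
```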

R Example

One-Way ANOVA Test in R

my_data <- PlantGrowth

# Show a random sample
set.seed(1234)
dplyr::sample_n(my_data, 10)

# Show the levels
levels(my_data$group)

# Make group an ordered factor so the levels keep this display order
my_data$group <- ordered(my_data$group,
                         levels = c("ctrl", "trt1", "trt2"))

# Summary statistics (count, mean, sd) by group
library(dplyr)
group_by(my_data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE)
  )

# Box plots
# ++++++++++++++++++++
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "weight", 
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")

# Mean plots
# ++++++++++++++++++++
# Plot weight by group
# Add error bars: mean_se
# (other values include: mean_sd, mean_ci, median_iqr, ....)
library("ggpubr")
ggline(my_data, x = "group", y = "weight", 
       add = c("mean_se", "jitter"), 
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")

# Box plot (base R graphics)
boxplot(weight ~ group, data = my_data,
        xlab = "Treatment", ylab = "Weight",
        frame = FALSE, col = c("#00AFBB", "#E7B800", "#FC4E07"))
# Mean plot with confidence intervals (gplots::plotmeans)
library("gplots")
plotmeans(weight ~ group, data = my_data, frame = FALSE,
          xlab = "Treatment", ylab = "Weight",
          main="Mean Plot with 95% CI") 
          
# Compute the analysis of variance
res.aov <- aov(weight ~ group, data = my_data)

# Summary of the analysis
summary(res.aov)

# In a one-way ANOVA test, a significant p-value indicates that some of the
# group means are different, but we don't know which pairs of groups differ.
# Tukey HSD performs all pairwise comparisons between group means.
TukeyHSD(res.aov)
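
Since ANOVA assumes normally distributed errors, it can be worth checking the fitted model. The following is a sketch of one common approach using base R tests on the built-in `PlantGrowth` data; the choice of the Shapiro-Wilk and Bartlett tests is an assumption made here, not something prescribed by the tutorial above.

```r
fit <- aov(weight ~ group, data = PlantGrowth)

# Normality of the residuals (Shapiro-Wilk test)
sw <- shapiro.test(residuals(fit))

# Homogeneity of variance across groups (Bartlett test)
bt <- bartlett.test(weight ~ group, data = PlantGrowth)

c(shapiro_p = sw$p.value, bartlett_p = bt$p.value)
```

Small p-values would cast doubt on the corresponding assumption; diagnostic plots such as `plot(fit, 2)` for a normal Q-Q plot of the residuals are a useful complement.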