R: Commonly Used Statistical Tests

Date: 2019-12-08


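A statistical test compares a sample result with the sampling distribution in order to reach a decision. It consists of five main steps:

State the hypotheses

Determine the sampling distribution

Choose the significance level and the rejection region

Compute the test statistic

Make the decision (source: Baidu Baike)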
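The five steps can be traced by hand for a one-sample t-test. This is only a minimal sketch: the choice of the built-in sleep data, the null value mu0 = 0 and the level alpha = 0.05 are assumptions made for illustration, not part of the original text.

```r
# Step 1: hypotheses. H0: mu = mu0 versus H1: mu != mu0 (two-sided).
x   <- sleep$extra[sleep$group == 1]   # built-in data, used only as an illustration
mu0 <- 0                               # hypothetical null value

# Step 2: sampling distribution. Under H0 the statistic follows a
# t distribution with n - 1 degrees of freedom.
n  <- length(x)
df <- n - 1

# Step 3: significance level and rejection region.
alpha    <- 0.05
critical <- qt(1 - alpha / 2, df)      # reject H0 when |t| > critical

# Step 4: test statistic.
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Step 5: decision, by rejection region and equivalently by p-value.
reject  <- abs(t_stat) > critical
p_value <- 2 * pt(-abs(t_stat), df)
c(t = t_stat, critical = critical, p = p_value)
```

The same numbers should come out of t.test(x, mu = mu0), which bundles steps 2 to 5 into one call.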

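Hypothesis testing, also called significance testing, is another major part of statistical inference. Its purpose is to determine whether population parameters differ. In essence, a hypothesis test asks whether an observed "difference" is caused by sampling error or by a real difference between the populations. The aim is to assess how strong the evidence is that two treatments produce different effects, and the strength of that evidence is measured and expressed by the probability P. Besides the t distribution, other test statistics and distributions are used for other kinds of data, such as the F distribution and the chi-squared distribution. The steps for testing hypotheses with these distributions are the same; the only difference is the test statistic that has to be computed.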

Hypothesis tests for the mean of a normal population

t-test

t.test() => Student's t-Test

require(graphics)

t.test(1:10, y = c(7:20))      # P = .00001855
t.test(1:10, y = c(7:20, 200)) # P = .1245    -- no longer significant

## Classic example: Student's sleep data
plot(extra ~ group, data = sleep)

## Traditional interface
with(sleep, t.test(extra[group == 1], extra[group == 2]))

    Welch Two Sample t-test

data:  extra[group == 1] and extra[group == 2]
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y 
     0.75      2.33 

## Formula interface
t.test(extra ~ group, data = sleep)

    Welch Two Sample t-test

data:  extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2 
           0.75            2.33
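To see where t = -1.8608 and the fractional df = 17.776 come from, the Welch statistic can be recomputed by hand. The sketch below uses the standard Welch-Satterthwaite formula; the variable names are mine.

```r
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# Squared standard error contributed by each group
se1 <- var(g1) / length(g1)
se2 <- var(g2) / length(g2)

# Welch statistic: difference in means over the unpooled standard error
t_stat <- (mean(g1) - mean(g2)) / sqrt(se1 + se2)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (se1 + se2)^2 /
  (se1^2 / (length(g1) - 1) + se2^2 / (length(g2) - 1))

p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
c(t = t_stat, df = df, p = p_value)
```

Because the variances are not pooled, the degrees of freedom are generally not an integer, which is why the output reports 17.776 rather than 18.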

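One population

The lifetime X (in hours) of a certain component follows a normal distribution N(mu, sigma^2), where mu and sigma^2 are both unknown. The lifetimes of 16 components are listed below. Is there reason to believe that the mean lifetime exceeds 225 hours?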

X<-c(159, 280, 101, 212, 224, 379, 179, 264,
222, 362, 168, 250, 149, 260, 485, 170)
t.test(X, alternative = "greater", mu = 225)

    One Sample t-test

data:  X
t = 0.66852, df = 15, p-value = 0.257
alternative hypothesis: true mean is greater than 225
95 percent confidence interval:
 198.2321      Inf
sample estimates:
mean of x 
    241.5
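The printed result can also be used programmatically, since t.test() returns a list of class "htest". The following sketch extracts the pieces and applies a decision rule; the level alpha = 0.05 is an assumption, as the example does not state one.

```r
X <- c(159, 280, 101, 212, 224, 379, 179, 264,
       222, 362, 168, 250, 149, 260, 485, 170)
res <- t.test(X, alternative = "greater", mu = 225)

# The numbers shown in the printout can be extracted by name.
res$statistic   # t value
res$parameter   # degrees of freedom
res$p.value     # one-sided p-value
res$conf.int    # one-sided 95% confidence interval

alpha <- 0.05   # significance level assumed for this illustration
if (res$p.value < alpha) {
  message("Reject H0: the mean lifetime appears to exceed 225 hours")
} else {
  message("Do not reject H0: no evidence that the mean exceeds 225 hours")
}
```

With p-value = 0.257 the second branch is taken: the sample mean of 241.5 is above 225, but not by enough to rule out sampling error.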

Two populations

X is the yield of the old steelmaking procedure and Y is the yield of the new one. Does the new procedure improve the yield?

X<-c(78.1,72.4,76.2,74.3,77.4,78.4,76.0,75.5,76.7,77.3)
Y<-c(79.1,81.0,77.3,79.1,80.0,79.1,79.1,77.3,80.2,82.1)
t.test(X, Y, var.equal=TRUE, alternative = "less")

    Two Sample t-test

data:  X and Y
t = -4.2957, df = 18, p-value = 0.0002176
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -1.908255
sample estimates:
mean of x mean of y 
    76.23     79.43
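With var.equal = TRUE the two sample variances are pooled. A short sketch of that calculation, using the textbook pooled-variance formula, reproduces the statistic above:

```r
X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
n1 <- length(X)
n2 <- length(Y)

# Pooled variance: average of the sample variances weighted by their df
sp2 <- ((n1 - 1) * var(X) + (n2 - 1) * var(Y)) / (n1 + n2 - 2)

t_stat  <- (mean(X) - mean(Y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df      <- n1 + n2 - 2
p_value <- pt(t_stat, df)   # lower tail, matching alternative = "less"
c(t = t_stat, df = df, p = p_value)
```

Choosing alternative = "less" encodes the research question: the new procedure is better if mean(X) - mean(Y) is negative.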

Paired-data t-test

Run a paired t-test, treating the two measurements from each furnace as a pair.

X<-c(78.1,72.4,76.2,74.3,77.4,78.4,76.0,75.5,76.7,77.3)
Y<-c(79.1,81.0,77.3,79.1,80.0,79.1,79.1,77.3,80.2,82.1)
t.test(X-Y, alternative = "less")

    One Sample t-test

data:  X - Y
t = -4.2018, df = 9, p-value = 0.00115
alternative hypothesis: true mean is less than 0
95 percent confidence interval:
      -Inf -1.803943
sample estimates:
mean of x 
     -3.2
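Testing the differences X - Y against zero is equivalent to passing both vectors with paired = TRUE. The following sketch checks that the two calls agree:

```r
X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)

by_diff   <- t.test(X - Y, alternative = "less")               # as in the text
by_paired <- t.test(X, Y, paired = TRUE, alternative = "less") # equivalent call

# Same statistic, degrees of freedom and p-value either way
c(by_diff$statistic, by_paired$statistic)
c(by_diff$p.value, by_paired$p.value)
```

The paired form makes the design explicit in the code, which helps when the two columns live in a data frame.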

Hypothesis tests for the variance of a normal population

var.test() => F Test to Compare Two Variances

x <- rnorm(50, mean = 0, sd = 2)
y <- rnorm(30, mean = 1, sd = 1)
var.test(x, y)                  # Do x and y have the same variance?
var.test(lm(x ~ 1), lm(y ~ 1))  # The same.

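Twenty boys are sampled from fifth-grade primary school pupils and their heights (in cm) are measured as listed below. At the 0.05 significance level, is the mean equal to 149, and is sigma^2 equal to 75?

X <- c(136, 144, 143, 157, 137, 159, 135, 158, 147, 165,
       158, 142, 159, 150, 156, 152, 140, 149, 148, 155)
# Note: Y is the new-procedure yield vector left over from the t-test examples,
# so this call compares two sample variances; it does not test sigma^2 = 75.
var.test(X,Y)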

    F test to compare two variances

data:  X and Y
F = 34.945, num df = 19, denom df = 9, p-value = 6.721e-06
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
   9.487287 100.643093
sample estimates:
ratio of variances 
          34.94489
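The question posed for the height data concerns a single population, whereas var.test() always compares two samples. As far as I know, base R's stats package has no dedicated one-sample variance test, so the sketch below computes the usual chi-squared statistic (n - 1) * S^2 / sigma0^2 by hand and uses t.test() for the mean. The variable names are mine.

```r
X <- c(136, 144, 143, 157, 137, 159, 135, 158, 147, 165,
       158, 142, 159, 150, 156, 152, 140, 149, 148, 155)
n         <- length(X)
sigma0_sq <- 75   # hypothesised variance from the question

# Mean: one-sample t-test of H0: mu = 149
mean_test <- t.test(X, mu = 149)

# Variance: under H0, (n - 1) * S^2 / sigma0^2 follows a chi-squared
# distribution with n - 1 degrees of freedom
chi_stat <- (n - 1) * var(X) / sigma0_sq
lower    <- pchisq(chi_stat, df = n - 1)
p_var    <- 2 * min(lower, 1 - lower)   # two-sided p-value

c(p_mean = mean_test$p.value, chi_stat = chi_stat, p_var = p_var)
```

With these data both p-values come out well above 0.05, so neither mu = 149 nor sigma^2 = 75 is rejected at that level.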

Analysis of the steelmaking furnace data

X<-c(78.1,72.4,76.2,74.3,77.4,78.4,76.0,75.5,76.7,77.3)
Y<-c(79.1,81.0,77.3,79.1,80.0,79.1,79.1,77.3,80.2,82.1)
var.test(X,Y)

    F test to compare two variances

data:  X and Y
F = 1.4945, num df = 9, denom df = 9, p-value = 0.559
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3712079 6.0167710
sample estimates:
ratio of variances 
          1.494481
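The F statistic here is just the ratio of the two sample variances. A minimal sketch of the computation behind the output above, with names of my own choosing:

```r
X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)

f_stat <- var(X) / var(Y)   # ratio of sample variances
df1    <- length(X) - 1
df2    <- length(Y) - 1

# Two-sided p-value: twice the smaller tail of the F distribution
lower   <- pf(f_stat, df1, df2)
p_value <- 2 * min(lower, 1 - lower)

# 95% confidence interval for the true variance ratio
ci <- c(f_stat / qf(0.975, df1, df2), f_stat / qf(0.025, df1, df2))
c(F = f_stat, p = p_value, lower = ci[1], upper = ci[2])
```

A p-value of 0.559 gives no reason to doubt equal variances, which is consistent with using var.equal = TRUE in the two-sample t-test on the same data.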

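Tests for a binomial population

A batch of vegetable seeds has an average germination rate of P = 0.85. 500 seeds are drawn at random and soaked in a seed-coating agent; 445 of them germinate. Does the seed-coating agent have an effect?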

binom.test(445,500,p=0.85)

    Exact binomial test

data:  445 and 500
number of successes = 445, number of trials = 500, p-value = 0.01207
alternative hypothesis: true probability of success is not equal to 0.85
95 percent confidence interval:
 0.8592342 0.9160509
sample estimates:
probability of success 
                  0.89
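The two-sided exact p-value can be reproduced from the binomial probabilities directly. The sketch below follows my understanding of the rule binom.test() applies: add up the probabilities of every outcome that is no more likely than the observed one.

```r
n  <- 500
x  <- 445
p0 <- 0.85

probs <- dbinom(0:n, n, p0)   # P(X = k) under H0 for k = 0, ..., n
d_obs <- dbinom(x, n, p0)     # probability of the observed count

# Sum over all outcomes at most as likely as the observed one.
# The small relative tolerance guards against floating-point ties.
p_value <- sum(probs[probs <= d_obs * (1 + 1e-7)])
p_value
```

This is why the test is called "exact": no normal approximation is involved, only the binomial distribution under H0.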

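From past experience, the rate of chromosomal abnormality among newborns is generally 1%. A hospital observed 400 local newborns and found one case of chromosomal abnormality. Is the abnormality rate among newborns in this area below the general level?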

binom.test(1,400,p=0.01,alternative="less")

    Exact binomial test

data:  1 and 400
number of successes = 1, number of trials = 400, p-value = 0.09048
alternative hypothesis: true probability of success is less than 0.01
95 percent confidence interval:
 0.0000000 0.0118043
sample estimates:
probability of success 
                0.0025
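For the one-sided alternative "less", the p-value is simply the lower tail P(X <= 1) under H0. A short sketch, with the two-term sum written out as a check:

```r
n  <- 400
x  <- 1
p0 <- 0.01

p_lower <- pbinom(x, n, p0)   # P(X <= 1) when the true rate is 1%

# The same tail written out: P(X = 0) + P(X = 1)
by_hand <- (1 - p0)^n + n * p0 * (1 - p0)^(n - 1)
c(pbinom = p_lower, by_hand = by_hand)
```

Even though the observed rate 0.0025 is a quarter of 1%, seeing at most one case in 400 births happens about 9% of the time under H0, so the difference is not significant at the 0.05 level.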

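Nonparametric tests

Pearson's chi-squared goodness-of-fit test for whether data follow a given distribution: chisq.test()

The numbers of people who prefer each of five beer brands are as follows:
A 210
B 312
C 170
D 85
E 223
Do the numbers of fans differ between brands?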

X<-c(210, 312, 170, 85, 223)
chisq.test(X)

    Chi-squared test for given probabilities

data:  X
X-squared = 136.49, df = 4, p-value < 2.2e-16
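When no probabilities are given, chisq.test() tests against equal proportions. The statistic is easy to recompute by hand, as in this sketch:

```r
X <- c(A = 210, B = 312, C = 170, D = 85, E = 223)

# Under H0 every brand is equally popular, so each expected count is n / k
expected <- rep(sum(X) / length(X), length(X))

chi_stat <- sum((X - expected)^2 / expected)   # Pearson's X-squared
df       <- length(X) - 1
p_value  <- pchisq(chi_stat, df, lower.tail = FALSE)
c(X_squared = chi_stat, df = df, p = p_value)
```

Here the expected count is 200 per brand, and the large deviations for B (312) and D (85) account for most of the statistic.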

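Test whether student scores follow a normal distribution

X <- c(25, 45, 50, 54, 55, 61, 64, 68, 72, 75, 75,
       78, 79, 81, 83, 84, 84, 84, 85, 86, 86, 86,
       87, 89, 89, 89, 90, 91, 91, 92, 100)
A<-table(cut(X, br=c(0,69,79,89,100)))
# cut() divides the range of a variable into intervals
# table() counts the observations that fall in each interval

# Normal probabilities of the four intervals, using the sample mean and sd
p<-pnorm(c(70,80,90,100), mean(X), sd(X))
p<-c(p[1], p[2]-p[1], p[3]-p[2], 1-p[3])
chisq.test(A,p=p)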

    Chi-squared test for given probabilities

data:  A
X-squared = 8.334, df = 3, p-value = 0.03959
# p < 0.05: at the 0.05 level, reject the hypothesis that the scores are normally distributed
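One caveat: chisq.test() uses df = k - 1 = 3 here, although the mean and standard deviation were estimated from the same data. A common textbook adjustment subtracts one further degree of freedom per estimated parameter. The sketch below applies that adjustment; it is an assumption of mine, not something the original example does, and it is only approximate when the parameters are estimated from the raw rather than the grouped data.

```r
X <- c(25, 45, 50, 54, 55, 61, 64, 68, 72, 75, 75,
       78, 79, 81, 83, 84, 84, 84, 85, 86, 86, 86,
       87, 89, 89, 89, 90, 91, 91, 92, 100)
A <- table(cut(X, breaks = c(0, 69, 79, 89, 100)))

# Normal probabilities of the four intervals (same values as in the example)
cum <- pnorm(c(70, 80, 90), mean(X), sd(X))
p   <- c(cum[1], diff(cum), 1 - cum[3])

res <- chisq.test(A, p = p)

# Two estimated parameters (mean, sd): df = k - 1 - 2 (assumed adjustment)
df_adj <- length(A) - 1 - 2
p_adj  <- pchisq(unname(res$statistic), df = df_adj, lower.tail = FALSE)
c(unadjusted = res$p.value, adjusted = p_adj)
```

The adjusted p-value is smaller than the unadjusted one, so the conclusion at the 0.05 level does not change for these data.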

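In the hybrid offspring of barley, the awn trait is expected in the ratio awnless : long-awned : short-awned = 9 : 3 : 4, while the observed counts are 335 : 125 : 160. Test whether the observations agree with the theoretical hypothesis.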

chisq.test(c(335, 125, 160), p=c(9,3,4)/16)

    Chi-squared test for given probabilities

data:  c(335, 125, 160)
X-squared = 1.362, df = 2, p-value = 0.5061
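The object returned by chisq.test() also holds the expected counts and the Pearson residuals, which show how each category contributes to the statistic. A brief sketch; the category names are labels I added for readability.

```r
obs <- c(awnless = 335, long = 125, short = 160)
res <- chisq.test(obs, p = c(9, 3, 4) / 16)

res$expected           # expected counts n * p under the 9:3:4 ratio
res$residuals          # Pearson residuals (observed - expected) / sqrt(expected)
sum(res$residuals^2)   # the squared residuals add up to X-squared
```

With 620 plants the expected counts are 348.75, 116.25 and 155, so the observed counts are close to the theoretical ratio and p = 0.5061 gives no reason to reject it.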

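There are 42 observations, each giving the number of calls received by a telephone switchboard in a given time period.
Calls received   0   1   2   3   4   5   6
Frequency        7  10  12   8   3   2   0
Question: does the number of calls received in a time period follow a Poisson distribution?

x <- 0:6
y <- c(7, 10, 12, 8, 3, 2, 0)
lambda <- mean(rep(x, y))        # Poisson mean estimated from the sample
q <- ppois(x, lambda)            # cumulative probabilities P(X <= k)
n <- length(y)
p <- numeric(n)
p[1] <- q[1]                     # P(X = 0)
p[n] <- 1 - q[n - 1]             # last cell: P(X >= 6)
for (i in 2:(n - 1))
  p[i] <- q[i] - q[i - 1]        # P(X = i - 1)
chisq.test(y, p = p)             # test against the Poisson probabilities

    Chi-squared test for given probabilities

data:  y
X-squared = 1.5057, df = 6, p-value = 0.9591

# R also warns that the chi-squared approximation may be incorrect,
# because some expected counts are below 5. Merge the cells with 4 or more calls:
Z <- c(7, 10, 12, 8, 5)
n <- length(Z); p <- p[1:(n - 1)]; p[n] <- 1 - q[n - 1]
chisq.test(Z, p = p)

    Chi-squared test for given probabilities

data:  Z
X-squared = 0.53889, df = 4, p-value = 0.9696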

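The smaller the P value, the stronger the grounds for rejecting the null hypothesis, and the stronger the statistical evidence that the populations differ. Note that failing to reject H0 does not mean that H0 is supported; it only means that the available sample does not carry enough information to reject H0.
By convention, P > 0.05 is called "not significant", 0.01 < P ≤ 0.05 "significant", and P ≤ 0.01 "highly significant".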
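These conventional thresholds are easy to encode. The helper below, label_p(), is a hypothetical function written for this illustration; it is not part of R.

```r
# Map p-values to the conventional labels described above.
label_p <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  cut(p,
      breaks = c(-Inf, 0.01, 0.05, Inf),
      labels = c("highly significant", "significant", "not significant"),
      right  = TRUE)   # right-closed intervals: P <= 0.01, 0.01 < P <= 0.05
}

# p-values taken from the examples in this article
label_p(c(0.07939, 0.03959, 0.0002176))
```

Using cut() with right-closed intervals keeps the boundary cases consistent with the convention: 0.05 itself counts as "significant" and 0.01 as "highly significant".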

Note: this article is based on Zhang Jinlong's blog on ScienceNet.