【譯文】利用STAN作貝葉斯迴歸分析：Part 2 非正態迴歸

時間 2019-11-11

標籤譯文利用 stan 貝葉迴歸分析简体版

原文原文鏈接

【譯文】利用STAN作貝葉斯迴歸分析：Part 2 非正態迴歸

做者　Lionel Hertzogn

前一篇文章已經介紹了怎樣在R中調用STAN對正態數據進行貝葉斯迴歸。本文則將利用三個樣例來演示怎樣在R中利用STAN擬合非正態模型。
三個樣例各自是negative binomial迴歸（過離散的泊松數據）。gamma迴歸（右偏的連續數據）和beta-binomial迴歸（過離散的二項數據）。css

負二項迴歸

泊松分佈常用於計數數據建模，它若是了數據的方差等於均值。當方差比均值大的時候，咱們稱數據存在過離散並改用負二項分佈來建模。
現在若是咱們手頭有服從負二項分佈的響應變量y以及k個解釋變量X。改用方程形式表示即：git

y i \sim N B (μ i, ϕ)

E (y i) = μ i

V a r (y i) = μ i + μ 2 i / ϕ

l o g (μ i) = β 0 + β 1 * X 1 i + \dots + β k * X k i

負二項分佈共同擁有兩個參數： μ ，是必須爲正數的指望，所以咱們可以利用一個對數連接函數把線性模型（即解釋變量乘上回歸係數）映射到均值 μ （見第四個方程）。
ϕ 是過離散度參數。值越小意味着離散度越大，離泊松分佈越遠，當 ϕ 變得愈來愈大，負二項分佈看起來會愈來愈像泊松分佈。web

讓咱們仿真一些數據並用STAN來建模

# 載入包
library(arm) # 利用裏面的invlogit函數
library(emdbook) # 利用裏面的rbetabinom function函數
library(rstanarm) # 利用launch_shinystan函數
# 生成服從負二項分佈的仿真數據
# 解釋變量
N <- 100 # 樣本量
dat <- data.frame(x1 = runif(N, -2, 2), x2 = runif(N, -2, 2))
# 模型
X <- model.matrix( ~ x1*x2, dat)
K <- dim(X)[2] # 迴歸係數維度
# 迴歸的斜率
betas <- runif(K, -1, 1)
# 設定仿真數據的過離散度
ph i<- 5
# 生成響應變量
y_nb < -rnbinom(100, size = phi, mu = exp(X%*%betas))

# 擬合模型
m_nb<-stan(file = "neg_bin.stan", data = list(N=N, K=K, X=X, y=y_nb), pars = c("beta","phi","y_rep"))

# 利用shinystan診斷模型
launch_shinystan(m_nb)

Shinystan界面

上述代碼的最後一行命令會在你的瀏覽器中打開一個窗體，上面有一些選項供預計、診斷和探索模型。
一些選項超出了我有限的知識範圍（如對數後驗vs樣本階梯大小），因此我一般關注迴歸係數的後驗分佈（Diagnose -> NUTS (plots) -> By model parameter），出現的柱形圖應該比較接近正態。我也會關注後驗預測檢驗（Diagnose -> PPcheck -> Distribution of observed data vs replications）。y_rep的分佈應當與觀測數據一樣。瀏覽器

模型看起來不錯，現在咱們可以利用模型的迴歸係數畫出預測迴歸線以及對應的置信區間。

# 計算後驗預測值和對應的可信區間 
post <- as.array(m_nb) 
# 咱們來看看x2在x3的三個取值上的變化性
new_X <- model.matrix( ~ x1*x2, expand.grid(x2=seq(-2, 2, length=10), x1=c(min(dat$x1), mean(dat$x1), max(dat$x1))))
# 計算每個樣本的預測值
pred <- apply(post[, , 1:4], c(1, 2), FUN = function(x) new_X%*%x)
# 每條鏈會存在不一樣的矩陣，並將信息重組存於一個矩陣內
dim(pred) <- c(30, 4000)
# 提取預測中位數和95%可信區間
pred_int <- apply(pred, 1, quantile, probs=c(0.025, 0.5, 0.975))

# 繪圖
plot(dat$x2, y_nb, pch=16)
lines(new_X[1:10,3], exp(pred_int[2,1:10]), col="orange", lwd=5)
lines(new_X[1:10,3], exp(pred_int[1,1:10]), col="orange", lwd=3, lty=2)
lines(new_X[1:10,3], exp(pred_int[3,1:10]), col="orange", lwd=3, lty=2)
lines(new_X[1:10,3], exp(pred_int[2,11:20]), col="red", lwd=5)
lines(new_X[1:10,3], exp(pred_int[1,11:20]), col="red", lwd=3, lty=2)
lines(new_X[1:10,3], exp(pred_int[3,11:20]), col="red", lwd=3, lty=2)
lines(new_X[1:10,3], exp(pred_int[2,21:30]), col="blue", lwd=5)
lines(new_X[1:10,3], exp(pred_int[1,21:30]), col="blue", lwd=3, lty=2)
lines(new_X[1:10,3], exp(pred_int[3,21:30]), col="blue", lwd=3, lty=2)
legend("topleft", legend=c("Min", "Mean", "Max"), ncol=3, col = c("orange", "red", "blue"), lwd = 3, bty="n", title="Value of x1")

輸出結果例如如下：

一如既往，咱們應當關注這類模型連接空間和響應空間的差別。
上述模型在連接空間裏作出了預測，若是咱們想把結果繪製於響應空間，咱們要利用連接函數的反函數（本例中是指數函數）來將預測值變換回去。markdown

Gamma分佈

有時咱們收集到的數據呈現出右偏的特色，比方體型、植物生物量等數據。
這一類數據可以利用對數正態分佈（將響應變量作對數變換）或者gamma分佈建模。此處我將利用gamma分佈擬合數據，若是咱們有服從gamma分佈的對應變量y和一些解釋變量X，方程形式例如如下：app

y i \sim G a m m a (α, β)

E (y i) = α / β

V a r (y i) = α / β 2

Mmmmmmm（譯者注：語氣詞，做者賣萌）與負二項分佈不一樣。咱們不能直接利用連接函數把線性模型映射到gamma分佈的某一個參數上（比方 μ ）。因此咱們需要利用一些微積分的手段來改變模型的參數形式：

E (y i) = α / β = μ

V a r (y i) = α / β 2 = ϕ

整理可得：

α = μ 2 / ϕ

β = μ / ϕ

上式中 μ 是分佈的指望, ϕ 是離散度參數,形式上和先前的負二項分佈一致.因爲 α 和 β 必須是正數,咱們可以再次在線性模型上套一個對數連接函數:

l o g (μ i) = β 0 + β 1 * X 1 i + \dots + β k * X k i

讓咱們生成一些數據來擬合這個模型：

# 生成服從gamma分佈的數據
mus <- exp(X%*%betas)
y_g <- rgamma(100, shape = mus**2/phi, rate = mus/phi)

# 模型
m_g <- stan(file = "gamma.stan", data = list(N=N, K=K, X=X, y=y_g), pars = c("betas","phi","y_rep"))

# 模型檢驗
launch_shinystan(m_g)

咱們再一次確認了模型是正確的,頁面上的所有結果看上去都很棒.因爲咱們利用了和前一個樣例同樣的連接函數，咱們僅僅需要把上述代的m_nb改成m_g就可以將結果可視化。

Beta Binomial分佈

最後，當咱們從一個肯定次數的試驗中收集到了一些數據（比方一次擲10次硬幣。每次正面朝上的比例），響應變量通常會服從二項分佈。現在咱們可以獲得大量的這類數據，並且就像泊松數據可能會過離散。二項分佈數據也會遇到類似狀況。此時，咱們就需要使用Beta-Binomial模型：

y i \sim B e t a B i n o m i a l (N, α, β)

E (y i) = N * α / (α + β)

把方差表達式先放在一邊(點擊這裏看看這公式多醜)，常數N(試驗次數)也可以不管。和gamma分佈的樣例同樣，咱們可以將方程整理爲別的形式，新方程的指望爲 μ 。離散參數爲 ϕ :

α = μ * ϕ

β = (1 - μ) * ϕ

此次 μ 表明機率於是必須落在0。1之間。咱們可以利用logit連接函數將線性買模型映射到 μ

l o g i t (μ) = β 0 + β 1 * X 1 i + \dots + β k * X k i

再來見證仿真的威力吧：

# 生成Beta-Binomial數據
W <- rep(20, 100) # 試驗次數
y_bb <- rbetabinom(100, prob = invlogit(X%*%betas), size = W, theta = phi)

# 模型
m_bb <- stan(file = "beta_bin.stan", data = list(N=N, W=W, K=K, X=X, y=y_bb), pars=c("betas", "phi", "y_rep"))

# 模型檢驗
launch_shinystan(m_bb)

一切看起來如此美妙。讓咱們把結果畫出來：

# 計算後驗預測值和置信區間 
post<-as.array(m_bb) # 後驗抽樣
# 後驗預測值
pred<-apply(post[,,1:4],c(1,2),FUN = function(x) new_X%*%x)
# 每條鏈都存在重組信息的不一樣矩陣裏
dim(pred)<-c(30,4000)
# 獲得後驗越策值的中位數和95%置信區間
pred_int<-apply(pred,1,quantile,probs=c(0.025,0.5,0.975))

# 繪圖
plot(dat$x2, y_bb, pch = 16)
lines(new_X[1:10,3], 20*invlogit(pred_int[2,1:10]), col="orange", lwd=5)
lines(new_X[1:10,3], 20*invlogit(pred_int[1,1:10]), col="orange", lwd=3, lty=2)
lines(new_X[1:10,3], 20*invlogit(pred_int[3,1:10]), col="orange", lwd=3, lty=2)
lines(new_X[1:10,3], 20*invlogit(pred_int[2,11:20]), col="red", lwd=5)
lines(new_X[1:10,3], 20*invlogit(pred_int[1,11:20]), col="red", lwd=3, lty=2)
lines(new_X[1:10,3], 20*invlogit(pred_int[3,11:20]), col="red", lwd=3, lty=2)
lines(new_X[1:10,3], 20*invlogit(pred_int[2,21:30]), col="blue", lwd=5)
lines(new_X[1:10,3], 20*invlogit(pred_int[1,21:30]), col="blue", lwd=3, lty=2)
lines(new_X[1:10,3], 20*invlogit(pred_int[3,21:30]), col="blue", lwd=3, lty=2)
legend("topleft", legend = c("Min", "Mean", "Max"), ncol=3, col = c("orange", "red", "blue"), lwd = 3, bty="n", title="Value of x1")

輸出例如如下：

注意exp函數要改成invlogit函數，因爲這裏用的連接函數不一樣。

零星想法

經過本文咱們學習了怎樣擬合非正態模型。
STAN是一個很靈活的工具，可以擬合許不一樣分佈和多類參數形式（參看參考手冊），其可能性僅僅受到你的若是的限制（也許還有你的數學水平。。。）。在這裏我聲明一下，rsranarm包可以讓你在不寫出模型形式的狀況下擬合STAN模型。僅僅需要寫一些諸如glm函數的典型的R代碼（請參考此文）。那麼，咱們還需要學習STAN嗎？這取決於你的目的，若是你的模型比較常用。那麼利用rstanarm包可以節約大量時間，讓你把精力集中於科研。並且rstanarm包內已有的函數執行速度很快。但若是有一天你認爲你會在模型上有所突破，需要擬合一個個性化的模型，那麼STAN會是一種很靈活和高效的建模語言。函數

模型代碼

/* *簡單負二項迴歸樣例 *使用負二項迴歸的另一種參數形式 *細節參看Stan參考手冊 section 40.1-3 */

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  int y[N]; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] beta; //the regression parameters
  real phi; //the overdispersion parameters
}
transformed parameters {
  vector[N] mu;//the linear predictor
  mu <- exp(X*beta); //using the log link 
}
model {  
  beta[1] ~ cauchy(0,10); //prior for the intercept following Gelman 2008

  for(i in 2:K)
   beta[i] ~ cauchy(0,2.5);//prior for the slopes following Gelman 2008

  y ~ neg_binomial_2(mu,phi);
}
generated quantities {
 vector[N] y_rep;
 for(n in 1:N){
  y_rep[n] <- neg_binomial_2_rng(mu[n],phi); //posterior draws to get posterior predictive checks
 }
}

Gamma

/* *簡單gamma分佈樣例 *注意我使用了對數連接函數，大多數時候它比典型反向連接更有意義 */

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  real y[N]; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real phi; //the variance parameter
}
transformed parameters {
  vector[N] mu; //the expected values (linear predictor)
  vector[N] alpha; //shape parameter for the gamma distribution
  vector[N] beta; //rate parameter for the gamma distribution

  mu <- exp(X*betas); //using the log link 
  alpha <- mu .* mu / phi; 
  beta <- mu / phi;
}
model {  
  betas[1] ~ cauchy(0,10); //prior for the intercept following Gelman 2008

  for(i in 2:K)
   betas[i] ~ cauchy(0,2.5);//prior for the slopes following Gelman 2008

  y ~ gamma(alpha,beta);
}
generated quantities {
 vector[N] y_rep;
 for(n in 1:N){
  y_rep[n] <- gamma_rng(alpha[n],beta[n]); //posterior draws to get posterior predictive checks
 }
}

Beta-Binomial

/* *簡單beta-binomial樣例 */

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  int y[N]; //the response
  matrix[N,K] X; //the model matrix
  int W[N]; //the number of trials per observations, ie a vector of 1 for a 0/1 dataset
}
parameters {
  vector[K] betas; //the regression parameters
  real phi; //the overdispersion parameter
}
transformed parameters {
  vector[N] mu; //the linear predictor
  vector[N] alpha; //the first shape parameter for the beta distribution
  vector[N] beta; //the second shape parameter for the beta distribution

  for(n in 1:N)
   mu[n] <- inv_logit(X[n,]*betas); //using logit link
  alpha <- mu * phi;
  beta <- (1-mu) * phi;
}
model {  
  betas[1] ~ cauchy(0,10); //prior for the intercept following Gelman 2008

  for(i in 2:K)
   betas[i] ~ cauchy(0,2.5);//prior for the slopes following Gelman 2008

  y ~ beta_binomial(W,alpha,beta);
}
generated quantities {
 vector[N] y_rep;
 for(n in 1:N){
  y_rep[n] <- beta_binomial_rng(W[n],alpha[n],beta[n]); //posterior draws to get posterior predictive checks
 }
}