用R語言的quantreg包進行分位數迴歸

時間 2019-12-09

標籤語言 quantreg 包進位數迴歸简体版

原文原文鏈接

什麼是分位數迴歸

分位數迴歸(Quantile Regression)是計量經濟學的研究前沿方向之一，它利用解釋變量的多個分位數（例如四分位、十分位、百分位等）來獲得被解釋變量的條件分佈的相應的分位數方程。html

與傳統的OLS只獲得均值方程相比，分位數迴歸能夠更詳細地描述變量的統計分佈。它是給定迴歸變量X，估計響應變量Y條件分位數的一個基本方法；它不只能夠度量回歸變量在分佈中心的影響,並且還能夠度量在分佈上尾和下尾的影響，所以較之經典的最小二乘迴歸具備獨特的優點。衆所周知，經典的最小二乘迴歸是針對因變量的均值（指望）的：模型反映了因變量的均值怎樣受自變量的影響，
web

\(y=\beta X+\epsilon\)， \(E(y)=\beta X\)
分位數迴歸的核心思想就是從均值推廣到分位數。最小二乘迴歸的目標是最小化偏差平方和，分位數迴歸也是最小化一個新的目標函數：
\(\min_{\xi \in \mathcal{R}} \sum \rho_{\tau}(y_i-\xi)\)

quantreg包

quantreg即：Quantile Regression，擁有條件分位數模型的估計和推斷方法，包括線性、非線性和非參模型；處理單變量響應的條件分位數方法；處理刪失數據的幾種方法。此外，還包括基於預期風險損失的投資組合選擇方法。dom

實例

library(quantreg)  # 載入quantreg包
data(engel)        # 加載quantreg包自帶的數據集

分位數迴歸(tau = 0.5)
fit1 = rq(foodexp ~ income, tau = 0.5, data = engel)         
r1 = resid(fit1)   # 獲得殘差序列，並賦值爲變量r1
c1 = coef(fit1)    # 獲得模型的係數，並賦值給變量c1

summary(fit1)      # 顯示分位數迴歸的模型和係數
`
Call: rq(formula = foodexp ~ income, tau = 0.5, data = engel)

tau: [1] 0.5

Coefficients:
            coefficients lower bd  upper bd 
(Intercept)  81.48225     53.25915 114.01156
income        0.56018      0.48702   0.60199
`

summary(fit1, se = "boot") # 經過設置參數se，能夠獲得係數的假設檢驗
`
Call: rq(formula = foodexp ~ income, tau = 0.5, data = engel)

tau: [1] 0.5

Coefficients:
            Value    Std. Error t value  Pr(>|t|)
(Intercept) 81.48225 27.57092    2.95537  0.00344
income       0.56018  0.03507   15.97392  0.00000
`

分位數迴歸(tau = 0.五、0.75)
fit1 = rq(foodexp ~ income, tau = 0.5, data = engel) 
fit2 = rq(foodexp ~ income, tau = 0.75, data = engel)

模型比較
anova(fit1,fit2)    #方差分析
`   
Quantile Regression Analysis of Deviance Table

Model: foodexp ~ income
Joint Test of Equality of Slopes: tau in {  0.5 0.75  }

  Df Resid Df F value    Pr(>F)    
1  1      469  12.093 0.0005532 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
`   
畫圖比較分析
plot(engel$foodexp , engel$income,pch=20, col = "#2E8B57",
     main = "家庭收入與食品支出的分位數迴歸",xlab="食品支出",ylab="家庭收入")
lines(fitted(fit1), engel$income,lwd=2, col = "#EEEE00")
lines(fitted(fit2), engel$income,lwd=2, col = "#EE6363")
legend("topright", c("tau=.5","tau=.75"), lty=c(1,1),
       col=c("#EEEE00","#EE6363"))

不一樣分位點的迴歸比較
fit = rq(foodexp ~ income,  tau = c(0.05,0.25,0.5,0.75,0.95), data = engel)
plot( summary(fit))

多元分位數迴歸

data(barro)

fit1 <- rq(y.net ~ lgdp2 + fse2 + gedy2 + Iy2 + gcony2, data = barro,tau=.25)
fit2 <- rq(y.net ~ lgdp2 + fse2 + gedy2 + Iy2 + gcony2, data = barro,tau=.50)
fit3 <- rq(y.net ~ lgdp2 + fse2 + gedy2 + Iy2 + gcony2, data = barro,tau=.75)
# 替代方式 fit <- rq(y.net ~ lgdp2 + fse2 + gedy2 + Iy2 + gcony2, method = "fn", tau = 1:4/5, data = barro)

anova(fit1,fit2,fit3)             # 不一樣分位點模型比較-方差分析
anova(fit1,fit2,fit3,joint=FALSE)
`
Quantile Regression Analysis of Deviance Table

Model: y.net ~ lgdp2 + fse2 + gedy2 + Iy2 + gcony2
Tests of Equality of Distinct Slopes: tau in {  0.25 0.5 0.75  }

       Df Resid Df F value  Pr(>F)  
lgdp2   2      481  1.0656 0.34535  
fse2    2      481  2.6398 0.07241 .
gedy2   2      481  0.7862 0.45614  
Iy2     2      481  0.0447 0.95632  
gcony2  2      481  0.0653 0.93675  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In summary.rq(x, se = se, covariance = TRUE) : 1 non-positive fis
`
不一樣分位點擬合曲線的比較
plot(barro$y.net,pch=20, col = "#2E8B57",
     main = "不一樣分位點擬合曲線的比較")
lines(fitted(fit1),lwd=2, col = "#FF00FF")
lines(fitted(fit2),lwd=2, col = "#EEEE00")
lines(fitted(fit3),lwd=2, col = "#EE6363")
legend("topright", c("tau=.25","tau=.50","tau=.75"), lty=c(1,1),
       col=c( "#FF00FF","#EEEE00","#EE6363"))

時間序列數據之動態線性分位數迴歸

library(zoo)
data("AirPassengers", package = "datasets")
ap <- log(AirPassengers)
fm <- dynrq(ap ~ trend(ap) + season(ap), tau = 1:4/5)
`
Dynamic quantile regression "matrix" data:
Start = 1949(1), End = 1960(12)
Call:
dynrq(formula = ap ~ trend(ap) + season(ap), tau = 1:4/5)

Coefficients:
                  tau= 0.2    tau= 0.4     tau= 0.6     tau= 0.8
(Intercept)    4.680165533  4.72442529  4.756389747  4.763636251
trend(ap)      0.122068032  0.11807467  0.120418846  0.122603451
season(ap)Feb -0.074408403 -0.02589716 -0.006661952 -0.013385535
season(ap)Mar  0.082349382  0.11526821  0.114939193  0.106390507
season(ap)Apr  0.062351869  0.07079315  0.063283042  0.066870808
season(ap)May  0.064763333  0.08453454  0.069344618  0.087566554
season(ap)Jun  0.195099116  0.19998275  0.194786890  0.192013960
season(ap)Jul  0.297796876  0.31034824  0.281698714  0.326054871
season(ap)Aug  0.287624540  0.30491687  0.290142727  0.275755490
season(ap)Sep  0.140938329  0.14399906  0.134373833  0.151793646
season(ap)Oct  0.002821207  0.01175582  0.013443965  0.002691383
season(ap)Nov -0.154101220 -0.12176290 -0.124004759 -0.136538575
season(ap)Dec -0.031548941 -0.01893221 -0.023048200 -0.019458814

Degrees of freedom: 144 total; 131 residual
`
sfm <- summary(fm)
plot(sfm)

不一樣分位點擬合曲線的比較
fm1 <- dynrq(ap ~ trend(ap) + season(ap), tau = .25)
fm2 <- dynrq(ap ~ trend(ap) + season(ap), tau = .50)
fm3 <- dynrq(ap ~ trend(ap) + season(ap), tau = .75)

plot(ap,cex = .5,lwd=2, col = "#EE2C2C",main = "時間序列分位數迴歸")
lines(fitted(fm1),lwd=2, col = "#1874CD")
lines(fitted(fm2),lwd=2, col = "#00CD00")
lines(fitted(fm3),lwd=2, col = "#CD00CD")
legend("topright", c("原始擬合","tau=.25","tau=.50","tau=.75"), lty=c(1,1),
       col=c( "#EE2C2C","#1874CD","#00CD00","#CD00CD"),cex = 0.65)

除了本文介紹的以上內容，quantreg包還包含殘差形態的檢驗、非線性分位數迴歸和半參數和非參數分位數迴歸等等，詳細參見：用R語言進行分位數迴歸-詹鵬（北京師範大學經濟管理學院）和Package ‘quantreg’。函數