1、獲取數據dom
> library(quantmod) > codedata<-list() > codepool<-c("MMM","AXP","AAPL","BA","CAT","CVX","CSCO","KO","DD","XOM","GE","GS","HD","IBM","INTC","JNJ","JPM","MCD","MRK","MSFT","NKE","PFE","PG","TRV","UTX","UNH","VZ","WMT","V","DIS") > number<-length(codepool) > for(i in 1:number){ + n<-codepool[i] + TESTdata<-get(getSymbols(n,src="yahoo",from=Sys.Date()-50,to=Sys.Date())) + codedata[[i]]<-as.data.frame(TESTdata[,4]) + next} > finaldata<-do.call(cbind,codedata) > names(finaldata)<-codepool > head(finaldata,3) MMM AXP AAPL BA CAT CVX CSCO KO DD XOM GE GS HD IBM INTC JNJ JPM MCD MRK MSFT NKE PFE PG TRV 2017-01-23 178.51 75.97 120.08 157.84 94.46 115.39 30.27 41.43 72.78 84.97 29.75 232.67 138.07 171.03 36.77 113.91 83.71 121.38 61.81 62.96 53.24 31.46 86.96 118.04 2017-01-24 175.97 77.43 119.97 160.55 96.24 116.37 30.60 41.90 76.05 85.09 30.00 233.68 138.06 175.90 37.62 111.76 84.72 121.05 61.21 63.52 53.45 31.15 87.86 116.84 2017-01-25 176.73 76.89 121.88 167.36 98.15 117.24 30.70 42.12 76.67 85.34 30.37 237.25 137.48 178.29 37.80 112.80 86.03 121.79 61.08 63.68 53.86 31.29 87.16 117.67 UTX UNH VZ WMT V DIS 2017-01-23 110.34 159.07 52.41 66.65 82.15 107.12 2017-01-24 111.61 160.43 50.12 67.40 83.23 107.90 2017-01-25 110.96 161.24 49.77 66.89 83.90 108.04 > tail(finaldata,3) MMM AXP AAPL BA CAT CVX CSCO KO DD XOM GE GS HD IBM INTC JNJ JPM MCD MRK MSFT NKE PFE PG TRV 2017-03-08 189.51 79.04 139.00 181.74 93.23 109.61 34.02 41.99 79.77 81.03 29.80 250.24 146.92 179.45 35.62 124.10 91.21 128.09 65.80 64.99 56.51 33.91 90.14 121.24 2017-03-09 189.90 79.30 138.68 180.57 91.39 110.04 34.07 42.03 80.48 81.67 29.66 250.18 146.62 177.18 35.82 125.95 91.57 128.14 65.89 64.73 56.36 34.05 90.34 121.90 2017-03-10 191.21 79.38 139.14 178.70 92.31 110.61 34.26 42.29 80.86 81.61 30.28 248.38 146.85 177.83 35.91 126.21 91.28 127.98 65.60 64.93 56.43 34.11 91.07 122.83 UTX UNH VZ WMT V DIS 2017-03-08 111.75 167.91 49.16 69.80 88.96 110.84 2017-03-09 111.93 168.01 49.28 69.86 89.11 111.03 2017-03-10 112.14 169.98 49.35 70.10 89.73 110.92
道瓊斯指數共有30只成分股,取各個成分股最近50天的收盤價。code
2、主成分分析component
> p<-princomp(finaldata[,1:30]) > summary(p) Importance of components: Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Standard deviation 16.1041328 4.22688825 3.50222384 2.20689405 2.01370950 1.459120153 1.298793653 0.991546988 0.858445383 0.752024098 0.679270196 0.579182056 Proportion of Variance 0.8458697 0.05827339 0.04000521 0.01588517 0.01322581 0.006944019 0.005501855 0.003206678 0.002403555 0.001844558 0.001504922 0.001094105 Cumulative Proportion 0.8458697 0.90414310 0.94414831 0.96003348 0.97325929 0.980203314 0.985705168 0.988911847 0.991315402 0.993159960 0.994664882 0.995758987
主成分分析法生成了30個新變量,前面3個變量能夠解釋到原數據集94.41%(0.8458697+0.05827339+0.04000521)的方差,所以咱們能夠用這3個新的變量代替原來30個初始變量。orm
> stockdata<-as.data.frame(as.matrix(finaldata[,1:30])%*%as.matrix(p$loadings[,1:3])) > head(stockdata) Comp.1 Comp.2 Comp.3 2017-01-23 -497.0058 -3.717498 174.8456 2017-01-24 -498.6973 -5.271716 178.5616 2017-01-25 -505.3037 -4.955875 181.6700 2017-01-26 -507.5417 -4.710687 182.2582 2017-01-27 -506.5475 -3.575603 180.2852 2017-01-30 -502.8693 -3.291562 176.4937
3、迴歸分析ip
一、獲取道瓊斯指數最近50天數據,合併數據框ci
> res<-get(getSymbols("DJIA",src="yahoo",from=Sys.Date()-50,to=Sys.Date())) > US30<-as.data.frame(res[,4]) > mydata<-cbind(stockdata,US30) > names(mydata)<-c("X1","X2","X3","US30") > head(mydata) X1 X2 X3 US30 2017-01-23 -497.0058 -3.717498 174.8456 19799.85 2017-01-24 -498.6973 -5.271716 178.5616 19912.71 2017-01-25 -505.3037 -4.955875 181.6700 20068.51 2017-01-26 -507.5417 -4.710687 182.2582 20100.91 2017-01-27 -506.5475 -3.575603 180.2852 20093.78 2017-01-30 -502.8693 -3.291562 176.4937 19971.13
二、構造多元線性迴歸模型get
> model<-lm(US30~X1+X2+X3,mydata) > summary(model) Call: lm(formula = US30 ~ X1 + X2 + X3, data = mydata) Residuals: Min 1Q Median 3Q Max -50.267 -7.866 -1.937 10.018 39.555 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5395.8202 218.4604 24.699 < 2e-16 *** X1 -25.7840 0.2282 -113.006 < 2e-16 *** X2 0.4522 0.8693 0.520 0.607 X3 9.1254 1.0492 8.698 1.06e-09 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 21.43 on 30 degrees of freedom Multiple R-squared: 0.9977, Adjusted R-squared: 0.9974 F-statistic: 4282 on 3 and 30 DF, p-value: < 2.2e-16 > model Call: lm(formula = US30 ~ X1 + X2 + X3, data = mydata) Coefficients: (Intercept) X1 X2 X3 5395.8202 -25.7840 0.4522 9.1254
檢驗顯示X2與US30不顯著,剔除X2。所以線性迴歸方程爲:io
> US30 = 5395.8202 + -25.7840*X1 + 9.1254*X3