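With a logistic regression model, we can try to predict which way the market will move.
The example code reaches about 94% accuracy in-sample and about 85% out-of-sample.
Everything else is explained in the code comments.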
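As background for the script, a logistic regression produces a linear score (the log-odds) and turns it into a probability with the logistic function. The sketch below uses made-up data and hypothetical variable names (`x`, `y`, `fit`); it only illustrates that the hand-written `1 / (1 + exp(-score))` used in the script gives the same numbers as `predict(..., type = "response")` and `plogis()`.

```r
# Minimal sketch on synthetic data; x, y and fit are illustrative names only.
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(1.5 * x))

fit <- glm(y ~ x, family = binomial)

score <- predict(fit)                            # default: link scale (log-odds)
prob_manual <- 1 / (1 + exp(-score))             # logistic function written out
prob_builtin <- predict(fit, type = "response")  # same transformation, built in

# Both routes should agree up to floating-point error
print(all.equal(prob_manual, prob_builtin))
print(all.equal(prob_manual, plogis(score)))
```

Either form works; `type = "response"` simply saves one line and one chance of a sign error.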
cat("\014")  # clear the console

# Load the sample index data
library(quantmod)
getSymbols("^DJI", src = "yahoo")
dji <- DJI[, "DJI.Close"]

# Build the technical indicators
avg10 <- rollapply(dji, 10, mean)
avg20 <- rollapply(dji, 20, mean)
std10 <- rollapply(dji, 10, sd)
std20 <- rollapply(dji, 20, sd)
rsi5 <- RSI(dji, 5, "SMA")
rsi14 <- RSI(dji, 14, "SMA")
macd12269 <- MACD(dji, 12, 26, 9, "SMA")
macd7205 <- MACD(dji, 7, 20, 5, "SMA")
bbands <- BBands(dji, 20, "SMA", 2)

# Build the market direction label: compare each close with the close
# 20 trading days earlier (Lag(dji, 20) looks back, not forward)
direction <- NULL
direction[dji > Lag(dji, 20)] <- 1
direction[dji < Lag(dji, 20)] <- 0

# Combine everything into one object
dji <- cbind(dji, avg10, avg20, std10, std20, rsi5, rsi14, macd12269, macd7205, bbands, direction)
dm <- dim(dji)
dm
colnames(dji)[dm[2]] <- "Direction"
colnames(dji)[dm[2]]

# In-sample (is) and out-of-sample (os) date ranges
issd <- "2010-01-01"
ised <- "2014-12-31"
ossd <- "2015-01-01"
osed <- "2015-12-31"
isrow <- which(index(dji) >= issd & index(dji) <= ised)
osrow <- which(index(dji) >= ossd & index(dji) <= osed)
isdji <- dji[isrow,]
osdji <- dji[osrow,]

# Standardize the data with the in-sample mean and standard deviation
isme <- apply(isdji, 2, mean, na.rm = TRUE)
isstd <- apply(isdji, 2, sd, na.rm = TRUE)
isidn <- matrix(1, dim(isdji)[1], dim(isdji)[2])
norm_isdji <- (isdji - t(isme * t(isidn))) / t(isstd * t(isidn))
dm <- dim(isdji)
norm_isdji[, dm[2]] <- direction[isrow]  # keep the label as raw 0/1

# Fit the model
formula <- as.formula("Direction ~ .")
model <- glm(formula, family = "binomial", data = norm_isdji)
summary(model)
pred <- predict(model, norm_isdji)
prob <- 1 / (1 + exp(-pred))

# Fitted scores and probabilities
# par(mfrow = c(2, 1))  # still fails with: Error in plot.new() : figure margins too large
plot(pred, type = "l")
plot(prob, type = "l")
pred_direction <- NULL
pred_direction[prob > 0.5] <- 1
pred_direction[prob <= 0.5] <- 0

# In-sample prediction accuracy
library(caret)
ismatrix <- confusionMatrix(as.factor(pred_direction), as.factor(norm_isdji$Direction))
ismatrix

# Test generalization on the out-of-sample data
osidn <- matrix(1, dim(osdji)[1], dim(osdji)[2])
norm_osdji <- (osdji - t(isme * t(osidn))) / t(isstd * t(osidn))
norm_osdji[, dm[2]] <- direction[osrow]
ospred <- predict(model, norm_osdji)
osprob <- 1 / (1 + exp(-ospred))
ospred_direction <- NULL
ospred_direction[osprob > 0.5] <- 1
ospred_direction[osprob <= 0.5] <- 0
osmatrix <- confusionMatrix(as.factor(ospred_direction), as.factor(norm_osdji$Direction))
osmatrix
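The standardization step in the script multiplies by a matrix of ones to repeat the in-sample mean and standard deviation across rows. A possibly easier-to-read equivalent is base R's `scale()` with explicit `center` and `scale` arguments. The sketch below uses synthetic matrices and hypothetical names (`train`, `test`, `mu`, `sigma`), and it assumes plain numeric matrices rather than xts objects.

```r
# Minimal sketch on synthetic data; train/test stand in for isdji/osdji.
set.seed(7)
train <- matrix(rnorm(50 * 3, mean = 10, sd = 2), ncol = 3,
                dimnames = list(NULL, c("a", "b", "c")))
test <- matrix(rnorm(20 * 3, mean = 11, sd = 3), ncol = 3,
               dimnames = list(NULL, c("a", "b", "c")))

# Statistics come from the training (in-sample) data only
mu <- apply(train, 2, mean, na.rm = TRUE)
sigma <- apply(train, 2, sd, na.rm = TRUE)

# Approach used in the script: expand mu and sigma with a matrix of ones
ones <- matrix(1, nrow(test), ncol(test))
norm_script <- (test - t(mu * t(ones))) / t(sigma * t(ones))

# Equivalent with scale(): subtract mu and divide by sigma column by column
norm_scale <- scale(test, center = mu, scale = sigma)

print(all.equal(norm_script, norm_scale, check.attributes = FALSE))
```

Either way, the out-of-sample data is scaled with the in-sample statistics, as in the script, so nothing from the test period leaks into the preprocessing.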
Results
Model summary
> summary(model)

Call:
glm(formula = formula, family = "binomial", data = norm_isdji)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.0080  -0.0107   0.0366   0.1533   3.1790

Coefficients: (3 not defined because of singularities)
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.658760   0.222691   7.449 9.43e-14 ***
DJI.Close    44.051359   8.409499   5.238 1.62e-07 ***
DJI.Close.1 -44.561952  17.549358  -2.539   0.0111 *
DJI.Close.2   0.577137  17.620013   0.033   0.9739
DJI.Close.3  -0.003556   0.291865  -0.012   0.9903
DJI.Close.4  -0.264309   0.312768  -0.845   0.3981
rsi           0.046117   0.339620   0.136   0.8920
rsi.1        -2.306590   0.565594  -4.078 4.54e-05 ***
macd          2.562233   1.300929   1.970   0.0489 *
signal        1.476838   0.610356   2.420   0.0155 *
macd.1       -1.032963   0.798086  -1.294   0.1956
signal.1      3.871052   1.635221   2.367   0.0179 *
dn                  NA         NA      NA       NA
mavg                NA         NA      NA       NA
up                  NA         NA      NA       NA
pctB          1.269642   0.521006   2.437   0.0148 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1579.37  on 1257  degrees of freedom
Residual deviance:  348.17  on 1245  degrees of freedom
AIC: 374.17

Number of Fisher Scoring iterations: 8
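The summary reports `dn`, `mavg` and `up` as `NA` ("3 not defined because of singularities"). A plausible explanation, not confirmed by the original post, is that the Bollinger Band columns are exact linear combinations of predictors already in the model: the 20-day band midline matches `avg20`, and the upper and lower bands are the midline plus or minus a multiple of a 20-day standard deviation that is proportional to `std20`. The sketch below reproduces that pattern with synthetic numbers; the data and the `sqrt(19 / 20)` population-versus-sample factor are assumptions made for illustration.

```r
# Minimal sketch on synthetic data; the names mirror the script but the values are made up.
set.seed(42)
n <- 300
avg20 <- rnorm(n)                  # stand-in for the 20-day moving average
std20 <- runif(n, 0.5, 1.5)        # stand-in for the 20-day sample standard deviation

mavg <- avg20                      # band midline: identical to avg20
pop_sd <- std20 * sqrt(19 / 20)    # assumed population-sd version of std20
up <- mavg + 2 * pop_sd            # upper band
dn <- mavg - 2 * pop_sd            # lower band

y <- rbinom(n, 1, plogis(1 + avg20 - std20))
fit <- glm(y ~ avg20 + std20 + dn + mavg + up, family = binomial)

# Terms that glm() could not estimate because they are linearly dependent
na_terms <- names(coef(fit))[is.na(coef(fit))]
print(na_terms)
```

The `NA` rows are harmless for prediction, since `glm()` effectively drops the redundant columns, but leaving them out of the formula would make the summary easier to read.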
Model fit
Probability
In-sample accuracy: 93.88%
> ismatrix
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 362  35
         1  42 819

               Accuracy : 0.9388
                 95% CI : (0.9241, 0.9514)
    No Information Rate : 0.6789
    P-Value [Acc > NIR] : <2e-16

                  Kappa : 0.859
 Mcnemar's Test P-Value : 0.4941

            Sensitivity : 0.8960
            Specificity : 0.9590
         Pos Pred Value : 0.9118
         Neg Pred Value : 0.9512
             Prevalence : 0.3211
         Detection Rate : 0.2878
   Detection Prevalence : 0.3156
      Balanced Accuracy : 0.9275

       'Positive' Class : 0
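The headline figures in this output follow directly from the four cell counts. As a quick check, the sketch below recomputes accuracy, sensitivity, specificity and balanced accuracy in base R from the in-sample table, treating class 0 as the positive class as the output does. The object name `cm_is` is just an illustrative choice.

```r
# In-sample confusion matrix copied from the output above
# (rows = prediction, columns = reference; matrix() fills column by column)
cm_is <- matrix(c(362, 42, 35, 819), nrow = 2,
                dimnames = list(Prediction = c("0", "1"),
                                Reference = c("0", "1")))

accuracy <- sum(diag(cm_is)) / sum(cm_is)            # correct / total
sensitivity <- cm_is["0", "0"] / sum(cm_is[, "0"])   # true 0s predicted as 0
specificity <- cm_is["1", "1"] / sum(cm_is[, "1"])   # true 1s predicted as 1
balanced_accuracy <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity,
              balanced_accuracy = balanced_accuracy), 4))
```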
Out-of-sample accuracy: 84.92%
> osmatrix
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 115  26
         1  12  99

               Accuracy : 0.8492
                 95% CI : (0.7989, 0.891)
    No Information Rate : 0.504
    P-Value [Acc > NIR] : < 2e-16

                  Kappa : 0.6981
 Mcnemar's Test P-Value : 0.03496

            Sensitivity : 0.9055
            Specificity : 0.7920
         Pos Pred Value : 0.8156
         Neg Pred Value : 0.8919
             Prevalence : 0.5040
         Detection Rate : 0.4563
   Detection Prevalence : 0.5595
      Balanced Accuracy : 0.8488

       'Positive' Class : 0
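Accuracy on its own says little unless it is compared with a baseline. The output gives two such baselines: the no-information rate (always guessing the most common class) and Cohen's kappa (agreement beyond chance). The sketch below recomputes both from the out-of-sample counts in base R; `cm_os` is an illustrative name.

```r
# Out-of-sample confusion matrix copied from the output above
# (rows = prediction, columns = reference)
cm_os <- matrix(c(115, 12, 26, 99), nrow = 2,
                dimnames = list(Prediction = c("0", "1"),
                                Reference = c("0", "1")))
n <- sum(cm_os)

observed <- sum(diag(cm_os)) / n                        # observed agreement (accuracy)
expected <- sum(rowSums(cm_os) * colSums(cm_os)) / n^2  # agreement expected by chance
kappa <- (observed - expected) / (1 - expected)         # Cohen's kappa

nir <- max(colSums(cm_os)) / n                          # no-information rate

print(round(c(accuracy = observed, kappa = kappa, nir = nir), 4))
```

With the two classes almost evenly split in 2015, the no-information rate is close to 50%, so the 84.92% accuracy is well above the naive baseline even though it is lower than the in-sample figure.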
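Thanks for reading. Follows and comments are welcome. The author is an independent practitioner in quantitative investing, futures, and forex, with solid working knowledge of funds and insurance.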