投放媒體後計算每一個媒體廣告的效果app
#數據的獲取和清洗dom
#擬合模型函數
#計算貢獻量code
數據的獲取和清洗orm
#數據的清洗和轉換 > setwd('C:\\Users\\Xu\\Desktop\\data') > list.files() > library(openxlsx) > rawMix<-read.xlsx("MarketingMix.xlsx") > head(rawMix) Week_Date TV Search Display Print Social PR Sales 1 42156 0 0 0 0 34974 949 639003.9 2 42163 0 0 0 0 36194 947 589986.4 3 42170 0 0 0 0 37034 911 672482.5 4 42177 0 0 0 0 37232 793 611432.7 5 42184 0 0 0 0 27229 825 854615.0 6 42191 0 0 0 0 32979 899 537515.9 > str(rawMix) 'data.frame': 58 obs. of 8 variables: $ Week_Date: num 42156 42163 42170 42177 42184 ... #實現如期是 num 須要轉化爲 date $ TV : num 0 0 0 0 0 0 0 0 0 0 ... $ Search : num 0 0 0 0 0 0 0 0 0 0 ... $ Display : num 0 0 0 0 0 0 0 0 0 0 ... $ Print : num 0 0 0 0 0 ... $ Social : num 34974 36194 37034 37232 27229 ... $ PR : num 949 947 911 793 825 899 866 794 789 730 ... $ Sales : num 639004 589986 672483 611433 854615 ... > rawMix$weekNew<-as.Date(rawMix$Week_Date,origin = "1899-12-30") #日期進行轉換 > str(rawMix) 'data.frame': 58 obs. of 9 variables: $ Week_Date: num 42156 42163 42170 42177 42184 ... $ TV : num 0 0 0 0 0 0 0 0 0 0 ... $ Search : num 0 0 0 0 0 0 0 0 0 0 ... $ Display : num 0 0 0 0 0 0 0 0 0 0 ... $ Print : num 0 0 0 0 0 ... $ Social : num 34974 36194 37034 37232 27229 ... $ PR : num 949 947 911 793 825 899 866 794 789 730 ... $ Sales : num 639004 589986 672483 611433 854615 ... $ weekNew : Date, format: "2015-06-01" "2015-06-08" "2015-06-15" "2015-06-22" ...
數據探索ip
#數據的探索 #目的是爲了對銷售量的影響,看下變量與銷售的相關性和變量自身的相關性(多重共線性問題) > cor(rawMix[,2:8])#拿2-8的數據 TV Search Display Print Social PR Sales TV 1.0000000 0.3473785 0.3612807 0.1956944 0.3449162 0.3844872 0.4075071 Search 0.3473785 1.0000000 0.6432242 0.2226253 0.3698717 0.4836673 0.7774111 #Search 和 Display相關性較高 Display 0.3612807 0.6432242 1.0000000 0.3897597 0.4632782 0.4294697 0.7715364 Print 0.1956944 0.2226253 0.3897597 1.0000000 0.2141804 0.3192051 0.4029381 Social 0.3449162 0.3698717 0.4632782 0.2141804 1.0000000 0.1640955 0.2368875 PR 0.3844872 0.4836673 0.4294697 0.3192051 0.1640955 1.0000000 0.6467579 Sales 0.4075071 0.7774111 0.7715364 0.4029381 0.2368875 0.6467579 1.0000000
再擬合模型以前須要思考,媒體貢獻是存在衰退曲線的,因此先要將這個模型給製做出來ci
注:在計算營銷組合時一般投放還存在飽和效應,能夠將變量轉化爲指數再進行擬合,但咱們這裏僅計算貢獻量產品
先分析下衰退曲線的模型it
(1)
假設第一週投放300,第二週200,第三週100
假設衰退期爲 第一週0.7,第二週0.2,第三週0.1則每週的效應應該爲
第一週 300*0.7
第二週 300*0.2+200*0.7
第三週 300*0.1+200*0.2+100*0.7
總的效應應該爲全部的相加起來
(2)
最終須要計算總的效應,也就是每週的相加,因此能夠這樣,那上面的例子作假設
#定義一個函數 lagpad<-function(x,k){c(rep(0,k),x[1:(length(x)-k)])}#向右邊移動k位,而且移動的k位用0補充 #R中自己向右平移的函數 lag()但只能做用於時間序列 #rep()重複
> test<-c(300,200,100) #假設第一週投放300,第二週200,第三週100 > testLag<-sapply(0:2,function(x) lagpad(test,x)) > testLag [,1] [,2] [,3] [1,] 300 0 0 [2,] 200 300 0 [3,] 100 200 300 #經過這樣的方法能夠計算總的效應 > as.matrix(testLag)%*%c(0.7,0.2,0.1) #乘以一個向量 [,1] [1,] 210 [2,] 200 [3,] 140 #整個定義爲函數 decayf<-function(x,k,parm){ xlag<-sapply(0:(k-1),function(y) lagpad(x,y)) as.matrix(xlag)%*%parm }
擬合模型
須要模型成共線性的如 sales= INTRCEPT + a*TV + b*PRINT,至關於全部的銷售量等於媒體的投放量*係數之和+基礎銷售量
#一些參數帶到 decayf()函數中去作嘗試 > decay_Search<-decayf(rawMix$Search,3,c(0.8,0.1,0.1)) > decay_Display<-decayf(rawMix$Display,3,c(0.5,0.3,0.2)) > decay_TV<-decayf(rawMix$TV,5,rep(0.2,5)) > decay_Print<-decayf(rawMix$Print,4,rep(0.25,4)) > decay_Social<-decayf(rawMix$Social,2,c(0.7,0.3)) > decay_PR<-decayf(rawMix$PR,3,c(0.4,0.4,0.2)) > modDecay<-data.frame(wday=rawMix$weekNew,Sales=rawMix$Sales, decay_TV,decay_Print,decay_PR,decay_Display,decay_Social,decay_Search) #將結果轉化爲數據框 > str(modDecay) 'data.frame': 58 obs. of 8 variables: $ wday : Date, format: "2015-06-01" "2015-06-08" "2015-06-15" "2015-06-22" ... $ Sales : num 639004 589986 672483 611433 854615 ... $ decay_TV : num 0 0 0 0 0 0 0 0 0 0 ... $ decay_Print : num 0 0 0 0 0 ... $ decay_PR : num 380 758 933 871 829 ... $ decay_Display: num 0 0 0 0 0 0 0 0 0 0 ... $ decay_Social : num 24482 35828 36782 37173 30230 ... $ decay_Search : num 0 0 0 0 0 0 0 0 0 0 ... > mod.lm<-lm(Sales~decay_TV+decay_Print+decay_PR+decay_Display+decay_Search++decay_Social,data=modDecay) #擬合模型 > summary(mod.lm) Call: lm(formula = Sales ~ decay_TV + decay_Print + decay_PR + decay_Display + decay_Search + +decay_Social, data = modDecay) Residuals: Min 1Q Median 3Q Max -612754 -216368 -23446 165218 926416 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 8.222e+05 2.498e+05 3.291 0.001814 ** decay_TV 7.509e-04 9.367e-04 0.802 0.426466 decay_Print 2.023e-02 3.957e-02 0.511 0.611380 decay_PR 7.640e+02 1.964e+02 3.889 0.000292 *** decay_Display 7.889e-02 1.544e-02 5.111 4.85e-06 *** decay_Search 1.395e+00 5.941e-01 2.349 0.022735 * decay_Social -1.506e+01 6.363e+00 -2.366 0.021807 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 310400 on 51 degrees of freedom Multiple R-squared: 0.8162, Adjusted R-squared: 0.7945 F-statistic: 37.73 on 6 and 51 DF, p-value: < 2.2e-16
> library(MASS) #選擇合適模型 > stepAIC(mod.lm) Start: AIC=1473.43 Sales ~ decay_TV + decay_Print + decay_PR + decay_Display + decay_Search + +decay_Social Df Sum of Sq RSS AIC - decay_Print 1 2.5183e+10 4.9388e+12 1471.7 - decay_TV 1 6.1918e+10 4.9755e+12 1472.2 <none> 4.9136e+12 1473.4 - decay_Search 1 5.3162e+11 5.4452e+12 1477.4 - decay_Social 1 5.3944e+11 5.4531e+12 1477.5 - decay_PR 1 1.4575e+12 6.3711e+12 1486.5 - decay_Display 1 2.5169e+12 7.4305e+12 1495.4 Step: AIC=1471.73 Sales ~ decay_TV + decay_PR + decay_Display + decay_Search + decay_Social Df Sum of Sq RSS AIC - decay_TV 1 5.6163e+10 4.9950e+12 1470.4 <none> 4.9388e+12 1471.7 - decay_Social 1 5.1484e+11 5.4536e+12 1475.5 - decay_Search 1 5.4051e+11 5.4793e+12 1475.8 - decay_PR 1 1.5883e+12 6.5271e+12 1485.9 - decay_Display 1 2.7644e+12 7.7032e+12 1495.5 Step: AIC=1470.38 Sales ~ decay_PR + decay_Display + decay_Search + decay_Social Df Sum of Sq RSS AIC <none> 4.9950e+12 1470.4 - decay_Social 1 6.0844e+11 5.6034e+12 1475.0 - decay_Search 1 1.3276e+12 6.3226e+12 1482.0 - decay_PR 1 1.7638e+12 6.7588e+12 1485.9 - decay_Display 1 2.7211e+12 7.7160e+12 1493.6 Call: lm(formula = Sales ~ decay_PR + decay_Display + decay_Search + decay_Social, data = modDecay) Coefficients: (Intercept) decay_PR decay_Display decay_Search decay_Social 8.322e+05 8.106e+02 7.978e-02 1.696e+00 -1.541e+01
將每一個係數乘以投放額度求的影響貢獻量
> coeff<-mod.lm$coefficients #提取擬合的係數
咱們要求將變量的每一個值與係數相乘求和便可,因此將列表進行下列的轉換
> coef.df<-data.frame(var=names(coeff),coef=as.vector(coeff)) #將數據係數轉換爲數據框,含是係數名稱,列是係數值 > coef.df var coef 1 (Intercept) 8.221908e+05 2 decay_TV 7.509029e-04 3 decay_Print 2.022937e-02 4 decay_PR 7.639506e+02 5 decay_Display 7.889260e-02 6 decay_Search 1.395443e+00 7 decay_Social -1.505728e+01 > library(reshape2) #寬錶轉化爲長表 > head(modDecay) wday Sales decay_TV decay_Print decay_PR decay_Display decay_Social decay_Search 1 2015-06-01 639003.9 0 0 379.6 0 24481.8 0 2 2015-06-08 589986.4 0 0 758.4 0 35828.0 0 3 2015-06-15 672482.5 0 0 933.0 0 36782.0 0 4 2015-06-22 611432.7 0 0 871.0 0 37172.6 0 5 2015-06-29 854615.0 0 0 829.4 0 30229.9 0 6 2015-07-06 537515.9 0 0 848.2 0 31254.0 0 > modx<-melt(modDecay,id=c('wday')) #melt()寬變長,cast()長變寬 > head(modx) wday variable value 1 2015-06-01 Sales 639003.9 2 2015-06-08 Sales 589986.4 3 2015-06-15 Sales 672482.5 4 2015-06-22 Sales 611432.7 5 2015-06-29 Sales 854615.0 6 2015-07-06 Sales 537515.9 > mody<-merge(modx,coef.df,by.x='variable',by.y='var',all.x=T) #經過varible,var將變革整合在一塊兒 > head(mody) variable wday value coef 1 decay_Display 2015-06-01 0 0.0788926 2 decay_Display 2015-06-08 0 0.0788926 3 decay_Display 2015-06-15 0 0.0788926 4 decay_Display 2015-06-22 0 0.0788926 5 decay_Display 2015-06-29 0 0.0788926 6 decay_Display 2015-07-06 0 0.0788926
計算貢獻量
mody$contr<-(mody$value)*(mody$coef) #計算每行的貢獻量,咱們須要將每一個變量的貢獻量相加,因此使用 plyr包 > library(plyr) > chn_contr <- ddply(mody,.(variable),summarise,contr=sum(contr)) #分組求和而且計算貢獻量 > chn_contr variable contr 1 decay_TV 4018415 2 decay_Print 1857004 3 decay_PR 44022654 4 decay_Display 24078166 5 decay_Social -31393235 6 decay_Search 14238806 #產品銷售量除了廣告媒體的貢獻量外還有基礎的銷售量 > sum(modDecay$Sales)-sum(chn_contr$contr,na.rm = T) [1] 47687069