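The main purpose of analysis of variance (ANOVA) is to test whether the means of two or more groups of samples differ significantly, that is, whether the group means are equal.
Two points need attention here: (1) the null hypothesis of ANOVA is that the samples show no significant difference (i.e., all the means are equal); (2) the samples do not interact (i.e., the sample data are independent), which matters in two-way ANOVA when judging whether the two factors are independent.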
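As a minimal sketch of how the null hypothesis is tested in practice, the snippet below runs scipy's one-way ANOVA on small made-up groups. The data values, the variable names, and the 0.05 significance level are assumptions chosen purely for illustration.

```python
from scipy import stats

# Hypothetical measurements: g1 and g2 share a mean near 10, g3 is shifted to about 12
g1 = [10.1, 9.8, 10.3, 9.9, 10.0]
g2 = [10.2, 9.7, 10.1, 10.0, 9.9]
g3 = [12.1, 11.8, 12.3, 11.9, 12.0]

alpha = 0.05  # assumed significance level

# H0: all group means are equal
f_all, p_all = stats.f_oneway(g1, g2, g3)
f_same, p_same = stats.f_oneway(g1, g2)

print("g1 vs g2 vs g3: F=%.3f, p=%.4g, reject H0: %s" % (f_all, p_all, p_all < alpha))
print("g1 vs g2:       F=%.3f, p=%.4g, reject H0: %s" % (f_same, p_same, p_same < alpha))
```

A p-value below the chosen level leads to rejecting the null hypothesis of equal means; a larger p-value means the data give no evidence of a difference.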
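Principle:
The principle of ANOVA comes down to a single equation: SST = SS_between + SS_within (total sum of squares = between-group sum of squares + within-group sum of squares).
Note: ANOVA is, in essence, an explanation of the total variation.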
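The decomposition of the total sum of squares can be checked numerically. The following is a small sketch on made-up numbers; the group values and variable names are hypothetical and only serve to show that the between-group and within-group parts add up to the total.

```python
import numpy as np

# Hypothetical data: three groups with three observations each
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 8.0, 9.0]),
    np.array([1.0, 2.0, 3.0]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Total variation: squared deviations of every observation from the grand mean
sst = ((all_values - grand_mean) ** 2).sum()
# Between-group variation: squared deviations of group means from the grand mean, weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variation: squared deviations of observations from their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

print(sst, ss_between, ss_within)
```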
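The statistics to look at in the final ANOVA result are the F statistic and R².
Here g is the number of groups and n is the number of observations in each group.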
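For a balanced one-way design, the usual textbook forms are F = (SS_between / (g - 1)) / (SS_within / (g(n - 1))) and R² = SS_between / SST. The sketch below assumes those forms and made-up data (the array `obs` and all variable names are hypothetical), and cross-checks the hand-computed F against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical balanced data: g = 3 groups (rows), n = 3 observations per group (columns)
obs = np.array([[4.0, 5.0, 6.0],
                [7.0, 8.0, 9.0],
                [1.0, 2.0, 3.0]])
g, n = obs.shape
grand_mean = obs.mean()
group_means = obs.mean(axis=1)

ss_between = n * ((group_means - grand_mean) ** 2).sum()
ss_within = ((obs - group_means[:, None]) ** 2).sum()
ss_total = ss_between + ss_within

df_between = g - 1        # degrees of freedom between groups
df_within = g * (n - 1)   # degrees of freedom within groups

f_stat = (ss_between / df_between) / (ss_within / df_within)
r_squared = ss_between / ss_total
p_value = stats.f.sf(f_stat, df_between, df_within)  # upper tail of the F distribution

# Cross-check against scipy's one-way ANOVA (each row is passed as one group)
f_ref, p_ref = stats.f_oneway(*obs)
print(f_stat, r_squared, p_value)
```

A large F means the between-group variation dominates the within-group variation; R² is the share of the total variation explained by the grouping.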
Python implementation:
import itertools
import warnings

import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

warnings.filterwarnings("ignore")

# `data` is not defined in this snippet. The code expects a DataFrame with
# factor columns 'A', 'B', 'C' and three measurement columns '1', '2', '3'.

# Long-format table: 9 rows for each of the levels -1, 0 and 1
df2 = pd.DataFrame()
df2['group'] = (list(itertools.repeat(-1., 9))
                + list(itertools.repeat(0., 9))
                + list(itertools.repeat(1., 9)))

# For each factor, collect the measurements taken at each level into one column
for factor in ['A', 'B', 'C']:
    col = 'noise_' + factor
    df2[col] = 0.0
    for level in data[factor].unique():
        df2.loc[df2['group'] == level, col] = data.loc[data[factor] == level, ['1', '2', '3']].values.flatten()

df2
# Factor A
anova_reA = anova_lm(ols('noise_A~C(group)', data=df2[['group', 'noise_A']]).fit())
print(anova_reA)

# Factor B
anova_reB = anova_lm(ols('noise_B~C(group)', data=df2[['group', 'noise_B']]).fit())
print(anova_reB)

# Factor C
anova_reC = anova_lm(ols('noise_C~C(group)', data=df2[['group', 'noise_C']]).fit())
print(anova_reC)
The results show that for samples A and B the group means do not differ significantly, whereas for sample C the group means do differ.
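When ANOVA reports a difference, a common follow-up is to ask which pairs of groups differ. One option is Tukey's HSD test through `pairwise_tukeyhsd` from statsmodels. The sketch below runs it on made-up data; the values, the labels 'a', 'b', 'c', and the 0.05 level are assumptions for illustration, not the data analysed above.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical measurements: groups 'a' and 'b' are similar, group 'c' is shifted upward
values = np.array([10.1, 9.8, 10.3, 9.9, 10.0,
                   10.2, 9.7, 10.1, 10.0, 9.9,
                   12.1, 11.8, 12.3, 11.9, 12.0])
labels = np.array(['a'] * 5 + ['b'] * 5 + ['c'] * 5)

# Pairwise comparisons with family-wise error rate alpha = 0.05 (assumed level)
result = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(result.summary())

# One boolean per pair, in the order (a, b), (a, c), (b, c)
reject_flags = result.reject.tolist()
```

Each row of the summary gives the mean difference for one pair, its confidence interval, and whether the hypothesis of equal means is rejected for that pair.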