[Scikit-learn] 2.1 Clustering - Variational Bayesian Gaussian Mixture

The most important question: why does the Bayesian GMM fit better?

PRML explains this in the passage below:

 

Ref: http://freemind.pluskid.org/machine-learning/deciding-the-number-of-clusterings/

The linked post also covers several other unsupervised clustering methods.

 

From: http://scikit-learn.org/stable/modules/mixture.html#variational-bayesian-gaussian-mixture

Due to its Bayesian nature, the variational algorithm needs more hyperparameters than expectation-maximization, the most important of these being the concentration parameter weight_concentration_prior.

  • Specifying a low value for the concentration prior will make the model put most of the weight on a few components and set the remaining components' weights very close to zero (see the sketch after this list).
  • High values of the concentration prior will allow a larger number of components to be active in the mixture.
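The effect is easy to check directly. Below is a minimal sketch, assuming a synthetic two-cluster dataset (the variable names and data are illustrative, not from the scikit-learn example):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2) + [-3, 0],    # cluster 1
               rng.randn(200, 2) + [3, 0]])    # cluster 2

for prior in (0.01, 1000.0):                   # low vs. high concentration prior
    bgmm = BayesianGaussianMixture(
        n_components=5,
        weight_concentration_prior_type='dirichlet_distribution',
        weight_concentration_prior=prior,
        max_iter=500,
        random_state=0,
    ).fit(X)
    print(prior, bgmm.weights_.round(3))
# A low prior typically concentrates nearly all the weight on ~2 components,
# while a high prior keeps all 5 components active with non-negligible weights.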

 

The examples below compare Gaussian mixture models with a fixed number of components, to the variational Gaussian mixture models with a Dirichlet process prior. Here, a classical Gaussian mixture is fitted with 5 components on a dataset composed of 2 clusters.

We can see that the variational Gaussian mixture with a Dirichlet process prior is able to limit itself to only 2 components, whereas the Gaussian mixture fits the data with a fixed number of components that has to be set a priori by the user. In this case the user has selected n_components=5, which does not match the true generative distribution of this toy dataset. Note that with very few observations, the variational Gaussian mixture models with a Dirichlet process prior can take a conservative stand and fit only one component.

 

The Dirichlet prior performs a kind of automatic component selection, keeping only the components that actually contribute.
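The comparison quoted below can be reproduced along these lines; a minimal sketch, assuming a synthetic dataset with two well-separated clusters (the exact numbers will differ from the output shown):

import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(150, 2) + [-4, 0],
               rng.randn(150, 2) + [4, 0]])    # two true clusters

gmm = GaussianMixture(n_components=5, random_state=1).fit(X)
bgmm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_distribution',
    weight_concentration_prior=0.1,
    max_iter=500,
    random_state=1,
).fit(X)

print("5 for GMM         ", gmm.weights_.round(4))
print("5 for Bayesian GMM", bgmm.weights_.round(4))
# The plain GMM spreads its weight over all 5 components,
# while the Bayesian GMM pushes the superfluous components towards zero.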

5 for GMM          [ 0.1258077   0.23638361  0.23330578  0.26361639  0.14088652]
5 for Bayesian GMM [ 0.001019    0.00101796  0.49948856  0.47955123  0.01892325]

 

This raises a question:

Why does the Dirichlet prior drive three of the weights towards zero, while the plain GMM does not? Is it just a difference in convergence speed?

It should have nothing to do with speed. Once the prior is added, the posterior over the mixing weights is again a Dirichlet, so the parameter estimates inherit the Dirichlet's sparsity-inducing behaviour during fitting.
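A sketch of the standard variational update makes this concrete (in PRML's notation, where $\alpha_0$ is the prior concentration and $N_k$ is the effective number of points currently assigned to component $k$):

\alpha_k = \alpha_0 + N_k, \qquad \mathbb{E}[\ln \pi_k] = \psi(\alpha_k) - \psi\!\left(\sum_{j} \alpha_j\right)

With a small $\alpha_0$, a component that captures few points gets a strongly negative $\mathbb{E}[\ln \pi_k]$, so its responsibilities shrink further at the next E step and its weight collapses towards zero. The maximum-likelihood GMM has no such term and keeps all of its components, however redundant.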

Raw data

Our data set will be the classic Old Faithful dataset.
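A minimal loading sketch, assuming a local copy of the data saved as faithful.csv with columns eruptions and waiting (the file name is hypothetical; any DataFrame with these two columns will do):

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('faithful.csv')   # hypothetical local copy of the Old Faithful data
print(data[['eruptions', 'waiting']].head())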

plt.scatter(data['eruptions'], data['waiting'], alpha=0.5)
plt.xlabel('eruptions')
plt.ylabel('waiting')

How do we fit the model?

from sklearn.mixture import BayesianGaussianMixture

mixture_model = BayesianGaussianMixture(
    n_components=10,
    random_state=5,                   # control the pseudo-random initialization
    weight_concentration_prior_type='dirichlet_distribution',
    weight_concentration_prior=1.0,   # parameter of the Dirichlet component prior
    max_iter=200,                     # choose this to be big in case it takes a long time to fit
)
mixture_model.fit(data)
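After fitting, the estimated mixing weights can be inspected directly; with ten components and this prior, most of them typically end up with negligible weight (a small sketch, not output from the original post):

print(mixture_model.weights_.round(3))
print("components with weight > 1%:", (mixture_model.weights_ > 0.01).sum())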

 

Ref: http://scikit-learn.org/stable/auto_examples/mixture/plot_concentration_prior.html

The plotting code from that example can be used directly:

 

# usage (called from plot_results below):
plot_ellipses(ax1, model.weights_, model.means_, model.covariances_)

import numpy as np
import matplotlib as mpl

def plot_ellipses(ax, weights, means, covars):
    """Given a list of mixture component weights, means, and covariances,
    plot ellipses to show the orientation and scale of the Gaussian mixture dispersal."""
    for n in range(means.shape[0]):
        eig_vals, eig_vecs = np.linalg.eigh(covars[n])
        unit_eig_vec = eig_vecs[0] / np.linalg.norm(eig_vecs[0])
        angle = np.arctan2(unit_eig_vec[1], unit_eig_vec[0])
        angle = 180 * angle / np.pi                      # Ellipse needs degrees
        eig_vals = 2 * np.sqrt(2) * np.sqrt(eig_vals)    # scale eigenvalues to ellipse axes
        ell = mpl.patches.Ellipse(means[n], eig_vals[0], eig_vals[1],
                                  180 + angle, edgecolor=None)
        ell2 = mpl.patches.Ellipse(means[n], eig_vals[0], eig_vals[1],
                                   180 + angle, edgecolor='black', fill=False, linewidth=1)
        ell.set_clip_box(ax.bbox)
        ell2.set_clip_box(ax.bbox)
        ell.set_alpha(weights[n])      # filled ellipse, opacity proportional to the weight
        ell.set_facecolor('#56B4E9')
        ax.add_artist(ell)
        ax.add_artist(ell2)            # black outline

 

# usage:
plot_results(mixture_model, data['eruptions'], data['waiting'],
             'weight_concentration_prior={}'.format(1.0))

from matplotlib import gridspec

def plot_results(model, x, y, title, plot_title=False):
    fig, ax = plt.subplots(3, 1, sharex=False)
    # the ax created above is not used; ax1 and ax2 are defined below via GridSpec
    gs = gridspec.GridSpec(3, 1)          # custom subplot layout
    ax1 = plt.subplot(gs[0:2, 0])

    # the next four lines are the usual scatter-plot boilerplate
    ax1.set_title(title)
    ax1.scatter(x, y, s=5, marker='o', alpha=0.8)
    ax1.set_xticks(())
    ax1.set_yticks(())

    n_components = model.get_params()['n_components']
    # ax1: draw the ellipses
    plot_ellipses(ax1, model.weights_, model.means_, model.covariances_)

    # ax2: draw the weights
    ax2 = plt.subplot(gs[2, 0])
    ax2.get_xaxis().set_tick_params(direction='out')
    ax2.yaxis.grid(True, alpha=0.7)
    for k, w in enumerate(model.weights_):
        ax2.bar(k, w, width=0.9, color='#56B4E9', zorder=3,
                align='center', edgecolor='black')
        ax2.text(k, w + 0.007, "%.1f%%" % (w * 100.),
                 horizontalalignment='center')
    ax2.set_xlim(-.6, n_components - .4)
    ax2.set_ylim(0., 1.1)
    ax2.tick_params(axis='y', which='both', left='off', right='off', labelleft='off')
    ax2.tick_params(axis='x', which='both', top='off')
    if plot_title:
        ax1.set_ylabel('Estimated Mixtures')
        ax2.set_ylabel('Weight of each component')

 

Inspecting the fitting process:

lower_bounds = []
mixture_model = BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    max_iter=1,
    random_state=2,
    weight_concentration_prior_type='dirichlet_distribution',
    warm_start=True,
)
# with max_iter=1 and warm_start=True, each call to fit() performs a single EM step
for i in range(200):
    mixture_model.fit(data)
    if mixture_model.converged_:
        break
    lower_bounds.append(mixture_model.lower_bound_)
    if i % 5 == 0 and i < 60:
        plt.figure()
        plot_results(mixture_model, data['eruptions'], data['waiting'],
                     'EM step={}, lower_bound={}'.format(i, mixture_model.lower_bound_))

plt.figure()
plt.plot(lower_bounds)
plt.gca().set_xlabel('step')
plt.gca().set_ylabel('lower bound')

 

 

 

 

The lower bound increases gradually over the EM steps.

 

Comparing the effect of different random initializations:

for seed in range(6, 11):
    lower_bounds = []
    mixture_model = BayesianGaussianMixture(
        n_components=10,
        covariance_type='full',
        max_iter=1,
        random_state=seed,
        weight_concentration_prior_type='dirichlet_distribution',
        warm_start=True,
    )
    for i in range(200):
        mixture_model.fit(data)
        if mixture_model.converged_:
            break
        lower_bounds.append(mixture_model.lower_bound_)
    plt.plot(lower_bounds)

plt.gca().set_xlabel('step')
plt.gca().set_ylabel('lower bound')

Result: 

 
