The most important question: why does the Bayesian GMM fit better?
PRML offers an explanation in the following passage:
Ref: http://freemind.pluskid.org/machine-learning/deciding-the-number-of-clusterings/
The linked post also covers several other unsupervised clustering methods.
From: http://scikit-learn.org/stable/modules/mixture.html#variational-bayesian-gaussian-mixture
Due to its Bayesian nature, the variational algorithm needs more hyperparameters than expectation-maximization, the most important of these being the concentration parameter weight_concentration_prior.
The examples below compare Gaussian mixture models with a fixed number of components, to the variational Gaussian mixture models with a Dirichlet process prior. Here, a classical Gaussian mixture is fitted with 5 components on a dataset composed of 2 clusters.
We can see that the variational Gaussian mixture with a Dirichlet process prior is able to limit itself to only 2 components, whereas the Gaussian mixture fits the data with a fixed number of components that has to be set a priori by the user. In this case the user has selected n_components=5, which does not match the true generative distribution of this toy dataset. Note that with very few observations, the variational Gaussian mixture models with a Dirichlet process prior can take a conservative stand and fit only one component.
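A minimal sketch of that comparison (the synthetic two-cluster dataset below is an assumption for illustration, not the exact toy data from the scikit-learn example):

```python
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

# synthetic dataset with 2 true clusters (illustrative assumption)
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(200, 2)),
])

# classical GMM: the number of components is fixed a priori
gmm = GaussianMixture(n_components=5, random_state=0).fit(X)

# variational GMM with a Dirichlet process prior: superfluous
# components are free to receive near-zero weight
bgmm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_process',
    random_state=0,
).fit(X)

print('GMM weights:         ', gmm.weights_.round(3))
print('Bayesian GMM weights:', bgmm.weights_.round(3))
```

With enough observations, the Bayesian weights should concentrate on about two components, matching the behaviour described in the quoted passage.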
The Dirichlet distribution performs a kind of automatic relevance determination, picking out the components that actually matter.
```
GMM weights (5 components):          [0.1258077  0.23638361 0.23330578 0.26361639 0.14088652]
Bayesian GMM weights (5 components): [0.001019   0.00101796 0.49948856 0.47955123 0.01892325]
```
This raises a question: why does the Dirichlet prior drive three of the five weights toward zero while the plain GMM does not? Could it just be a difference in convergence speed?
It should have nothing to do with speed. Once the Dirichlet prior is added, the posterior over the mixing weights is itself a Dirichlet, so the parameter estimation inherits the Dirichlet's sparsity-inducing behaviour.
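A sketch of the mechanism, following the variational treatment in PRML Chapter 10 ($r_{nk}$ are the responsibilities, $N$ the number of data points). With a symmetric Dirichlet prior $\mathrm{Dir}(\boldsymbol{\pi} \mid \alpha_0)$ on the mixing weights, the variational posterior over the weights is again a Dirichlet:

$$q(\boldsymbol{\pi}) = \mathrm{Dir}(\boldsymbol{\pi} \mid \alpha_1, \dots, \alpha_K), \qquad \alpha_k = \alpha_0 + N_k, \qquad N_k = \sum_{n=1}^{N} r_{nk}$$

The responsibility update uses the expected log-weights

$$\mathbb{E}[\ln \pi_k] = \psi(\alpha_k) - \psi\Big(\sum_{j=1}^{K} \alpha_j\Big)$$

where $\psi$ is the digamma function. For a component that currently claims almost no points ($N_k \approx 0$) under a small $\alpha_0$, $\psi(\alpha_k)$ is strongly negative, so its responsibilities shrink further on the next update; this feedback drives the weights of redundant components toward zero. Maximum-likelihood EM has no such term, which is why the plain GMM keeps all five weights at comparable magnitude.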
We will use the classic Old Faithful dataset.
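The code below assumes `data` is a pandas DataFrame with `eruptions` and `waiting` columns; a minimal loading sketch, where the file name `old_faithful.csv` is a placeholder:

```python
import pandas as pd
import matplotlib.pyplot as plt

# placeholder path; any copy of the Old Faithful data with
# 'eruptions' and 'waiting' columns will work
data = pd.read_csv('old_faithful.csv')
```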
```python
plt.scatter(data['eruptions'], data['waiting'], alpha=0.5)
plt.xlabel('eruptions')
plt.ylabel('waiting')
```
```python
from sklearn.mixture import BayesianGaussianMixture

mixture_model = BayesianGaussianMixture(
    n_components=10,
    random_state=5,   # control the pseudo-random initialization
    weight_concentration_prior_type='dirichlet_distribution',
    weight_concentration_prior=1.0,  # parameter of the Dirichlet component prior
    max_iter=200,     # choose this to be big in case it takes a long time to fit
)
mixture_model.fit(data)
```
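After fitting, the pruning effect can be checked directly on `weights_`; components the variational posterior has switched off end up with weights near zero (the 0.01 threshold below is an arbitrary choice for illustration):

```python
print(mixture_model.weights_.round(3))
# count components that still carry appreciable weight
print('effective components:', (mixture_model.weights_ > 0.01).sum())
```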
Ref: http://scikit-learn.org/stable/auto_examples/mixture/plot_concentration_prior.html
The following helper, adapted from that example, can be called directly:
```python
import numpy as np
from matplotlib.patches import Ellipse

def plot_ellipses(ax, weights, means, covars):
    """Given a list of mixture component weights, means, and covariances,
    plot ellipses to show the orientation and scale of the Gaussian
    mixture dispersal."""
    for n in range(means.shape[0]):
        # eigendecomposition gives each ellipse's axes and orientation
        eig_vals, eig_vecs = np.linalg.eigh(covars[n])
        unit_eig_vec = eig_vecs[0] / np.linalg.norm(eig_vecs[0])
        angle = np.arctan2(unit_eig_vec[1], unit_eig_vec[0])
        angle = 180 * angle / np.pi  # Ellipse needs degrees
        # scale eigenvalues to ellipse axis lengths
        eig_vals = 2 * np.sqrt(2) * np.sqrt(eig_vals)
        # filled ellipse, alpha proportional to the component weight
        ell = Ellipse(means[n], eig_vals[0], eig_vals[1],
                      angle=180 + angle, edgecolor=None)
        # matching outline so faint components stay visible
        ell2 = Ellipse(means[n], eig_vals[0], eig_vals[1],
                       angle=180 + angle, edgecolor='black',
                       fill=False, linewidth=1)
        ell.set_clip_box(ax.bbox)
        ell2.set_clip_box(ax.bbox)
        ell.set_alpha(weights[n])
        ell.set_facecolor('#56B4E9')
        ax.add_artist(ell)
        ax.add_artist(ell2)
```
```python
import matplotlib.pyplot as plt
from matplotlib import gridspec

def plot_results(model, x, y, title, plot_title=False):
    fig, ax = plt.subplots(3, 1, sharex=False)
    # ax above is unused; ax1 and ax2 are redefined below via GridSpec
    gs = gridspec.GridSpec(3, 1)  # custom subplot layout
    ax1 = plt.subplot(gs[0:2, 0])

    # standard set-up for the scatter panel
    ax1.set_title(title)
    ax1.scatter(x, y, s=5, marker='o', alpha=0.8)
    ax1.set_xticks(())
    ax1.set_yticks(())

    # ax1: draw the component ellipses
    n_components = model.get_params()['n_components']
    plot_ellipses(ax1, model.weights_, model.means_, model.covariances_)

    # ax2: bar chart of the component weights
    ax2 = plt.subplot(gs[2, 0])
    ax2.get_xaxis().set_tick_params(direction='out')
    ax2.yaxis.grid(True, alpha=0.7)
    for k, w in enumerate(model.weights_):
        ax2.bar(k, w, width=0.9, color='#56B4E9', zorder=3,
                align='center', edgecolor='black')
        ax2.text(k, w + 0.007, "%.1f%%" % (w * 100.),
                 horizontalalignment='center')
    ax2.set_xlim(-.6, n_components - .4)
    ax2.set_ylim(0., 1.1)
    ax2.tick_params(axis='y', which='both', left=False, right=False,
                    labelleft=False)
    ax2.tick_params(axis='x', which='both', top=False)
    if plot_title:
        ax1.set_ylabel('Estimated Mixtures')
        ax2.set_ylabel('Weight of each component')

plot_results(mixture_model, data['eruptions'], data['waiting'],
             'weight_concentration_prior={}'.format(1.0))
```
```python
lower_bounds = []
mixture_model = BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    max_iter=1,        # each call to fit() runs only a single EM step
    random_state=2,
    weight_concentration_prior_type='dirichlet_distribution',
    warm_start=True,   # continue from the previous parameters instead of re-initializing
)

# with max_iter=1 and warm_start=True, each fit() call advances one EM step
for i in range(200):
    mixture_model.fit(data)
    if mixture_model.converged_:
        break
    lower_bounds.append(mixture_model.lower_bound_)
    if i % 5 == 0 and i < 60:
        plt.figure()
        plot_results(mixture_model, data['eruptions'], data['waiting'],
                     'EM step={}, lower_bound={}'.format(i, mixture_model.lower_bound_))

plt.figure()
plt.plot(lower_bounds)
plt.gca().set_xlabel('step')
plt.gca().set_ylabel('lower bound')
```
The lower bound increases steadily: each variational EM step is guaranteed not to decrease the evidence lower bound, so the curve is monotone.
Comparing runs from different initializations:
```python
for seed in range(6, 11):
    lower_bounds = []
    mixture_model = BayesianGaussianMixture(
        n_components=10,
        covariance_type='full',
        max_iter=1,
        random_state=seed,
        weight_concentration_prior_type='dirichlet_distribution',
        warm_start=True,
    )
    for i in range(200):
        mixture_model.fit(data)
        if mixture_model.converged_:
            break
        lower_bounds.append(mixture_model.lower_bound_)
    plt.plot(lower_bounds)

plt.gca().set_xlabel('step')
plt.gca().set_ylabel('lower bound')
```
Result: