The most important question: why does the Bayesian GMM fit better?
PRML offers an explanation in the following passage:
Ref: http://freemind.pluskid.org/machine-learning/deciding-the-number-of-clusterings/
The linked post also covers several other unsupervised clustering methods.
From: http://scikit-learn.org/stable/modules/mixture.html#variational-bayesian-gaussian-mixture
Due to its Bayesian nature, the variational algorithm needs more hyperparameters than expectation-maximization, the most important of these being the concentration parameter weight_concentration_prior.
The examples below compare Gaussian mixture models with a fixed number of components, to the variational Gaussian mixture models with a Dirichlet process prior. Here, a classical Gaussian mixture is fitted with 5 components on a dataset composed of 2 clusters.
We can see that the variational Gaussian mixture with a Dirichlet process prior is able to limit itself to only 2 components, whereas the Gaussian mixture fits the data with a fixed number of components that has to be set a priori by the user. In this case the user has selected n_components=5, which does not match the true generative distribution of this toy dataset. Note that with very few observations, the variational Gaussian mixture models with a Dirichlet process prior can take a conservative stand and fit only one component.
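A minimal sketch of that comparison (the synthetic two-cluster dataset below is an assumption for illustration, not the exact toy data from the scikit-learn example):

```python
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

# synthetic dataset with 2 true clusters (illustrative assumption)
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(200, 2)),
])

# classical GMM: the number of components is fixed a priori
gmm = GaussianMixture(n_components=5, random_state=0).fit(X)

# variational GMM with a Dirichlet process prior: superfluous
# components are free to receive near-zero weight
bgmm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_process',
    random_state=0,
).fit(X)

print('GMM weights:         ', gmm.weights_.round(3))
print('Bayesian GMM weights:', bgmm.weights_.round(3))
```

With enough observations, the Bayesian weights should concentrate on about two components, matching the behaviour described in the quoted passage.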
The Dirichlet distribution performs a kind of automatic relevance determination, picking out the components that actually matter.
```
GMM weights (5 components):          [0.1258077  0.23638361 0.23330578 0.26361639 0.14088652]
Bayesian GMM weights (5 components): [0.001019   0.00101796 0.49948856 0.47955123 0.01892325]
```
This raises a question: why does the Dirichlet prior drive three of the five weights toward zero while the plain GMM does not? Could it just be a difference in convergence speed?
It should have nothing to do with speed. Once the Dirichlet prior is added, the posterior over the mixing weights is itself a Dirichlet, so the parameter estimation inherits the Dirichlet's sparsity-inducing behaviour.
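A sketch of the mechanism, following the variational treatment in PRML Chapter 10 ($r_{nk}$ are the responsibilities, $N$ the number of data points). With a symmetric Dirichlet prior $\mathrm{Dir}(\boldsymbol{\pi} \mid \alpha_0)$ on the mixing weights, the variational posterior over the weights is again a Dirichlet:

$$q(\boldsymbol{\pi}) = \mathrm{Dir}(\boldsymbol{\pi} \mid \alpha_1, \dots, \alpha_K), \qquad \alpha_k = \alpha_0 + N_k, \qquad N_k = \sum_{n=1}^{N} r_{nk}$$

The responsibility update uses the expected log-weights

$$\mathbb{E}[\ln \pi_k] = \psi(\alpha_k) - \psi\Big(\sum_{j=1}^{K} \alpha_j\Big)$$

where $\psi$ is the digamma function. For a component that currently claims almost no points ($N_k \approx 0$) under a small $\alpha_0$, $\psi(\alpha_k)$ is strongly negative, so its responsibilities shrink further on the next update; this feedback drives the weights of redundant components toward zero. Maximum-likelihood EM has no such term, which is why the plain GMM keeps all five weights at comparable magnitude.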
We will use the classic Old Faithful dataset.
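The code below assumes `data` is a pandas DataFrame with `eruptions` and `waiting` columns; a minimal loading sketch, where the file name `old_faithful.csv` is a placeholder:

```python
import pandas as pd
import matplotlib.pyplot as plt

# placeholder path; any copy of the Old Faithful data with
# 'eruptions' and 'waiting' columns will work
data = pd.read_csv('old_faithful.csv')
```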
```python
plt.scatter(data['eruptions'], data['waiting'], alpha=0.5)
plt.xlabel('eruptions')
plt.ylabel('waiting')
```
```python
from sklearn.mixture import BayesianGaussianMixture

mixture_model = BayesianGaussianMixture(
    n_components=10,
    random_state=5,   # control the pseudo-random initialization
    weight_concentration_prior_type='dirichlet_distribution',
    weight_concentration_prior=1.0,  # parameter of the Dirichlet component prior
    max_iter=200,     # choose this to be big in case it takes a long time to fit
)
mixture_model.fit(data)
```
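After fitting, the pruning effect can be checked directly on `weights_`; components the variational posterior has switched off end up with weights near zero (the 0.01 threshold below is an arbitrary choice for illustration):

```python
print(mixture_model.weights_.round(3))
# count components that still carry appreciable weight
print('effective components:', (mixture_model.weights_ > 0.01).sum())
```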
Ref: http://scikit-learn.org/stable/auto_examples/mixture/plot_concentration_prior.html
The following helper, adapted from that example, can be called directly:
```python
import numpy as np
from matplotlib.patches import Ellipse

def plot_ellipses(ax, weights, means, covars):
    """Given a list of mixture component weights, means, and covariances,
    plot ellipses to show the orientation and scale of the Gaussian
    mixture dispersal."""
    for n in range(means.shape[0]):
        # eigendecomposition gives each ellipse's axes and orientation
        eig_vals, eig_vecs = np.linalg.eigh(covars[n])
        unit_eig_vec = eig_vecs[0] / np.linalg.norm(eig_vecs[0])
        angle = np.arctan2(unit_eig_vec[1], unit_eig_vec[0])
        angle = 180 * angle / np.pi  # Ellipse needs degrees
        # scale eigenvalues to ellipse axis lengths
        eig_vals = 2 * np.sqrt(2) * np.sqrt(eig_vals)
        # filled ellipse, alpha proportional to the component weight
        ell = Ellipse(means[n], eig_vals[0], eig_vals[1],
                      angle=180 + angle, edgecolor=None)
        # matching outline so faint components stay visible
        ell2 = Ellipse(means[n], eig_vals[0], eig_vals[1],
                       angle=180 + angle, edgecolor='black',
                       fill=False, linewidth=1)
        ell.set_clip_box(ax.bbox)
        ell2.set_clip_box(ax.bbox)
        ell.set_alpha(weights[n])
        ell.set_facecolor('#56B4E9')
        ax.add_artist(ell)
        ax.add_artist(ell2)
```
```python
import matplotlib.pyplot as plt
from matplotlib import gridspec

def plot_results(model, x, y, title, plot_title=False):
    fig, ax = plt.subplots(3, 1, sharex=False)
    # ax above is unused; ax1 and ax2 are redefined below via GridSpec
    gs = gridspec.GridSpec(3, 1)  # custom subplot layout
    ax1 = plt.subplot(gs[0:2, 0])

    # standard set-up for the scatter panel
    ax1.set_title(title)
    ax1.scatter(x, y, s=5, marker='o', alpha=0.8)
    ax1.set_xticks(())
    ax1.set_yticks(())

    # ax1: draw the component ellipses
    n_components = model.get_params()['n_components']
    plot_ellipses(ax1, model.weights_, model.means_, model.covariances_)

    # ax2: bar chart of the component weights
    ax2 = plt.subplot(gs[2, 0])
    ax2.get_xaxis().set_tick_params(direction='out')
    ax2.yaxis.grid(True, alpha=0.7)
    for k, w in enumerate(model.weights_):
        ax2.bar(k, w, width=0.9, color='#56B4E9', zorder=3,
                align='center', edgecolor='black')
        ax2.text(k, w + 0.007, "%.1f%%" % (w * 100.),
                 horizontalalignment='center')
    ax2.set_xlim(-.6, n_components - .4)
    ax2.set_ylim(0., 1.1)
    ax2.tick_params(axis='y', which='both', left=False, right=False,
                    labelleft=False)
    ax2.tick_params(axis='x', which='both', top=False)
    if plot_title:
        ax1.set_ylabel('Estimated Mixtures')
        ax2.set_ylabel('Weight of each component')

plot_results(mixture_model, data['eruptions'], data['waiting'],
             'weight_concentration_prior={}'.format(1.0))
```
```python
lower_bounds = []
mixture_model = BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    max_iter=1,        # each call to fit() runs only a single EM step
    random_state=2,
    weight_concentration_prior_type='dirichlet_distribution',
    warm_start=True,   # continue from the previous parameters instead of re-initializing
)

# with max_iter=1 and warm_start=True, each fit() call advances one EM step
for i in range(200):
    mixture_model.fit(data)
    if mixture_model.converged_:
        break
    lower_bounds.append(mixture_model.lower_bound_)
    if i % 5 == 0 and i < 60:
        plt.figure()
        plot_results(mixture_model, data['eruptions'], data['waiting'],
                     'EM step={}, lower_bound={}'.format(i, mixture_model.lower_bound_))

plt.figure()
plt.plot(lower_bounds)
plt.gca().set_xlabel('step')
plt.gca().set_ylabel('lower bound')
```
The lower bound increases steadily: each variational EM step is guaranteed not to decrease the evidence lower bound, so the curve is monotone.
Comparing runs from different initializations:
```python
for seed in range(6, 11):
    lower_bounds = []
    mixture_model = BayesianGaussianMixture(
        n_components=10,
        covariance_type='full',
        max_iter=1,
        random_state=seed,
        weight_concentration_prior_type='dirichlet_distribution',
        warm_start=True,
    )
    for i in range(200):
        mixture_model.fit(data)
        if mixture_model.converged_:
            break
        lower_bounds.append(mixture_model.lower_bound_)
    plt.plot(lower_bounds)

plt.gca().set_xlabel('step')
plt.gca().set_ylabel('lower bound')
```
Result: