K-Means Clustering Algorithm

Date: 2019-11-13

Tags: means, algorithm

Notes from learning the basic algorithms of data mining.

Classification and Clustering Algorithms

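Clustering is the process of grouping samples that have no class labels into different groups, following the principle that "birds of a feather flock together" (similar objects end up together). Each such group of data objects is called a cluster, and clustering also includes describing each of these clusters.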

In classification, by contrast, the classes present in the target database are known in advance; the task is to mark which class each record belongs to.

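Cluster analysis is also called unsupervised learning because, unlike in classification, the samples carry no labels: the grouping has to be determined automatically by the clustering algorithm. Cluster analysis studies how to divide samples into several classes without any training.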

Learning the K-Means Algorithm

In the figure above, A, B, C, D, and E are five points.

The gray points are the seed points, i.e. the points we use to find the point groups (clusters).

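There are two seed points, so K = 2. The K-Means algorithm then works as follows:

1. Randomly pick K (here K = 2) seed points in the figure.

2. For every point in the figure, compute its distance to each of the K seed points. If point Pi is closest to seed Si, then Pi belongs to Si's group. (In the figure, A and B belong to the upper seed, while C, D, and E belong to the seed in the lower middle.)

3. Move each seed point to the center of its own group (step 3 in the figure).

4. Repeat steps 2 and 3 until the seed points no longer move. (In step 4 of the figure, the upper seed has gathered A, B, and C, and the lower seed has gathered D and E.)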
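The assignment step (each point joins the group of its nearest seed) can be sketched in Java as below. The coordinates of A to E and of the two seeds are hypothetical, since the figure's exact positions are not given in the text; they were chosen so that A and B land with the upper seed and C, D, and E with the lower one, as described for the figure. The class name `NearestSeed` and its methods are likewise illustrative.

```java
import java.util.Arrays;

// Hypothetical sketch of the assignment step: each point joins its nearest seed
class NearestSeed {

    // Euclidean distance between two 2-D points given as {x, y}
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // For each point, returns the index of the closest seed (ties go to the lower index)
    static int[] assign(double[][] points, double[][] seeds) {
        int[] label = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            int best = 0;
            for (int s = 1; s < seeds.length; s++) {
                if (distance(points[i], seeds[s]) < distance(points[i], seeds[best])) {
                    best = s;
                }
            }
            label[i] = best;
        }
        return label;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates for A, B, C, D, E (not taken from the figure)
        double[][] points = {{1, 9}, {2, 8}, {5, 4}, {6, 2}, {7, 3}};
        // Hypothetical seeds: index 0 = upper seed, index 1 = lower seed
        double[][] seeds = {{1, 8}, {6, 3}};
        System.out.println(Arrays.toString(assign(points, seeds))); // prints [0, 0, 1, 1, 1]
    }
}
```

After this step, moving each seed to the mean of its group would put the upper seed at (1.5, 8.5) and the lower one at (6, 3) for these made-up coordinates.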

K-Means Algorithm Procedure

Input: the number of clusters k and a database containing n objects.

Output: k clusters that minimize the squared-error criterion.
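The squared-error criterion is usually written as follows, where C_i is the i-th cluster and m_i is its mean (the cluster center). This is the standard textbook form, given here as an assumption rather than quoted from the original article.

```latex
E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^2,
\qquad
m_i = \frac{1}{|C_i|} \sum_{p \in C_i} p
```

Neither the assignment step (moving a point to a nearer center) nor the mean-update step (the mean minimizes the sum of squared distances within a cluster) can increase E, which is why the iteration eventually settles.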

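Algorithm steps:

1. Choose an initial cluster center for each cluster, giving K initial cluster centers.

2. Assign each sample in the sample set to its nearest cluster according to the minimum-distance principle.

3. Use the mean of the samples in each cluster as that cluster's new center.

4. Repeat steps 2 and 3 until the cluster centers no longer change.

5. Stop; the result is K clusters.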
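The five steps can be condensed into a short, self-contained sketch that works on plain `double[][]` arrays instead of the `Point` class. It is an illustration under assumptions: the class name `SimpleKMeans`, the `initIdx` parameter (which fixes the initial centers so results are reproducible instead of random), and the iteration cap `MAX_ITER` are hypothetical additions, and "centers no longer change" in step 4 is tested with exact equality.

```java
import java.util.Arrays;

// Hypothetical array-based sketch of K-Means steps 1-5 (not the author's code)
class SimpleKMeans {

    // Hypothetical safety cap on the number of iterations
    static final int MAX_ITER = 100;

    // Squared Euclidean distance; enough for finding the nearest center
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    // initIdx: indices of the points used as the K initial centers.
    // labels: filled with each point's cluster index. Returns the final centers.
    static double[][] run(double[][] points, int[] initIdx, int[] labels) {
        int k = initIdx.length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[initIdx[c]].clone();          // Step 1: initial centers
        }
        boolean changed = true;
        for (int iter = 0; changed && iter < MAX_ITER; iter++) { // Step 4: repeat
            // Step 2: assign each point to its nearest center
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (dist2(points[i], centers[c]) < dist2(points[i], centers[best])) {
                        best = c;
                    }
                }
                labels[i] = best;
            }
            // Step 3: move each center to the mean of its members
            changed = false;
            for (int c = 0; c < k; c++) {
                double[] mean = new double[points[0].length];
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (labels[i] == c) {
                        count++;
                        for (int d = 0; d < mean.length; d++) {
                            mean[d] += points[i][d];
                        }
                    }
                }
                if (count == 0) {
                    continue;                                   // empty cluster: keep old center
                }
                for (int d = 0; d < mean.length; d++) {
                    mean[d] /= count;
                }
                if (!Arrays.equals(mean, centers[c])) {
                    centers[c] = mean;
                    changed = true;
                }
            }
        }
        return centers;                                         // Step 5: K clusters
    }

    public static void main(String[] args) {
        double[][] pts = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {8, 9}, {9, 8}};
        int[] labels = new int[pts.length];
        double[][] centers = run(pts, new int[] {0, 3}, labels);
        System.out.println(Arrays.toString(labels));            // prints [0, 0, 0, 1, 1, 1]
        System.out.println(Arrays.deepToString(centers));
    }
}
```

Passing the initial indices explicitly makes the sketch deterministic; to mimic step 1 of the article, they could be drawn at random instead.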

Java Implementation

The Point class

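// A 2-D data point; the Kmeans class also uses it to represent a cluster center
public class Point implements Comparable<Point> {
	private double x;
	private double y;
	// Distance computed by the last call to EuclideanDistance (null until then)
	private Double distance;
	// Label of the cluster this point belongs to; for a center, its own label
	private String className;

	// Builds a point from the two string fields of one input line
	public Point(String x, String y) {
		this.x = Double.parseDouble(x);
		this.y = Double.parseDouble(y);
	}

	// Returns the Euclidean distance to p and caches it in this.distance for compareTo
	public double EuclideanDistance(Point p) {
		this.distance = Math.sqrt(Math.pow(p.x - this.x, 2) + Math.pow(p.y - this.y, 2));
		return distance;
	}

	public double getX() {
		return x;
	}

	public void setX(double x) {
		this.x = x;
	}

	public double getY() {
		return y;
	}

	public void setY(double y) {
		this.y = y;
	}

	// Throws NullPointerException if no distance has been computed yet
	public double getDistance() {
		return distance;
	}

	public void setDistance(double distance) {
		this.distance = distance;
	}

	public String getClassName() {
		return className;
	}

	public void setClassName(String className) {
		this.className = className;
	}

	// Orders points by their cached distance (both distances must already be computed)
	@Override
	public int compareTo(Point o) {
		return this.distance.compareTo(o.distance);
	}
}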
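In `Point`, `EuclideanDistance` stores the distance inside the receiving object and `compareTo` orders by that stored value, so using `compareTo` to find the nearest center means sorting the centers after every round of distance computations. For comparison, the standard library can pick the minimum directly with `Collections.min` and `Comparator.comparingDouble`, without any cached state and in a single pass. The sketch below uses a hypothetical nested class `P` instead of `Point` so that it compiles on its own.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class NearestByComparator {

    // Hypothetical minimal point type, used only for this sketch
    static final class P {
        final String name;
        final double x;
        final double y;

        P(String name, double x, double y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }

        double distanceTo(P o) {
            return Math.hypot(o.x - x, o.y - y);
        }
    }

    // Returns the center closest to p; nothing is stored in the P objects
    static P nearest(List<P> centers, P p) {
        return Collections.min(centers, Comparator.comparingDouble((P c) -> c.distanceTo(p)));
    }

    public static void main(String[] args) {
        List<P> centers = Arrays.asList(new P("0", 1, 8), new P("1", 6, 3));
        System.out.println(nearest(centers, new P("C", 5, 4)).name); // prints 1
    }
}
```

Keeping the distance out of the point objects also avoids surprises when the same object takes part in several distance computations.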

The Kmeans class

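import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class Kmeans {

	private String filePath;
	// Total distance the centers moved in the last iteration (convergence test)
	private double judgeStandard = Double.MAX_VALUE;
	private int k;
	// Raw fields of each input line; each line is expected to hold "x y"
	private ArrayList<String[]> dataArray = new ArrayList<>();
	// The k cluster centers
	private ArrayList<Point> basePoint = new ArrayList<>();
	// All data points
	private ArrayList<Point> totalPoint = new ArrayList<>();
	private int pointNum;

	public Kmeans(String filePath, int k) {
		this.filePath = filePath;
		this.k = k;
	}

	// Reads the points from filePath and picks k distinct random points as initial centers
	public void Init() {
		try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath))) {
			String line;
			while ((line = bufferedReader.readLine()) != null) {
				line = line.trim();
				if (line.isEmpty()) {
					continue; // skip blank lines
				}
				dataArray.add(line.split("\\s+"));
			}
		} catch (IOException e) {
			e.printStackTrace();
			return;
		}
		for (String[] data : dataArray) {
			totalPoint.add(new Point(data[0], data[1]));
		}
		pointNum = totalPoint.size();
		if (k <= 0 || k > pointNum) {
			throw new IllegalArgumentException("k must be between 1 and the number of points");
		}
		// Choose k distinct indices in [0, pointNum); the last point can be chosen too
		Set<Integer> set = new TreeSet<>();
		Random random = new Random();
		while (set.size() < k) {
			set.add(random.nextInt(pointNum));
		}
		for (Integer index : set) {
			// Copy the chosen point so that moving the center does not move the data point
			String[] data = dataArray.get(index);
			Point center = new Point(data[0], data[1]);
			center.setClassName(index.toString());
			basePoint.add(center);
		}
	}

	public void Cluster() {
		judgeStandard = Double.MAX_VALUE;
		// Stop once the centers' total movement drops to 0.01 * k or below
		while (judgeStandard > 0.01 * k) {
			// Step 2: assign every point to its nearest center
			for (Point point : totalPoint) {
				double minDistance = Double.MAX_VALUE;
				for (Point center : basePoint) {
					double tempDistance = center.EuclideanDistance(point);
					if (tempDistance < minDistance) {
						minDistance = tempDistance;
						point.setClassName(center.getClassName());
					}
				}
			}
			// Step 3: move each center to the mean of the points assigned to it
			judgeStandard = 0;
			for (Point center : basePoint) {
				int count = 0;
				double tempx = 0;
				double tempy = 0;
				for (Point point : totalPoint) {
					if (point.getClassName().equals(center.getClassName())) {
						count++;
						tempx += point.getX();
						tempy += point.getY();
					}
				}
				if (count == 0) {
					continue; // empty cluster: leave the center where it is
				}
				tempx /= count;
				tempy /= count;
				judgeStandard += Math.abs(tempx - center.getX());
				judgeStandard += Math.abs(tempy - center.getY());
				center.setX(tempx);
				center.setY(tempy);
			}
		}
		for (Point center : basePoint) {
			System.out.println("Cluster center: " + center.getX() + "," + center.getY());
		}
	}
}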
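The Kmeans class chooses its initial centers at random, so separate runs on the same file can converge to different local optima. One common way to compare runs is the squared-error criterion named in the algorithm's output description. The helper below is a standalone sketch: the class name `SquaredError` and its method are hypothetical, and it works on plain arrays rather than on the private fields of Kmeans.

```java
// Hypothetical helper: the squared-error criterion E of a finished clustering
class SquaredError {

    // Sum over all points of the squared distance to the center of their cluster
    static double sse(double[][] points, int[] labels, double[][] centers) {
        double total = 0;
        for (int i = 0; i < points.length; i++) {
            double[] c = centers[labels[i]];
            for (int d = 0; d < c.length; d++) {
                double diff = points[i][d] - c[d];
                total += diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {2, 0}, {10, 0}};
        int[] labels = {0, 0, 1};
        double[][] good = {{1, 0}, {10, 0}};   // centers at the cluster means
        double[][] worse = {{0, 0}, {6, 0}};   // centers elsewhere
        System.out.println(sse(pts, labels, good));   // prints 2.0
        System.out.println(sse(pts, labels, worse));  // prints 20.0
    }
}
```

A lower E means tighter clusters; when the random initialization is repeated several times, the run with the smallest E is the one usually kept.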
