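The Calinski-Harabasz criterion, sometimes called the variance ratio criterion (VRC), can be used to determine the optimal number of clusters K. The Calinski-Harabasz index is defined as:

$$CH(K) = \frac{\mathrm{SSB} / (K-1)}{\mathrm{SSW} / (N-K)} = \frac{\mathrm{SSB}}{\mathrm{SSW}} \cdot \frac{N-K}{K-1}$$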
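where K is the number of clusters, N is the number of samples, SSB is the between-group sum of squares, and SSW is the within-group sum of squares. The smaller SSW and the larger SSB, the better the clustering; in other words, the larger the Calinski-Harabasz value, the better the clustering.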
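To make the definition concrete, here is a minimal sketch that computes the index by hand from a data matrix and a vector of cluster labels. The function name `calinski_harabasz` is hypothetical and not part of vegan; it simply follows the formula above term by term.

```r
# Calinski-Harabasz index computed directly from the definition:
# CH = (SSB / (K - 1)) / (SSW / (N - K))
# `x` is a numeric matrix (rows = samples); `labels` holds the cluster id of each row.
calinski_harabasz <- function(x, labels) {
  x <- as.matrix(x)
  n <- nrow(x)
  groups <- unique(labels)
  k <- length(groups)
  stopifnot(k >= 2, k < n)
  grand_mean <- colMeans(x)
  ssw <- 0
  ssb <- 0
  for (g in groups) {
    xg <- x[labels == g, , drop = FALSE]
    centroid <- colMeans(xg)
    # Within-group: squared distances of members to their own centroid
    ssw <- ssw + sum(sweep(xg, 2, centroid)^2)
    # Between-group: group size times squared distance of centroid to grand mean
    ssb <- ssb + nrow(xg) * sum((centroid - grand_mean)^2)
  }
  (ssb / (k - 1)) / (ssw / (n - k))
}

# Worked example: two tight, well-separated groups on a line
x <- matrix(c(0, 1, 10, 11), ncol = 1)
ch <- calinski_harabasz(x, c(1, 1, 2, 2))
ch  # SSB = 100, SSW = 1, N = 4, K = 2  ->  (100 / 1) / (1 / 2) = 200
```

Here SSW is small (each point is only 0.5 from its centroid) and SSB is large, so the index is high, which is exactly the behaviour described above.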
1. Install the permute, lattice, and vegan packages
install.packages(c("permute","lattice","vegan"))
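If the script is run repeatedly, reinstalling every time is unnecessary. A small sketch that installs only the packages that are not yet available; `install_if_missing` is a hypothetical helper name, built only on base R's `requireNamespace()` and `install.packages()`.

```r
# Install only those packages that cannot already be loaded (hypothetical helper)
install_if_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  absent <- pkgs[!available]
  if (length(absent) > 0) install.packages(absent)
  # Return the names of the packages that had to be installed
  invisible(absent)
}

# Usage for this post: install_if_missing(c("permute", "lattice", "vegan"))
```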
2. Load the permute, lattice, and vegan packages
library(permute)
library(lattice)
library(vegan)
3. Read the data
data <- read.csv("data/data.csv")
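k-means, which `cascadeKM()` relies on, works only on numeric values, so an ID or label column in the CSV would either cause an error or distort the result. The sketch below, assuming the file may contain such columns or missing values, keeps only numeric columns and complete rows. The helper `prepare_for_kmeans` is hypothetical, and the optional standardisation is an extra step that is not part of the original post.

```r
# Keep numeric columns only and drop rows with missing values (hypothetical helper).
# `standardise = TRUE` additionally centres and scales each column (optional extra step).
prepare_for_kmeans <- function(df, standardise = FALSE) {
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  num <- num[stats::complete.cases(num), , drop = FALSE]
  if (standardise) num <- as.data.frame(scale(num))
  num
}

# Example with a non-numeric id column and one missing value
raw <- data.frame(id = c("a", "b", "c"), x = c(1, 2, NA), y = c(4, 5, 6))
clean <- prepare_for_kmeans(raw)
clean  # columns x and y, rows 1 and 2
```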
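4. Compute the optimal K. `cascadeKM()` runs k-means for every K from 3 to 10, using `iter = 10` random starting configurations per K, and stores the Calinski-Harabasz value of each partition in `fit$results`.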
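fit <- cascadeKM(data, 3, 10, iter = 10, criterion = "calinski")
# fit$results has one column per K (3 to 10); which.max() returns a column
# position (1 = 3 groups), so add inf.gr - 1 = 2 to get the actual K
calinski.best <- as.numeric(which.max(fit$results[2, ])) + 2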
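The same search can be sketched with base R alone, which helps show what the step above does. This version swaps in `stats::kmeans()` for `cascadeKM()` (so its partitions may differ slightly from vegan's) and computes the index from `kmeans()`'s `betweenss` and `tot.withinss`. The function `best_k_by_calinski` and the synthetic three-blob data are assumptions for illustration only.

```r
# Search K in k_min:k_max with base R kmeans() and pick the K with the largest CH.
# A stand-in for cascadeKM(), not vegan's implementation.
best_k_by_calinski <- function(x, k_min = 3, k_max = 10, nstart = 10) {
  stopifnot(k_min >= 2)  # CH is undefined for K = 1
  x <- as.matrix(x)
  n <- nrow(x)
  ks <- k_min:k_max
  ch <- vapply(ks, function(k) {
    km <- stats::kmeans(x, centers = k, nstart = nstart)
    (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
  }, numeric(1))
  names(ch) <- ks
  # which.max() gives a position in `ks`, so map it back to the actual K
  list(best_k = ks[which.max(ch)], calinski = ch)
}

# Synthetic data: three tight blobs of 50 points each
set.seed(1)
centres <- rbind(c(0, 0), c(10, 0), c(5, 10))
x <- do.call(rbind, lapply(seq_len(3), function(i)
  matrix(rnorm(100, mean = rep(centres[i, ], each = 50), sd = 0.1), ncol = 2)))
res <- best_k_by_calinski(x, k_min = 2, k_max = 6, nstart = 50)
res$best_k
```

Returning the K itself rather than the column position avoids the off-by-(inf.gr - 1) mistake that is easy to make when the search does not start at K = 1.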
5. Save the plot as an image
png(filename = "data/calinskibest.png")
plot(fit, sortg = TRUE, grpmts.plot = TRUE)
dev.off()
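If `plot()` throws an error, the `dev.off()` call above is never reached and the graphics device stays open. A minimal sketch that always closes the device via `on.exit()`; `save_plot` is a hypothetical helper, and the device is a parameter so the same code works with `png()` (as in the post) or with `pdf()` where no PNG-capable device is available.

```r
# Open a device, draw, and close the device even if drawing fails (hypothetical helper)
save_plot <- function(path, draw, device = grDevices::png, ...) {
  device(path, ...)
  on.exit(grDevices::dev.off(), add = TRUE)
  draw()
  invisible(path)
}

# Usage in this post would be:
#   save_plot("data/calinskibest.png", function() plot(fit, sortg = TRUE, grpmts.plot = TRUE))
# Self-contained demo writing a PDF to a temporary directory:
out <- file.path(tempdir(), "demo_plot.pdf")
save_plot(out, function() plot(1:10, main = "demo"), device = grDevices::pdf)
```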
6. Screenshot of the result
Wrapping the steps into a helper script, DetermineClustersNumHelper.R
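# ============================
# Determine the optimal number of clusters K
#
# ============================
# Load packages
library(permute)
library(lattice)
library(vegan)

# Get the optimal K for one CSV file under data/km/
get_best_calinski <- function(file_name){
  # Read the fault data
  data <- read.csv(paste0("data/km/", file_name, ".csv"), header = TRUE)
  # Run k-means for every K from k_min to k_max (iter = random starts per K)
  k_min <- 3
  k_max <- 10
  fit <- cascadeKM(data, k_min, k_max, iter = 10, criterion = "calinski")
  # which.max() returns a column position (1 = k_min groups), so convert it to K
  calinski.best <- as.numeric(which.max(fit$results[2, ])) + k_min - 1
  # Save the plot, e.g. data/km/<file_name><K>.png
  png(filename = paste0("data/km/", file_name, calinski.best, ".png"))
  plot(fit, sortg = TRUE, grpmts.plot = TRUE)
  dev.off()
  # Return the optimal K
  calinski.best
}
# ==========================================================================
# For example
#file_list <- c("failure_data_normalization", "failure_normal_data_normalization")
#for (file in file_list) {
#  # Call the function and print the optimal K
#  print(get_best_calinski(file))
#}
# ==========================================================================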