R: K-means Clustering, PAM Clustering, Hierarchical Clustering, and EM Clustering


2023-12-24 06:50 | Source: compiled from the web

WeChat official account: 小程在线

CSDN blog: 程志伟的博客

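R version: 3.6.1

kmeans function: K-means clustering

pam function: PAM (partitioning around medoids) clustering

hclust function: hierarchical clustering

cutree function: cuts the hierarchical clustering tree into a given number of clusters

Mclust function: EM clustering (Gaussian mixture model)

mclustBIC function: EM clustering, with BIC computed for each candidate model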

> ############## K-Means clustering on simulated data
> setwd('G:\\R语言\\大三下半年\\数据挖掘:R语言实战\\')
> set.seed(12345)
> x <- ...
> x[1:25,1] <- ...
> x[1:25,2] <- ...
> par(mfrow=c(2,2))
> plot(x,main="Distribution of the sample observations",xlab="",ylab="")
> KMClu1 <- ...
> points(KMClu1$centers,pch=3)
> set.seed(12345)
> (KMClu1 <- ...)
> plot(x,col=(KMClu1$cluster+1),main="K-Means clustering, K=2",xlab="",ylab="",pch=20,cex=1.5)
> points(KMClu1$centers,pch=3)
> set.seed(12345)
> KMClu2 <- ...
> KMClu2
K-means clustering with 4 clusters of sizes 10, 15, 15, 10

Cluster means:
       [,1]      [,2]
1 3.1311572 -5.086319
2 3.2611523 -2.986441
3 0.1445016  1.329080
4 0.3358022 -1.051107

Clustering vector:
 [1] 2 1 1 1 1 2 2 2 1 2 2 1 4 2 1 2 2 2 1 2 1 2 2 2 1 3 3 3 3 2 4 3 4 3 4 4 3
[38] 3 4 3 3 3 3 4 3 4 4 3 3 4

Within cluster sum of squares by cluster:
[1]  9.294879 20.486878 15.382149 10.803772
 (between_SS / total_SS =  87.5 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"
> plot(x,col=(KMClu2$cluster+1),main="K-Means clustering, K=4, nstart=1",xlab="",ylab="",pch=20,cex=1.5)
> points(KMClu2$centers,pch=3)
> KMClu1$betweenss/(2-1)/KMClu1$tot.withinss/(50-2)
[1] 0.06119216
> KMClu2$betweenss/(4-1)/KMClu2$tot.withinss/(50-4)
[1] 0.05091425
> set.seed(12345)
> KMClu2 <- ...
> plot(x,col=(KMClu2$cluster+1),main="K-Means clustering, K=4, nstart=30",xlab="",ylab="",pch=20,cex=1.5)
> points(KMClu2$centers,pch=3)
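The right-hand sides of several assignments in the transcript above did not survive, so the following is only a sketch of a comparable experiment, not the author's exact code. It assumes 50 bivariate standard-normal points whose first 25 rows are shifted by (3, -4); those shift values are guesses chosen to be roughly consistent with the printed cluster means. It compares a single random start with the best of 30 random starts.

```r
# Sketch under stated assumptions: the data-generating code is NOT from the source.
set.seed(12345)
x <- matrix(rnorm(n = 100, mean = 0, sd = 1), ncol = 2, byrow = TRUE)  # 50 points in 2-D
x[1:25, 1] <- x[1:25, 1] + 3   # assumed shift of the first group along column 1
x[1:25, 2] <- x[1:25, 2] - 4   # assumed shift of the first group along column 2

# One random start versus the best of 30 random starts, both with K = 4
set.seed(12345)
km1 <- kmeans(x = x, centers = 4, nstart = 1)
set.seed(12345)
km30 <- kmeans(x = x, centers = 4, nstart = 30)

# Total within-cluster sum of squares: smaller means tighter clusters
c(nstart1 = km1$tot.withinss, nstart30 = km30$tot.withinss)
```

With nstart greater than 1, kmeans keeps the run with the lowest total within-cluster sum of squares, which makes the result less sensitive to the random initial centers.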

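The ratio computed above is larger for K=2 (0.0612) than for K=4 (0.0509), so two clusters fit these data better than four. Note that R evaluates / from left to right, so the expression as typed divides by (n-K), whereas the pseudo-F statistic, (betweenss/(K-1))/(tot.withinss/(n-K)), multiplies by it. The properly parenthesized values equal the printed ones times (n-K)^2, roughly 141 for K=2 and 108 for K=4, and the ranking is unchanged.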
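A minimal sketch of the fully parenthesized pseudo-F (Calinski-Harabasz) statistic follows. The helper name pseudo_f and the stand-in data are assumptions made for illustration; they are not part of the original post.

```r
# Hypothetical helper (not from the source): pseudo-F for a kmeans fit on n observations
pseudo_f <- function(fit, n) {
  k <- length(fit$size)                                   # number of clusters
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Stand-in data, assumed for illustration only
set.seed(12345)
x <- matrix(rnorm(100), ncol = 2, byrow = TRUE)
x[1:25, ] <- sweep(x[1:25, ], 2, c(3, -4), "+")  # shift the first 25 points

fit2 <- kmeans(x, centers = 2, nstart = 30)
fit4 <- kmeans(x, centers = 4, nstart = 30)
n <- nrow(x)

# Expression in the form typed in the transcript: '/' associates left to right
as_typed <- fit2$betweenss / (2 - 1) / fit2$tot.withinss / (n - 2)

c(K2 = pseudo_f(fit2, n), K4 = pseudo_f(fit4, n))
```

The value of as_typed differs from pseudo_f(fit2, n) by a factor of (n - 2)^2, which is why the transcript's numbers look so small.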

> ##################### K-Means clustering: an application
> PoData <- ...
> CluData <- ...
> ############# K-Means clustering
> set.seed(12345)
> CluR <- ...
> CluR$size    # number of observations in each cluster
[1]  2 19  4  6
> CluR$centers    # cluster centroids
        x1       x2       x3       x4       x5       x6
1 11.48000 79.47000 69.43000 59.88000 33.07000  9.62000
2 15.06895 15.09263 20.43263  5.31000 13.37316 16.45105
3 53.39250  8.33500  7.97000  1.42250 36.78750 83.69250
4 26.91000 39.77167 63.68333 10.42833 56.67667 40.70000

> ########### Visualizing the K-Means clustering result ####
> par(mfrow=c(2,1))
> PoData$CluR <- ...
> plot(PoData$CluR,pch=PoData$CluR,ylab="Cluster number",xlab="Province",main="Cluster membership",axes=FALSE)
> par(las=2)
> axis(1,at=1:31,labels=PoData$province,cex.axis=0.6)
> axis(2,at=1:4,labels=1:4,cex.axis=0.6)
> box()
> legend("topright",c("Cluster 1","Cluster 2","Cluster 3","Cluster 4"),pch=1:4,cex=0.4)

> ########### Visualizing the K-Means cluster profiles ####
> plot(CluR$centers[1,],type="l",ylim=c(0,82),xlab="Clustering variable",ylab="Group mean (cluster centroid)",main="Line chart of the cluster means of each clustering variable",axes=FALSE)
> axis(1,at=1:6,labels=c("Domestic wastewater discharge","Domestic SO2 emissions","Domestic soot emissions","Industrial solid waste discharge","Total industrial waste gas emissions","Industrial wastewater discharge"),cex.axis=0.6)
> box()
> lines(1:6,CluR$centers[2,],lty=2,col=2)
> lines(1:6,CluR$centers[3,],lty=3,col=3)
> lines(1:6,CluR$centers[4,],lty=4,col=4)
> legend("topleft",c("Cluster 1","Cluster 2","Cluster 3","Cluster 4"),lty=1:4,col=1:4,cex=0.3)

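Cluster 2 has relatively low values on every pollutant. Cluster 1 stands out for domestic sulfur dioxide (79.47), domestic soot (69.43), and industrial solid waste (59.88), whereas its two wastewater centroids (11.48 and 9.62) are the lowest of the four clusters.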

 

> ########### Visual evaluation of the K-Means clustering ####
> # Between-cluster variation as a percentage of the total variation
> CluR$betweenss/CluR$totss*100
[1] 64.92061
> par(mfrow=c(2,3))
> plot(PoData[,c(2,3)],col=PoData$CluR,main="Domestic pollution",xlab="Domestic wastewater discharge",ylab="Domestic SO2 emissions")
> points(CluR$centers[,c(1,2)],col=rownames(CluR$centers),pch=8,cex=2)
> plot(PoData[,c(2,4)],col=PoData$CluR,main="Domestic pollution",xlab="Domestic wastewater discharge",ylab="Domestic soot emissions")
> points(CluR$centers[,c(1,3)],col=rownames(CluR$centers),pch=8,cex=2)
> plot(PoData[,c(3,4)],col=PoData$CluR,main="Domestic pollution",xlab="Domestic SO2 emissions",ylab="Domestic soot emissions")
> points(CluR$centers[,c(2,3)],col=rownames(CluR$centers),pch=8,cex=2)
> plot(PoData[,c(5,6)],col=PoData$CluR,main="Industrial pollution",xlab="Industrial solid waste discharge",ylab="Total industrial waste gas emissions")
> points(CluR$centers[,c(4,5)],col=rownames(CluR$centers),pch=8,cex=2)
> plot(PoData[,c(5,7)],col=PoData$CluR,main="Industrial pollution",xlab="Industrial solid waste discharge",ylab="Industrial wastewater discharge")
> points(CluR$centers[,c(4,6)],col=rownames(CluR$centers),pch=8,cex=2)
> plot(PoData[,c(6,7)],col=PoData$CluR,main="Industrial pollution",xlab="Total industrial waste gas emissions",ylab="Industrial wastewater discharge")
> points(CluR$centers[,c(5,6)],col=rownames(CluR$centers),pch=8,cex=2)

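In the scatter plots above the cluster centroids lie far apart from one another, which suggests that the four clusters are reasonably well separated.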
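The percentage betweenss/totss*100 used above rests on the decomposition totss = betweenss + tot.withinss. The sketch below recomputes each term by hand and can be compared with the components returned by kmeans. The pollution data file is not available here, so the data are random stand-ins with the same shape (31 rows, 6 columns); the variable names are assumptions.

```r
# Stand-in data: 31 rows and 6 columns of random numbers, NOT the pollution data
set.seed(12345)
dat <- matrix(runif(31 * 6, min = 0, max = 100), nrow = 31,
              dimnames = list(NULL, paste0("x", 1:6)))
fit <- kmeans(dat, centers = 4, nstart = 30)

# Total sum of squares: squared distances to the overall mean
grand <- colMeans(dat)
totss <- sum(sweep(dat, 2, grand)^2)

# Within-cluster sum of squares: squared distances to each point's own centroid
withinss <- sum((dat - fit$centers[fit$cluster, ])^2)

# Between-cluster sum of squares: centroid-to-grand-mean distances, weighted by cluster size
betweenss <- sum(fit$size * rowSums(sweep(fit$centers, 2, grand)^2))

round(100 * betweenss / totss, 2)   # same quantity as betweenss/totss*100
```

A higher percentage means that more of the total variation is explained by the differences between cluster centroids.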

> ################# PAM clustering ####
> set.seed(12345)
> x <- ...
> x[1:25,1] <- ...
> x[1:25,2] <- ...
> library("cluster")
> set.seed(12345)
> # Partition into 2 clusters
> (PClu <- ...)
> plot(x=PClu,data=x)
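Because the pam call itself is missing from the transcript, here is a minimal sketch of a two-cluster PAM fit with the cluster package. The simulated data are an assumption (the same kind of shifted normal sample as in the K-Means section), and the object name pclu is hypothetical.

```r
library(cluster)   # provides pam()

# Stand-in data, assumed for illustration (the transcript's generating code is lost)
set.seed(12345)
x <- matrix(rnorm(100), ncol = 2, byrow = TRUE)
x[1:25, 1] <- x[1:25, 1] + 3
x[1:25, 2] <- x[1:25, 2] - 4

pclu <- pam(x = x, k = 2)   # partition around 2 medoids

pclu$medoids             # coordinates of the medoids
pclu$id.med              # row indices of the medoids in x
table(pclu$clustering)   # cluster sizes
```

Unlike K-Means centroids, the medoids are actual observations, so x[pclu$id.med, ] reproduces pclu$medoids.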

 

> ################ Hierarchical clustering ####
> PoData <- ...
> CluData <- ...
> DisMatrix <- ...
> CluR <- ...
> ############### Dendrogram of the hierarchical clustering
> par(mfrow=c(1,1))
> plot(CluR,labels=PoData[,1])
> box()

 

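> ########### Scree plot for the hierarchical clustering

When the number of clusters reaches 4, the smallest between-cluster distance (the merge height) becomes noticeably larger.

> plot(CluR$height,30:1,type="b",cex=0.7,xlab="Distance measure",ylab="Number of clusters")
> PoData$memb <- ...
> table(PoData$memb)    # number of observations in each cluster

 1  2  3  4 
 7  7 13  4 
> plot(PoData$memb,pch=PoData$memb,ylab="Cluster number",xlab="Province",main="Cluster membership",axes=FALSE)
> par(las=2)
> axis(1,at=1:31,labels=PoData$province,cex.axis=0.6)
> axis(2,at=1:4,labels=1:4,cex.axis=0.6)
> box()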
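The dist, hclust and cutree calls are missing from the transcript, so the following sketch shows one way the three steps fit together. The data are random stand-ins with 31 rows, and the Euclidean distance and the ward.D linkage are assumptions; the source does not say which were used.

```r
# Stand-in data: 31 random rows, NOT the provincial pollution data
set.seed(12345)
dat <- matrix(runif(31 * 6, 0, 100), nrow = 31)

d  <- dist(dat, method = "euclidean")   # pairwise distance matrix (assumed metric)
hc <- hclust(d, method = "ward.D")      # linkage method is an assumption

length(hc$height)            # one merge height per merge: 30 for 31 observations
memb <- cutree(hc, k = 4)    # cut the tree into 4 clusters
table(memb)

# Scree-style plot: merge height against the number of clusters left after each merge
plot(hc$height, 30:1, type = "b", cex = 0.7,
     xlab = "Distance measure", ylab = "Number of clusters")
```

The y-axis runs from 30 down to 1 because the i-th merge leaves 31 - i clusters; a large jump in height between consecutive points marks a natural place to cut the tree.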

 

> ############## Simulating a Gaussian mixture
> library("MASS")
> set.seed(12345)
> mux1 <- ...
> muy1 <- ...
> mux2 <- ...
> muy2 <- ...
> ss1 <- ...
> ss2 <- ...
> s12 <- ...
> sigma <- ...
> Data1 <- ...
> Data2 <- ...
> Data <- ...
> plot(Data,xlab="x",ylab="y")

> library("mclust")
> DataDens <- ...
> plot(x=DataDens,type="persp",col=grey(level=0.8),xlab="x",ylab="y")
Model-based density estimation plots: 

1: BIC
2: density

Selection: 1
Model-based density estimation plots: 

1: BIC
2: density

Selection: 2
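The parameter values of the simulation are missing from the transcript. The sketch below shows how such a two-component sample could be drawn with MASS::mvrnorm. Every number in it is an assumption, picked only to be roughly in line with the fitted parameters reported in the EM summary further down (means near 0 and 15, variances near 10, covariance near 3, groups of 100 and 50 points).

```r
library(MASS)   # provides mvrnorm()

# All numeric values below are assumptions chosen for illustration
set.seed(12345)
mu1   <- c(0, 0)                          # assumed mean of component 1
mu2   <- c(15, 15)                        # assumed mean of component 2
sigma <- matrix(c(10, 3, 3, 10), 2, 2)    # assumed shared covariance matrix

Data1 <- mvrnorm(n = 100, mu = mu1, Sigma = sigma)
Data2 <- mvrnorm(n = 50,  mu = mu2, Sigma = sigma)
Data  <- rbind(Data1, Data2)              # 150 x 2 matrix of mixture draws

plot(Data, xlab = "x", ylab = "y")
```

Sharing one covariance matrix between the two components matches the EEE model (equal volume, shape and orientation) that Mclust selects in the transcript.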

 

 

> ######################### EM clustering on the simulated data
> library("mclust")
> EMfit <- ...
> summary(EMfit)
---------------------------------------------------- 
Gaussian finite mixture model fitted by EM algorithm 
---------------------------------------------------- 

Mclust EEE (ellipsoidal, equal volume, shape and orientation) model with 2 components: 

 log-likelihood   n df       BIC       ICL
       -857.359 150  8 -1754.803 -1755.007

Clustering table:
  1   2 
100  50 
> summary(EMfit,parameters=TRUE)
---------------------------------------------------- 
Gaussian finite mixture model fitted by EM algorithm 
---------------------------------------------------- 

Mclust EEE (ellipsoidal, equal volume, shape and orientation) model with 2 components: 

 log-likelihood   n df       BIC       ICL
       -857.359 150  8 -1754.803 -1755.007

Clustering table:
  1   2 
100  50 

Mixing probabilities:
        1         2 
0.6663218 0.3336782 

Means:
             [,1]     [,2]
[1,] -0.003082719 14.99065
[2,] -0.001821635 14.98813

Variances:
[,,1]
         [,1]     [,2]
[1,] 9.882603 2.988535
[2,] 2.988535 9.907798
[,,2]
         [,1]     [,2]
[1,] 9.882603 2.988535
[2,] 2.988535 9.907798
> plot(EMfit,"classification")
> plot(EMfit,"uncertainty")
> plot(EMfit,"density")
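Mclust estimates the mixture by the EM algorithm. To make the two alternating steps visible, here is a hand-written EM loop for a one-dimensional, two-component Gaussian mixture in base R. It is a teaching sketch on assumed toy data, not mclust's implementation, and all variable names are hypothetical.

```r
# Hand-written EM for a two-component 1-D Gaussian mixture (illustration only)
set.seed(12345)
y <- c(rnorm(100, mean = 0, sd = 3), rnorm(50, mean = 15, sd = 3))  # assumed toy data

# Initial guesses for the mixing proportion, means and standard deviations
p <- 0.5; mu <- c(min(y), max(y)); s <- c(sd(y), sd(y))
loglik <- numeric(0)

for (iter in 1:50) {
  # E-step: posterior probability that each point belongs to component 1
  d1 <- p * dnorm(y, mu[1], s[1])
  d2 <- (1 - p) * dnorm(y, mu[2], s[2])
  loglik <- c(loglik, sum(log(d1 + d2)))   # log-likelihood at the current parameters
  z <- d1 / (d1 + d2)

  # M-step: weighted updates of the parameters
  p  <- mean(z)
  mu <- c(sum(z * y) / sum(z), sum((1 - z) * y) / sum(1 - z))
  s  <- c(sqrt(sum(z * (y - mu[1])^2) / sum(z)),
          sqrt(sum((1 - z) * (y - mu[2])^2) / sum(1 - z)))
}

round(c(p = p, mu1 = mu[1], mu2 = mu[2]), 3)
```

The vector loglik never decreases from one iteration to the next, which is the defining property of EM. The posterior probabilities z play the role of the classification uncertainty that plot(EMfit,"uncertainty") displays.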

> ############# EM clustering via the mclustBIC function ####
> (BIC <- ...)
> plot(BIC,G=1:7,col="black")
> (BICsum <- ...)
> mclust2Dplot(Data,classification=BICsum$classification,parameters=BICsum$parameters)
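The BIC values that mclust reports follow the convention 2 * log-likelihood - df * log(n), so larger values indicate a better model. The sketch below reproduces the BIC of the simulated-data fit from the three numbers printed in its summary. The helper name bic_mclust is hypothetical.

```r
# mclust-style BIC: 2 * logLik - df * log(n); larger values are preferred
bic_mclust <- function(loglik, df, n) 2 * loglik - df * log(n)   # hypothetical helper name

# Values copied from the summary() output of the simulated-data fit
bic_mclust(loglik = -857.359, df = 8, n = 150)
```

The result agrees with the printed BIC of -1754.803 up to rounding of the log-likelihood. The term df * log(n) is the penalty that keeps models with more components or freer covariance structures from winning on likelihood alone.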

> ################### EM clustering on the real data ####
> PoData <- ...
> CluData <- ...
> library("mclust")
> EMfit <- ...
> summary(EMfit)
---------------------------------------------------- 
Gaussian finite mixture model fitted by EM algorithm 
---------------------------------------------------- 

Mclust EEV (ellipsoidal, equal volume and shape) model with 5 components: 

 log-likelihood  n  df       BIC       ICL
      -542.7661 31 115 -1480.441 -1480.441

Clustering table:
1 2 3 4 5 
6 8 5 7 5 
> plot(EMfit,"BIC")
> plot(EMfit,"classification")


