Jun 01, 2021
Cluster analysis, also known as group analysis, is a statistical method for studying how to classify samples, and it is also an important data mining technique. A cluster is made up of patterns, where a pattern is usually a vector of measurements. Clustering is based on similarity: patterns in the same cluster are more similar to one another than to patterns in other clusters.
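To make the similarity idea concrete, here is a minimal sketch on made-up data. The points `P` and the assignment `labels` are assumptions for illustration only and are not the article's data. It checks that the mean distance between points in the same cluster is smaller than the mean distance between points in different clusters, and it assumes the Statistics and Machine Learning Toolbox for `pdist` and `squareform`.

```matlab
% Hypothetical toy data: two well-separated groups of 2-D points.
P = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];
labels = [1; 1; 1; 2; 2; 2];              % assumed cluster of each row

M = squareform(pdist(P));                 % 6-by-6 Euclidean distance matrix
same = bsxfun(@eq, labels, labels');      % true where two rows share a cluster
offDiag = ~eye(numel(labels));            % ignore each point's zero distance to itself

withinMean  = mean(M(same & offDiag));    % average distance inside clusters
betweenMean = mean(M(~same));             % average distance across clusters
fprintf('within = %.2f, between = %.2f\n', withinMean, betweenMean);
```

A good clustering of real data should show the same pattern: small distances within clusters, larger distances between them.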
Clustering is most often carried out in SPSS software, usually by importing the data and then choosing a clustering method. This article instead uses MATLAB software to cluster a sample with 14 different clustering methods.
(1) The longest distance method
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
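D=pdist(X,'euclid');       % pairwise Euclidean distances between the rows of X
M=squareform(D);           % the same distances as a square matrix, for inspection
Z=linkage(D,'complete');   % build the tree with complete (longest distance) linkage
H=dendrogram(Z);           % plot the cluster tree
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);           % cophenetic correlation coefficient of the tree
T=cluster(Z,3);            % cluster index for each row of X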
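Every statement in the listing above ends with a semicolon, so nothing is printed. A minimal sketch of how the results might be inspected follows. It repeats the same complete-linkage steps without the plot, uses the same seven-by-five data matrix, and requests three clusters explicitly through the `'maxclust'` option, which is an assumption about what the original call intends.

```matlab
% Data as in the article, five values per row.
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'euclidean');
Z=linkage(D,'complete');
C=cophenet(Z,D);                 % closer to 1 means the tree preserves the distances better
T=cluster(Z,'maxclust',3);       % one cluster index per row of X
fprintf('Cophenetic correlation: %.4f\n',C);
for k=1:max(T)
    % list the row numbers that fall into cluster k
    fprintf('Cluster %d: rows %s\n',k,mat2str(find(T==k)'));
end
```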
(2) The shortest distance method
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'euclid');
M=squareform(D);
Z=linkage(D,'single');
H=dendrogram(Z);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,'cutoff',0.8);
(3) One-step clustering with the clusterdata function
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
T=clusterdata(X,0.8);
Re=find(T==5)   % rows assigned to cluster 5 (empty if there are fewer than 5 clusters)
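`clusterdata` bundles the separate steps used in the other listings. As a rough sketch, and assuming the default settings documented for `clusterdata` (Euclidean distance, single linkage, and the inconsistency criterion when the cutoff lies between 0 and 2), the one-step call should give the same assignment as the explicit three-step pipeline below. The names `T1` and `T2` are introduced here only for the comparison.

```matlab
% Data as in the article, five values per row.
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
T1=clusterdata(X,0.8);           % one-step call
D=pdist(X);                      % default metric: Euclidean distance
Z=linkage(D);                    % default method: single linkage
T2=cluster(Z,'cutoff',0.8);      % default criterion: inconsistency coefficient
disp(isequal(T1,T2));            % compare the two assignments
```

Spelling the steps out is useful when a different distance or linkage method is wanted, as in the remaining listings.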
(4) Centroid method with standardized Euclidean distance
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'seuclid');
M=squareform(D);
Z=linkage(D,'centroid');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(5) Centroid method with squared Euclidean distance
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'euclid');
D2=D.^2;
M=squareform(D2);
Z=linkage(D2,'centroid');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D2);
T=cluster(Z,3);
(6) Centroid method with precision-weighted distance
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
[n,m]=size(X);
stdx=std(X);
X2=X./stdx(ones(n,1),:);
D=pdist(X2,'euclid');
M=squareform(D);
Z=linkage(D,'centroid');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(7) Shortest distance method with standardized Euclidean distance on the principal components
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
[E,score,eigen,T]=pca(X);   % pca supersedes princomp in newer MATLAB releases
D=pdist(score,'seuclid');
M=squareform(D);
Z=linkage(D,'single');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(8) Average linkage method with standardized Euclidean distance
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'seuclid');
M=squareform(D);
Z=linkage(D,'average');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(9) Weighted average method with standardized Euclidean distance
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'seuclid');
M=squareform(D);
Z=linkage(D,'weighted');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(10) Shortest distance method with Mahalanobis distance
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'mahal');
M=squareform(D);
Z=linkage(D,'single');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(11) Centroid method with Euclidean distance on standardized data
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
[n,m]=size(X);
mv=mean(X);
st=std(X);
x=(X-mv(ones(n,1),:))./st(ones(n,1),:);
D=pdist(x,'euclid');   % use the standardized data x, not the raw X
M=squareform(D);
Z=linkage(D,'centroid');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(12) Longest distance method with Euclidean distance
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'euclid');
M=squareform(D);
Z=linkage(D,'complete');
[H tPerm]=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(13) Average linkage method with similarity coefficient
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'cosine');
M=squareform(D);
Z=linkage(D,'average');   % average linkage, as the heading states; centroid linkage assumes Euclidean distances
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
(14) Shortest distance method with standardized Euclidean distance on the first two principal components
S=['福冈';'合肥';'武汉';'长沙';'桂林';'温州';'成都'];
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
[E,score,eigen,T]=pca(X);   % pca supersedes princomp in newer MATLAB releases
PCA=[score(:,1),score(:,2)];
D=pdist(PCA,'seuclid');
M=squareform(D);
Z=linkage(D,'single');
H=dendrogram(Z,'labels',S);
xlabel('City');
ylabel('Scale');
C=cophenet(Z,D);
T=cluster(Z,3);
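Every listing computes the cophenetic correlation coefficient `C`, which measures how faithfully the tree reproduces the original pairwise distances (values closer to 1 are better). One possible way to use it, sketched below, is to compare several linkage methods on the same distance vector. The variable names `linkNames` and `best` are introduced here for illustration, and standardized Euclidean distance is just one of the distance options used above.

```matlab
% Data as in the article, five values per row.
X=[16.2 1492 2000 -8.2 6.2;
15.7 970 2209 -20.6 1.9;
16.3 1260 2085 -17.3 2.8;
17.2 1422 1726 -9.5 4.6;
18.8 1874 1709 -4.9 8.0;
17.9 1698 1848 -4.5 7.5;
16.3 976 1239 -4.6 5.6];
D=pdist(X,'seuclidean');                        % one distance vector shared by all methods
linkNames={'single','complete','average','weighted'};
C=zeros(1,numel(linkNames));
for i=1:numel(linkNames)
    Z=linkage(D,linkNames{i});                  % build the tree with this linkage method
    C(i)=cophenet(Z,D);                         % how well the tree preserves D
    fprintf('%-9s cophenetic correlation = %.4f\n',linkNames{i},C(i));
end
[~,best]=max(C);                                % index of the highest coefficient
fprintf('Highest coefficient: %s\n',linkNames{best});
```

A higher coefficient only says the tree fits the distances better; whether the resulting clusters are meaningful still has to be judged against the data.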
Source: www.toutiao.com/a6863649930347545091/
That concludes the W3Cschool (编程狮) walkthrough of cluster analysis in MATLAB using 14 clustering methods.