Statistics Toolbox
Verifying the Cluster Tree
One way to measure the validity of the cluster information generated by the linkage function is to compare it with the original proximity data generated by the pdist function. If the clustering is valid, the heights at which pairs of objects are joined in the cluster tree should correlate strongly with the distances between those objects in the distance vector. The cophenet function compares these two sets of values and computes their correlation, returning a value called the cophenetic correlation coefficient. The closer this coefficient is to 1, the more accurately the cluster tree reflects the original distances.
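Written out, the cophenetic correlation coefficient is the linear (Pearson) correlation between the original distances and the tree distances, taken over every pair of objects i < j. The symbols below are introduced here for illustration: Y_ij is the distance between objects i and j in the distance vector Y, Z_ij is the cophenetic distance between them (the height of the link in the tree Z at which they are first joined), and ȳ and z̄ are the averages of the Y_ij and Z_ij.

```latex
$$
c = \frac{\sum_{i<j} \left(Y_{ij} - \bar{y}\right)\left(Z_{ij} - \bar{z}\right)}
         {\sqrt{\sum_{i<j} \left(Y_{ij} - \bar{y}\right)^{2} \; \sum_{i<j} \left(Z_{ij} - \bar{z}\right)^{2}}}
$$
```

Because c is a correlation, it lies between -1 and 1, and a value near 1 means the tree preserves the ordering and spacing of the original distances well.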
You can use the cophenetic correlation coefficient to compare the results of clustering the same data set using different distance calculation methods or clustering algorithms.
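As a sketch of comparing clustering algorithms, the loop below builds one tree per linkage method from the same distance vector and reports each cophenetic correlation coefficient. The data matrix X and the variable names (linkMethods, best) are hypothetical and chosen only for illustration; substitute your own observations.

```matlab
% Hypothetical example data: five observations in two dimensions.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];

Y = pdist(X);   % pairwise distances (Euclidean is the pdist default)

% Linkage methods to compare; each builds a different cluster tree from Y.
linkMethods = {'single', 'complete', 'average'};
c = zeros(size(linkMethods));
for k = 1:numel(linkMethods)
    Z = linkage(Y, linkMethods{k});   % cluster tree for this method
    c(k) = cophenet(Z, Y);            % cophenetic correlation coefficient
    fprintf('%-8s  c = %.4f\n', linkMethods{k}, c(k));
end

% Report the method whose tree best preserves the original distances.
[~, best] = max(c);
fprintf('Highest cophenetic correlation: %s\n', linkMethods{best});
```

All three trees are compared against the same Y, so differences in c reflect only the choice of linkage method.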
For example, you can use the cophenet function to evaluate the clusters created for the sample data set:

c = cophenet(Z,Y)

c =
    0.8573
where Z is the matrix output by the linkage function and Y is the distance vector output by the pdist function.
Execute pdist again on the same data set, this time specifying the City Block metric. After running the linkage function on this new pdist output, use the cophenet function to evaluate the clustering produced with this different distance metric:
c = cophenet(Z,Y)

c =
    0.9289
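For this data set, the cophenetic correlation coefficient is higher with the City Block metric (0.9289) than with the metric used in the first example (0.8573), so the City Block cluster tree reflects its distances more faithfully.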
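The script below is a minimal sketch of the metric comparison, assuming a hypothetical data matrix X (not necessarily the sample data set used above) and the default linkage method. It rebuilds the tree from each new distance vector, because cophenet only makes sense when Z was produced from the same Y it is compared against. It also uses the second output of cophenet, the cophenetic distances, to check that the coefficient equals the linear correlation computed by corrcoef.

```matlab
% Hypothetical example data (five 2-D observations); substitute your own X.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];

% Euclidean distances (pdist default) and the tree built from them
Y1 = pdist(X);
Z1 = linkage(Y1);
c1 = cophenet(Z1, Y1);

% City Block distances; the tree must be rebuilt from the new distance vector
Y2 = pdist(X, 'cityblock');
Z2 = linkage(Y2);
[c2, d2] = cophenet(Z2, Y2);   % d2: cophenetic distances, same layout as Y2

fprintf('euclidean: c = %.4f\ncityblock: c = %.4f\n', c1, c2);

% Sanity check: c2 is the linear correlation between Y2 and d2
R = corrcoef(Y2, d2);
fprintf('corrcoef check: %.4f\n', R(1, 2));
```

Passing Y1 together with Z2 (or Y2 with Z1) would still return a number, but it would not measure how well either tree fits its own distances.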