Statistics Toolbox    

Verifying the Cluster Tree

One way to measure the validity of the cluster information generated by the linkage function is to compare it with the original proximity data generated by the pdist function. If the clustering is valid, the linking of objects in the cluster tree should correlate strongly with the distances between objects in the distance vector. The cophenet function measures this. It computes the cophenetic distance between each pair of objects, which is the height of the link at which the two objects are first joined in the cluster tree. It then correlates these values with the original distances and returns the result, called the cophenetic correlation coefficient. The closer this coefficient is to 1, the more accurately the cluster tree reflects the original distances, and the better the clustering solution.
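Written out, the coefficient is the linear (Pearson) correlation between the original pairwise distances and the cophenetic distances. The notation below is chosen for illustration: Y_ij is the distance between objects i and j in the distance vector Y, Z_ij is their cophenetic distance taken from the cluster tree Z, and y-bar and z-bar are the averages of the Y_ij and Z_ij values. The sums run over all distinct pairs of objects.

```latex
c = \frac{\sum_{i<j} \left(Y_{ij} - \bar{y}\right)\left(Z_{ij} - \bar{z}\right)}
         {\sqrt{\sum_{i<j} \left(Y_{ij} - \bar{y}\right)^{2} \; \sum_{i<j} \left(Z_{ij} - \bar{z}\right)^{2}}}
```

Like any correlation, c lies between -1 and 1.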

You can use the cophenetic correlation coefficient to compare the results of clustering the same data set using different distance calculation methods or clustering algorithms.
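As a sketch of such a comparison, the loop below builds one cluster tree per linkage method from the same Euclidean distances and reports the cophenetic correlation coefficient for each. The data matrix X is a small hypothetical stand-in for the sample data set, and the list of methods is only an illustrative selection of options accepted by linkage.

```matlab
% Sketch: compare linkage methods by their cophenetic correlation.
% X is a hypothetical stand-in for the sample data set (rows are objects).
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];

Y = pdist(X);                            % Euclidean distances (pdist default)
linkMethods = {'single', 'complete', 'average'};
c = zeros(1, numel(linkMethods));

for k = 1:numel(linkMethods)
    Z = linkage(Y, linkMethods{k});      % cluster tree built with this method
    c(k) = cophenet(Z, Y);               % how well the tree preserves Y
    fprintf('%-8s  c = %.4f\n', linkMethods{k}, c(k));
end
```

The method with the largest c produces the tree whose link heights track the original distances most closely for this particular data set.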

For example, you can evaluate the clusters created for the sample data set with the command c = cophenet(Z,Y), where Z is the matrix output by the linkage function and Y is the distance vector output by the pdist function.
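The sketch below shows the full sequence of calls, assuming a small hypothetical data matrix X in place of the sample data set. It also requests the second output of cophenet, the vector of cophenetic distances, and checks that correlating it with Y using the base MATLAB function corrcoef gives the same coefficient.

```matlab
% Sketch: cophenetic correlation for a tree built from Euclidean distances.
% X is a hypothetical stand-in for the sample data set.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];

Y = pdist(X);            % pairwise Euclidean distances (row vector)
Z = linkage(Y);          % cluster tree; default linkage method is 'single'
[c, D] = cophenet(Z, Y); % c: coefficient, D: cophenetic distances (same layout as Y)

% The coefficient is the linear correlation between Y and D.
R = corrcoef(Y, D);
fprintf('cophenet: %.4f   corrcoef: %.4f\n', c, R(1,2));
```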

Run pdist again on the same data set, this time specifying the city block metric ('cityblock'). After running the linkage function on the new distance vector, call cophenet again to evaluate the clustering produced with this distance metric.
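A minimal sketch of this comparison follows, again using a hypothetical data matrix X rather than the actual sample data set. Each tree is evaluated against the distance vector it was built from, so the city block tree is compared with the city block distances, not the Euclidean ones.

```matlab
% Sketch: compare Euclidean and city block distances on the same data.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];   % hypothetical stand-in data

Yeuc = pdist(X);                % Euclidean distances (pdist default)
Ycb  = pdist(X, 'cityblock');   % city block distances

% Build each tree from its own distances and evaluate it against them.
cEuc = cophenet(linkage(Yeuc), Yeuc);
cCb  = cophenet(linkage(Ycb),  Ycb);

fprintf('Euclidean:  c = %.4f\n', cEuc);
fprintf('City block: c = %.4f\n', cCb);
```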

For the sample data set, the cophenetic correlation coefficient is higher when the city block metric is used, indicating that the resulting cluster tree reflects the original distances more closely.

