Statistics Toolbox    

Terminology and Basic Procedure

To perform cluster analysis on a data set using the Statistics Toolbox functions, follow this procedure:

  1. Find the similarity or dissimilarity between every pair of objects in the data set. In this step, you calculate the distance between objects using the pdist function. The pdist function supports many different distance metrics for computing this measurement. See Finding the Similarities Between Objects for more information.
  2. Group the objects into a binary, hierarchical cluster tree. In this step, you link together pairs of objects that are in close proximity using the linkage function. The linkage function uses the distance information generated in step 1 to determine the proximity of objects to each other. As objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed. See Defining the Links Between Objects for more information.
  3. Determine where to divide the hierarchical tree into clusters. In this step, you divide the objects in the hierarchical tree into clusters using the cluster function. The cluster function can create clusters by detecting natural groupings in the hierarchical tree or by cutting off the hierarchical tree at an arbitrary point. See Creating Clusters for more information.
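The three steps above can be sketched in plain Python to show how the output of each step feeds the next. This is a minimal illustration, not the Statistics Toolbox implementation. The function names pairwise_distances, single_linkage, and cut_tree are hypothetical. The sketch assumes Euclidean distance, a single (nearest-neighbor) linkage rule, and a cut at a fixed distance cutoff, and it numbers objects from 0 rather than 1. It only mirrors the roles of pdist, linkage, and cluster.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Step 1 (role of pdist): Euclidean distance for every pair i < j,
    listed row by row as a condensed distance vector."""
    return [math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)]


def single_linkage(points):
    """Step 2 (role of linkage), assuming single linkage.
    Returns rows (cluster_a, cluster_b, merge_distance). Original objects are
    clusters 0..n-1; the k-th merge creates cluster n + k."""
    n = len(points)
    dists = pairwise_distances(points)

    def d(i, j):
        # Look up objects i and j in the condensed vector from step 1.
        if i > j:
            i, j = j, i
        return dists[n * i - i * (i + 1) // 2 + (j - i - 1)]

    clusters = {i: [i] for i in range(n)}  # active cluster id -> member objects
    tree = []
    next_id = n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: closest pair of members across the two clusters.
            dist = min(d(i, j) for i in clusters[a] for j in clusters[b])
            if best is None or dist < best[2]:
                best = (a, b, dist)
        a, b, dist = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        tree.append((a, b, dist))
        next_id += 1
    return tree


def cut_tree(tree, n, cutoff):
    """Step 3 (role of cluster, cutting at an arbitrary point): objects joined
    by merges at or below `cutoff` share a label. Labels start at 1."""
    members = {i: [i] for i in range(n)}
    for k, (a, b, dist) in enumerate(tree):
        if dist > cutoff:
            break  # single-linkage merge distances never decrease
        members[n + k] = members.pop(a) + members.pop(b)
    labels = [0] * n
    for label, group in enumerate(sorted(members.values(), key=min), start=1):
        for i in group:
            labels[i] = label
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
tree = single_linkage(points)
print(tree)
print(cut_tree(tree, len(points), cutoff=2.0))  # [1, 1, 2, 2, 3]
```

Raising the cutoff applies more merges and yields fewer, larger clusters; detecting natural groupings in the tree, the other approach mentioned in step 3, is not shown here.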

The following sections provide more information about each of these steps.

