Statistics Toolbox
Terminology and Basic Procedure
To perform cluster analysis on a data set using the Statistics Toolbox functions, follow this procedure:
1. Find the similarity between every pair of objects in the data set by computing the distance between them with the pdist function. The pdist function supports many different ways to compute this measurement. See Finding the Similarities Between Objects for more information.
2. Define the links between objects, grouping them into a binary, hierarchical cluster tree with the linkage function. The linkage function uses the distance information generated in step 1 to determine the proximity of objects to each other. As objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed. See Defining the Links Between Objects for more information.
3. Create clusters by dividing the hierarchical tree with the cluster function. The cluster function can create clusters by detecting natural groupings in the hierarchical tree or by cutting off the hierarchical tree at an arbitrary point. See Creating Clusters for more information.

The following sections provide more information about each of these steps.
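As a rough illustration, the three steps might be chained as in the sketch below. The data matrix X is made up for this example, and the 'maxclust' option name follows later toolbox releases, so the cluster call may need adjusting for the version this page describes.

```matlab
% Illustrative data only: five objects (rows) with two measurements each.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];

% Step 1: distance between every pair of objects (Euclidean by default).
% For m objects, Y is a row vector of m*(m-1)/2 distances.
Y = pdist(X);

% Step 2: link objects into a binary, hierarchical cluster tree.
% Z has m-1 rows, one per link formed.
Z = linkage(Y);

% Step 3: cut the tree so that at most two clusters remain.
% The 'maxclust' option is an assumption based on later releases.
T = cluster(Z, 'maxclust', 2);
```

Here T holds one cluster number per row of X. Keeping the steps separate makes it possible to change the distance method in step 1 or the linkage method in step 2 without touching the rest.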
Note
The Statistics Toolbox includes a convenience function, clusterdata, which performs all these steps for you. You do not need to execute the pdist, linkage, or cluster functions separately. However, the clusterdata function does not give you access to the options each of the individual routines offers. For example, if you use the pdist function you can choose the distance calculation method, whereas if you use the clusterdata function you cannot.
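A minimal sketch of the single-call form follows, using made-up data. It assumes that an integer cutoff of 2 or more is read as the maximum number of clusters, which is how the toolbox documents the cutoff argument; check the clusterdata reference page for your release.

```matlab
% Illustrative data only: five objects (rows) with two measurements each.
X = [1 2; 2.5 4.5; 2 2; 4 1.5; 4 2.5];

% One call in place of separate pdist, linkage, and cluster calls.
% The cutoff of 2 is assumed to mean "at most two clusters".
T = clusterdata(X, 2);
```

As with the cluster function, T contains one cluster number for each row of X.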