CLUSTER

The CLUSTER function computes the classification of an m-column, n-row array, where m is the number of variables and n is the number of observations or samples. The classification is based upon a cluster analysis of sample-based distances. The result is a 1-column, n-row array of cluster number assignments, one for each sample.
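The idea of classifying samples by their distance to cluster centers can be sketched as follows. This is an illustration only, not the CLUSTER implementation: NEAREST_CENTER is a hypothetical helper, it assumes the centers are stored as an m-column array with one row per cluster, and it assumes squared Euclidean distance, which may differ from the distance measure CLUSTER uses internally.

```idl
; Hypothetical helper (not part of IDL): assign each sample (row of Array)
; to the nearest center (row of Centers).
; Assumes both inputs have m columns and at least two rows, and uses
; squared Euclidean distance as the (assumed) distance measure.
FUNCTION nearest_center, array, centers
  COMPILE_OPT IDL2

  n         = (SIZE(array,   /DIMENSIONS))[1]   ; number of samples
  n_centers = (SIZE(centers, /DIMENSIONS))[1]   ; number of clusters
  labels    = LONARR(n)

  FOR k = 0, n-1 DO BEGIN
    dist = DBLARR(n_centers)
    ; Distance from sample k to every center.
    FOR c = 0, n_centers-1 DO $
      dist[c] = TOTAL((DOUBLE(array[*,k]) - centers[*,c])^2)
    ; MIN returns the subscript of the smallest distance in its
    ; second argument.
    smallest = MIN(dist, index)
    labels[k] = index
  ENDFOR

  ; Returns an n-element vector rather than a 1-column, n-row array.
  RETURN, labels
END
```

Saved as nearest_center.pro, the sketch would be called as labels = NEAREST_CENTER(array, weights). Its labels are not guaranteed to match those returned by CLUSTER.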

Calling Sequence

Result = CLUSTER( Array, Weights [, /DOUBLE] [, N_CLUSTERS=value] )

Arguments

Array

An m-column, n-row array of type float or double, with one variable per column and one sample per row.

Weights

An array of weights (the cluster centers) computed using the CLUST_WTS function. The dimensions of this array vary according to keyword values.

Keywords

DOUBLE

Set this keyword to force the computation to be done in double-precision arithmetic.

N_CLUSTERS

Set this keyword equal to the number of clusters. The default is based upon the row dimension of the Weights array.

Example

Define an array with 4 variables and 10 observations.

array = $
  [[ 1.5, 43.1, 29.1,  1.9], $
   [24.7, 49.8, 28.2, 22.8], $
   [30.7, 51.9,  7.0, 18.7], $
   [ 9.8,  4.3, 31.1,  0.1], $
   [19.1, 42.2,  0.9, 12.9], $
   [25.6, 13.9,  3.7, 21.7], $
   [ 1.4, 58.5, 27.6,  7.1], $
   [ 7.9,  2.1, 30.6,  5.4], $
   [22.1, 49.9,  3.2, 21.3], $
   [ 5.5, 53.5,  4.8, 19.3]]

Compute the cluster weights, using two distinct clusters.

weights = CLUST_WTS(array, N_CLUSTERS=2)

Compute the classification of each sample.

result = CLUSTER(array, weights, N_CLUSTERS=2)
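Because the default for N_CLUSTERS is based upon the row dimension of the Weights array, the keyword can presumably be omitted when the weights were computed for the desired number of clusters. The following sketch, which assumes the array and weights variables defined above, inspects the dimensions of the weights and compares the two calls. The variable name result_default is introduced here for illustration only.

```idl
; Show the column and row dimensions of the weights array.
PRINT, SIZE(weights, /DIMENSIONS)

; Classify again, letting CLUSTER derive the number of clusters
; from the weights array.
result_default = CLUSTER(array, weights)

; ARRAY_EQUAL returns 1 if the two classifications are identical.
PRINT, ARRAY_EQUAL(result, result_default)
```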

Print each sample (each row) of the array and its corresponding cluster assignment.

FOR k = 0, N_ELEMENTS(result)-1 DO PRINT, $
  array[*,k], result[k], FORMAT = '(4(f4.1, 2x), 5x, i1)'

IDL prints:

 1.5  43.1  29.1   1.9       1
24.7  49.8  28.2  22.8       0
30.7  51.9   7.0  18.7       0
 9.8   4.3  31.1   0.1       1
19.1  42.2   0.9  12.9       0
25.6  13.9   3.7  21.7       0
 1.4  58.5  27.6   7.1       1
 7.9   2.1  30.6   5.4       1
22.1  49.9   3.2  21.3       0
 5.5  53.5   4.8  19.3       0
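As a possible follow-up, the assignments can be summarized per cluster. This sketch assumes the array and result variables from the example above and the two clusters numbered 0 and 1. The variable names idx and count are introduced here for illustration.

```idl
; Count the samples assigned to each of the two clusters.
FOR c = 0, 1 DO PRINT, c, TOTAL(result EQ c, /INTEGER), $
  FORMAT = '("Cluster ", i1, ": ", i2, " samples")'

; Extract and print the samples (rows) assigned to cluster 1.
idx = WHERE(result EQ 1, count)
IF count GT 0 THEN PRINT, array[*, idx], FORMAT = '(4(f4.1, 2x))'
```

The count returned by WHERE is checked before indexing because WHERE returns -1 when no element matches.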

See Also

CLUST_WTS, PCOMP, STANDARDIZE, Multivariate Analysis.