HISTOGRAM

The HISTOGRAM function returns a longword vector equal to the density function of Array. In the simplest case, the density function at subscript i is the number of elements of Array with a value of i.

Let F_i = the value of element i, 0 <= i < n. Let H_v = the result of the histogram function, a longword vector. The definition of the histogram function becomes:
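One way to write this out, as a sketch consistent with the keyword descriptions below: Min, Max, and Binsize stand for the effective values of the MIN, MAX, and BINSIZE keywords, and P is an indicator function named here for illustration.

```latex
\[
H_v = \sum_{i=0}^{n-1} P(F_i, v), \qquad
v = 0, 1, 2, \ldots, \left\lfloor \frac{\mathit{Max} - \mathit{Min}}{\mathit{Binsize}} \right\rfloor
\]
\[
P(F_i, v) =
\begin{cases}
1, & v \le (F_i - \mathit{Min}) / \mathit{Binsize} < v + 1 \\
0, & \text{otherwise}
\end{cases}
\]
```

With Min = 0 and Binsize = 1 this reduces to the simplest case above: H_v counts the elements whose value is v.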

HISTOGRAM can optionally return an array containing a list of the original array subscripts that contributed to each histogram bin. This list, commonly called the reverse (or backwards) index list, makes it efficient to determine which array elements were accumulated in a set of histogram bins. A typical application of the reverse index list is reverse histogram or scatter plot interrogation: a histogram bin or 2D scatter plot location is marked with the cursor, and the original data items within that bin are highlighted.

Keywords

BINSIZE

The size of the bin to use. If this keyword is not specified, a bin size of 1 is used.

INPUT

Set this keyword to a named variable that contains an array to be added to the output of HISTOGRAM. The density function of Array is added to the existing contents of INPUT and returned as the result. The array is converted to longword type if necessary and must have at least as many elements as are required to form the histogram.

Multiple histograms can be efficiently accumulated by specifying partial sums via this keyword.
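As a minimal sketch of this accumulation, assume two hypothetical integer arrays, chunk1 and chunk2, standing in for pieces of a larger dataset. MIN and MAX are fixed so that both calls use the same bins; the variable name acc is also only illustrative.

```idl
; chunk1 and chunk2 are hypothetical integer arrays standing in for
; two pieces of a larger dataset.
chunk1 = [0, 1, 1, 3]
chunk2 = [1, 3, 3, 3]

; Fix MIN and MAX so that both calls use the same four bins (0 to 3).
acc = HISTOGRAM(chunk1, MIN=0, MAX=3)
acc = HISTOGRAM(chunk2, MIN=0, MAX=3, INPUT=acc)

; acc now holds the histogram of both chunks together: 1 3 0 4
PRINT, acc
```

Passing the running result back through INPUT avoids concatenating the chunks or adding separate histograms afterward.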

MAX

MAX is the maximum value to consider. Note that the data type of the value specified for MAX should match the data type of the input array; specifying mismatched data types may produce undesired results. If this keyword is not specified, Array is searched for its largest value.

MIN

MIN is the minimum value to consider. Note that the data type of the value specified for MIN should match the data type of the input array; specifying mismatched data types may produce undesired results. If this keyword is not specified, and Array is of type byte, 0 is used. If this keyword is not specified and Array is not of byte type, Array is searched for its smallest value.

NAN

Set this keyword to cause the routine to check for occurrences of the IEEE floating-point value NaN in the input data. Elements with the value NaN are treated as missing data. (See Special Floating-Point Values for more information on IEEE floating-point values.)
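The following sketch illustrates the keyword on a small, made-up floating-point vector. The values are assumptions chosen for illustration; !VALUES.F_NAN is the single-precision NaN system variable.

```idl
; Hypothetical data containing one missing value.
data = [1.0, 2.0, !VALUES.F_NAN, 2.0, 3.0]

; With /NAN the NaN element is treated as missing, so only the four
; finite values are counted.
h = HISTOGRAM(data, /NAN)

; The bin counts should therefore sum to 4.
PRINT, TOTAL(h)
```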

OMAX

A named variable that, upon exit, contains the maximum data value used in constructing the histogram.

OMIN

A named variable that, upon exit, contains the minimum data value used in constructing the histogram.
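OMIN is convenient for labeling bins when MIN is left to its default. The sketch below uses hypothetical sample values and variable names (bs, lo, hi, edges) and assumes that bin v begins at the minimum value plus v times the bin size.

```idl
; Hypothetical sample values and bin size.
data = [2.5, 3.1, 7.8, 4.4, 9.0]
bs = 2.0

; lo and hi receive the minimum and maximum values actually used.
h = HISTOGRAM(data, BINSIZE=bs, OMIN=lo, OMAX=hi)

; Lower edge of each bin, assuming bin v starts at lo + v*bs.
edges = lo + bs * FINDGEN(N_ELEMENTS(h))

PRINT, lo, hi
PRINT, edges
```

The edges vector can then serve as the X argument when plotting h.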

REVERSE_INDICES

Set this keyword to a named variable in which the list of reverse indices is returned. This list is returned as a longword vector whose number of elements is the sum of the number of elements in the histogram, N, and the number of array elements included in the histogram, plus one.

The subscripts of the original array elements falling in the ith bin, 0 <= i < N, are given by the expression R[R[i] : R[i+1]-1], where R is the reverse index list. If R[i] is equal to R[i+1], no elements are present in the ith bin.

Make the histogram of array A:

H = HISTOGRAM(A, REVERSE_INDICES = R)

; Set all elements of A that are in the ith bin of H to 0.
IF R[i] NE R[i+1] THEN A[R[R[i] : R[i+1]-1]] = 0

The above is usually more efficient than the following:

bini = WHERE(A EQ i, count)

IF count NE 0 THEN A[bini] = 0
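The same expression can be used to visit every bin in turn. The procedure below is an illustrative sketch only; its name, t_reverse_indices, and its sample data are hypothetical.

```idl
PRO t_reverse_indices
  ; Hypothetical sample data.
  A = [3, 0, 3, 1, 3, 0]
  H = HISTOGRAM(A, MIN=0, REVERSE_INDICES=R)

  FOR i = 0, N_ELEMENTS(H) - 1 DO BEGIN
    ; R[i] EQ R[i+1] means bin i is empty, so skip it.
    IF R[i] NE R[i+1] THEN $
      PRINT, 'Bin ', i, ' holds subscripts ', R[R[i] : R[i+1]-1]
  ENDFOR
END
```

For this data, bin 0 holds subscripts 1 and 5, bin 1 holds subscript 3, bin 2 is empty and is skipped, and bin 3 holds subscripts 0, 2, and 4.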

Examples

Create a simple, two-dimensional dataset with the DIST function by entering:

D = DIST(200)

Plot the histogram of D with a bin size of 1 and the default minimum and maximum by entering:

PLOT, HISTOGRAM(D)

To plot a histogram considering only those values from 10 to 50 using a bin size of 4, enter:

PLOT, HISTOGRAM(D, MIN = 10, MAX = 50, BINSIZE = 4)

The HISTOGRAM function can also be used to increment the elements of one vector whose subscripts are contained in another vector. To increment those elements of vector A indicated by vector B, use the command:

A = HISTOGRAM(B, INPUT=A, MIN=0, MAX=N_ELEMENTS(A)-1)

This method works for duplicate subscripts, whereas the statement:

A[B] = A[B]+1

never adds more than 1 to any element, even if that element's subscript is duplicated in vector B. For example, the following commands:

A = LONARR(5)

B = [2,2,3]

PRINT, HISTOGRAM(B, INPUT=A, MIN=0, MAX=4)

print:

0 0 2 1 0

while the commands:

A = LONARR(5)

A[B] = A[B]+1

PRINT, A

give the result:

0 0 1 1 0

The following example demonstrates how to use HISTOGRAM.

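PRO t_histogram
  ; Sample data: a 5 x 4 array of integers ranging from -14 to 6.
  data = [[-5, 4, 2, -8, 1], $
          [ 3, 0, 5, -5, 1], $
          [ 6, -7, 4, -4, -8], $
          [-1, -5, -14, 2, 1]]
  ; With the default bin size of 1, there is one bin per integer value.
  hist = HISTOGRAM(data)
  ; Data value of each bin: the first bin starts at MIN(data).
  bins = FINDGEN(N_ELEMENTS(hist)) + MIN(data)
  PRINT, MIN(hist)
  PRINT, bins
  PLOT, bins, hist, YRANGE = [MIN(hist)-1, MAX(hist)+1], PSYM = 10, $
    XTITLE = 'Bin Number', YTITLE = 'Density per Bin'
END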

IDL prints:

0
-14.0000 -13.0000 -12.0000 -11.0000 -10.0000 -9.00000
-8.00000 -7.00000 -6.00000 -5.00000 -4.00000 -3.00000
-2.00000 -1.00000 0.00000 1.00000 2.00000 3.00000
4.00000 5.00000 6.00000
