15. M. Aladjem (1991), "Parametric
and nonparametric linear mappings of multidimensional data", Pattern
Recognition, vol. 24, no. 6, 543-553.
17. M. Aladjem and I. Dinstein
(1992), "Linear mappings of local data structures", Pattern Recognition
Letters, vol. 13, 153-159.

In [15] we proposed novel parametric
and nonparametric discriminant criteria for two classes based on scatter
matrices.
The parametric criterion combines the principal component
criterion and the Fisher discriminant criterion. The motivation for defining
this criterion is to maximize the distance between the classes (Fisher
criterion) along the discriminant vector, while causing one class to be
highly scattered and the other to be concentrated. We introduce user-supplied
parameters for controlling the extent to which the difference between the
within-class scatters of the two classes influences the solution for the discriminant vectors.
Appropriate values for the control parameters are not known in advance.
We search for them using a trial and error procedure. Our approach to model
selection is to choose the values of the control parameters that minimize
the error rates of a nearest neighbor allocation rule applied to the projections
of the training data onto the space spanned by the discriminant vectors.
We obtain the sequence of discriminant vectors by successive optimization
of the criterion, using one of two methods. The first method, which we
name ORTH, imposes orthogonality constraints on the discriminant vectors.
The second method, which we call FREE, does not constrain the discriminant
vectors in this way. The proposed criterion is an extension of most
of the known scatter-based measures of class separation for two classes:
these measures can be obtained by setting special values of the control
parameters.
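
The model selection step described above can be illustrated with a minimal sketch. It assumes a leave-one-out estimate of the nearest neighbor error (the text does not say which error estimate is used), and it uses a ridge-regularized Fisher direction with a hypothetical parameter `alpha` purely as a stand-in, since the criterion of [15] is not reproduced here. All function names are illustrative.

```python
import numpy as np

def loo_nn_error(Z, y):
    """Leave-one-out error rate of the 1-nearest-neighbor rule on projected data Z."""
    y = np.asarray(y)
    Z = np.asarray(Z, dtype=float).reshape(len(y), -1)
    # Pairwise squared Euclidean distances between projected samples.
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample may not be its own neighbor
    nearest = d2.argmin(axis=1)
    return float(np.mean(y[nearest] != y))

def regularized_fisher(X, y, alpha):
    """Stand-in fit: Fisher direction with ridge term alpha (NOT the criterion of [15])."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix of the two classes.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw + alpha * np.eye(X.shape[1]), m1 - m0)
    return (w / np.linalg.norm(w))[:, None]  # shape (n_features, 1)

def select_parameter(X, y, candidates, fit):
    """Trial-and-error search: keep the candidate with the lowest LOO 1-NN error."""
    errors = [loo_nn_error(X @ fit(X, y, c), y) for c in candidates]
    best = int(np.argmin(errors))
    return candidates[best], errors[best]

# Synthetic two-class data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(2.0, 1.0, (40, 3))])
y = np.repeat([0, 1], 40)
best_alpha, best_err = select_parameter(X, y, [0.01, 0.1, 1.0, 10.0], regularized_fisher)
print(best_alpha, best_err)
```

Any routine that returns a projection matrix for a given parameter value can be passed as `fit`, so the same search loop would apply to the ORTH and FREE variants once they are implemented.
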
The nonparametric criterion is an extension of our
previous proposal. Instead of the classical (parametric) scatter matrices,
we introduce into the discriminant criterion the nonparametric scatter matrices
proposed by Fukunaga (Intr. to Stat. Patt. Rec., 1990). These matrices
express the local data scatter along the class separation boundary.
As a result, the nonparametric criterion, unlike discriminant
criteria based on the classical scatter matrices, expresses the local
rather than the global structure of the data. We use parameters that control
data localization (the width of the band along the class separation boundary
in which points with large weights in the scatter computation are
located). Our strategy for model selection (setting the values of the
control parameters) is similar to that used previously in the optimization
of the parametric criterion. The ORTH and FREE methods for successive optimization
can be applied to the nonparametric criterion as well. However, we found a more
effective optimization method, which increases the class separation
along the discriminant vectors. Its novelty lies in using different distance
measures when setting the band along the class separation boundary.
For the first discriminant vector we use the Euclidean distance in the
original n-space. For the second discriminant vector we use the Euclidean
distance between the sample projections onto the first discriminant vector.
Using this distance assigns large
weights to the points that are near the class separation boundary along
the first discriminant vector. In this way, the second discriminant
vector separates the points from different classes that are mixed or close to each
other along the first discriminant vector.
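
The effect of changing the distance measure can be sketched as follows. The weighting function is assumed to be the k-nearest-neighbor form given by Fukunaga for nonparametric scatter matrices; the text does not state the exact form used in [15], and the parameters `k` and `alpha`, the function names, and the mean-difference direction `w1` used as a stand-in for the first discriminant vector are all illustrative.

```python
import numpy as np

def knn_distance(F, query_idx, pool_idx, k):
    """Distance from each query sample to its k-th nearest neighbor in the pool."""
    d = np.linalg.norm(F[query_idx][:, None, :] - F[pool_idx][None, :, :], axis=2)
    # Exclude a sample from its own neighborhood when query and pool overlap.
    same = query_idx[:, None] == pool_idx[None, :]
    d = np.where(same, np.inf, d)
    return np.sort(d, axis=1)[:, k - 1]

def boundary_weights(F, y, k=3, alpha=2.0):
    """Weights near 0.5 close to the class boundary and near 0 far from it.

    F holds the representation in which distances are measured: the original
    features for the first vector, or projections onto it for the second.
    """
    y = np.asarray(y)
    F = np.asarray(F, dtype=float).reshape(len(y), -1)
    idx = np.arange(len(y))
    d0 = knn_distance(F, idx, idx[y == 0], k) ** alpha
    d1 = knn_distance(F, idx, idx[y == 1], k) ** alpha
    # Guard against division by zero when samples coincide.
    return np.minimum(d0, d1) / np.maximum(d0 + d1, np.finfo(float).tiny)

# Synthetic two-class data, used only to exercise the sketch.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 4)), rng.normal(1.5, 1.0, (30, 4))])
y = np.repeat([0, 1], 30)

# Stand-in for the first discriminant vector: normalized mean difference.
w1 = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
w1 /= np.linalg.norm(w1)

weights_first = boundary_weights(X, y)        # distances in the original n-space
weights_second = boundary_weights(X @ w1, y)  # distances between projections onto w1
```

With the second set of weights, points whose projections onto `w1` overlap with the other class receive the largest weights, which is the behaviour the text attributes to the projected distance.
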
We carry out an experimental study of the new discriminant
criteria with a wide spectrum of synthetic and real data sets. The results
indicate that our criteria yield better class separation than the widely
used discriminant criteria based on scatter matrices (discussed by Fukunaga,
Intr. to Stat. Patt. Rec. (1990); Siedlecki, Siedlecka and Sklansky, Pattern
Recognition, vol. 21 (1988)).
In cluster analysis, multivariate
data are visualized by projecting them onto a one-, two- or three-dimensional
space with minimal distortion of the data structure. The most popular projections
are the principal component (PC) projection and multidimensional scaling
(MS, known in the pattern recognition literature as Sammon’s mapping). The
PC projection is appropriate for simple data structures, whereas the principal
disadvantages of MS are its high computational complexity (see [23])
and its lack of an analytical expression that ties the original features
of the observations to the coordinates of their projections. In [17]
we proposed a projection criterion based on a kernel estimate of the
gradient of the probability density function of the multivariate data. Maximizing
this criterion yields a projection intended to compress
the clusters. An experiment with the classical Iris data shows that the
new projection separates the clusters of the various species better than
principal component mapping. The separation obtained was comparable with
the cluster separation of multidimensional scaling. The principal advantage
of the proposed method over multidimensional scaling is the analytical
expression of the obtained data projection.
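
A kernel estimate of the density gradient, the quantity on which the criterion of [17] is built, can be sketched as follows. The sketch assumes a Gaussian kernel with a single bandwidth `h`; the text does not specify the kernel, and the projection criterion itself is not reproduced here. The function names are illustrative.

```python
import numpy as np

def gaussian_kde(x, data, h):
    """Gaussian kernel estimate of the density at point x."""
    n, d = data.shape
    sq = ((data - x) ** 2).sum(axis=1)
    norm = n * (h ** d) * (2.0 * np.pi) ** (d / 2.0)
    return np.exp(-sq / (2.0 * h ** 2)).sum() / norm

def gaussian_kde_gradient(x, data, h):
    """Gradient of the Gaussian kernel density estimate at point x."""
    n, d = data.shape
    diff = data - x  # x_i - x for every sample
    k = np.exp(-(diff ** 2).sum(axis=1) / (2.0 * h ** 2))
    norm = n * (h ** (d + 2)) * (2.0 * np.pi) ** (d / 2.0)
    return (diff * k[:, None]).sum(axis=0) / norm

# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(2)
data = rng.normal(size=(200, 2))
x = np.array([1.0, 0.5])
grad = gaussian_kde_gradient(x, data, h=0.5)
print(grad)
```

The gradient points towards regions of higher estimated density, which is why a criterion built on it can favour projections that pull each cluster together.
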
