15. M. Aladjem, (1991) "Parametric and nonparametric linear mappings of multidimensional data", Pattern Recognition, vol. 24, no. 6, 543-553.
17. M. Aladjem and I. Dinstein, (1992) "Linear mappings of local data structures", Pattern Recognition Letters, vol. 13, 153-159.
In  we proposed novel parametric and non-parametric discriminant criteria for two classes based on scatter matrices.
The parametric criterion combines the principal component criterion and the Fisher discriminant criterion. The motivation for defining this criterion is to maximize the distance between the classes along the discriminant vector (the Fisher criterion), while causing one class to be highly scattered and the other to be concentrated. We introduce user-supplied parameters that control the extent to which the difference between the within-class scatters influences the solution for the discriminant vectors. Appropriate values for the control parameters are not known in advance, so we search for them by trial and error. Our approach to model selection is to choose the values of the control parameters that minimize the error rate of a nearest neighbor allocation rule applied to the projections of the training data onto the space spanned by the discriminant vectors. We obtain the sequence of discriminant vectors by successive optimization of the criterion, using two methods. The first method, which we name ORTH, imposes orthogonality constraints on the discriminant vectors. The second method, which we call FREE, does not constrain the discriminant vectors in this way. The proposed criterion is an extension of most of the known scatter measures of class separation for two classes: these measures can be obtained from it by setting special values of the control parameters.
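The model-selection step described above can be sketched as follows. This is a minimal illustration, not the criterion of the cited papers: as a stand-in it uses the well-known family of two-class directions w proportional to [s*S1 + (1-s)*S2]^(-1) (m1 - m2), which gives the Fisher direction at s = 0.5. The single control parameter s, the grid of trial values, the synthetic data and all function names are assumptions made for this example.

```python
import numpy as np

def scatter(X):
    """Scatter matrix of the rows of X around their mean."""
    D = X - X.mean(axis=0)
    return D.T @ D / len(X)

def direction(X1, X2, s):
    """Unit vector proportional to [s*S1 + (1-s)*S2]^(-1) (m1 - m2).

    Stand-in criterion (assumption): s = 0.5 gives the Fisher direction.
    """
    S = s * scatter(X1) + (1.0 - s) * scatter(X2)
    w = np.linalg.solve(S, X1.mean(axis=0) - X2.mean(axis=0))
    return w / np.linalg.norm(w)

def loo_nn_error(Z, y):
    """Leave-one-out error rate of the 1-nearest-neighbor rule on projections Z."""
    Z = np.asarray(Z, dtype=float).reshape(len(Z), -1)
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not vote for itself
    return float(np.mean(y[d.argmin(axis=1)] != y))

def select_parameter(X1, X2, grid):
    """Trial-and-error search: keep the grid value with the lowest NN error."""
    X = np.vstack([X1, X2])
    y = np.r_[np.zeros(len(X1), dtype=int), np.ones(len(X2), dtype=int)]
    errors = [loo_nn_error(X @ direction(X1, X2, s), y) for s in grid]
    best = int(np.argmin(errors))
    return grid[best], errors[best]

# Synthetic example: one highly scattered class, one concentrated class.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=[3.0, 0.3], size=(60, 2))
X2 = rng.normal(loc=[2.0, 2.0], scale=[0.3, 0.3], size=(60, 2))
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best_s, best_err = select_parameter(X1, X2, grid)
print(best_s, best_err)
```

The same loop applies unchanged to any criterion with more control parameters: only `direction` and the grid need to be replaced.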
The nonparametric criterion is an extension of our previous proposal. Instead of the classical (parametric) scatter matrices, we introduce into the discriminant criterion the nonparametric scatter matrices proposed by Fukunaga (Intr. to Stat. Patt. Rec., 1990). These matrices express the local data scatter along the class separation boundary. As a result, the nonparametric criterion, as opposed to discriminant criteria based on the classical scatter matrices, expresses the local rather than the global structure of the data. We use parameters that control data localization (the width of the band along the class separation boundary in which the points with large weights in the scatter computation are located). Our strategy for model selection (setting the values of the control parameters) is similar to that used in the optimization of the parametric criterion. The ORTH and FREE methods for successive optimization can be applied to the nonparametric criterion as well, but we found a more effective optimization method that increases the class separation along the discriminant vectors. Its novelty lies in using different distance measures when setting the band along the class separation boundary. For the first discriminant vector we use the Euclidean distance in the original n-space. For the second discriminant vector we use the Euclidean distance between the sample projections onto the first discriminant vector. With this distance, large weights are assigned to the points that lie near the class separation boundary along the first discriminant vector. In this way the second discriminant vector separates the points from different classes that are mixed or close to each other along the first discriminant vector.
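The weighting idea can be illustrated with a short sketch of a nonparametric between-class scatter matrix in the spirit of Fukunaga's formulation, where each sample is weighted by min(d_same^alpha, d_other^alpha) / (d_same^alpha + d_other^alpha) and d_same, d_other are the distances to its k-th nearest neighbor in its own class and in the other class. The parameter names k and alpha, the argument Z (the representation in which distances are measured) and the choice to use Z for both the neighbor search and the weights are assumptions of this example, not details taken from the cited papers.

```python
import numpy as np

def nonparametric_between_scatter(X, y, k=3, alpha=1.0, Z=None):
    """Weighted scatter of each sample around the local mean of the other class.

    Distances are measured in Z (defaults to X). Requires more than k samples
    per class. Returns the scatter matrix and the per-sample weights.
    """
    Z = X if Z is None else Z
    Z = np.asarray(Z, dtype=float).reshape(len(X), -1)
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    n, p = X.shape
    Sb = np.zeros((p, p))
    weights = np.zeros(n)
    for i in range(n):
        same = np.flatnonzero(y == y[i])
        other = np.flatnonzero(y != y[i])
        nn_same = same[np.argsort(d[i, same])[:k]]
        nn_other = other[np.argsort(d[i, other])[:k]]
        d_same = d[i, nn_same[-1]] ** alpha    # distance to k-th same-class neighbor
        d_other = d[i, nn_other[-1]] ** alpha  # distance to k-th other-class neighbor
        denom = d_same + d_other
        # Weight approaches 0.5 near the class boundary and 0 far from it.
        weights[i] = min(d_same, d_other) / denom if denom > 0 else 0.5
        diff = X[i] - X[nn_other].mean(axis=0)
        Sb += weights[i] * np.outer(diff, diff)
    return Sb / n, weights

# Synthetic two-class example (assumed data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 3)), rng.normal(1.5, 1.0, size=(40, 3))])
y = np.r_[np.zeros(40, dtype=int), np.ones(40, dtype=int)]

# First vector: distances in the original space. For brevity the leading
# eigenvector of Sb is used, without normalizing by a within-class scatter.
Sb1, w1 = nonparametric_between_scatter(X, y, k=3, alpha=2.0)
v1 = np.linalg.eigh(Sb1)[1][:, -1]

# Second vector: distances between the projections onto the first vector.
Sb2, w2 = nonparametric_between_scatter(X, y, k=3, alpha=2.0, Z=X @ v1)
```

Larger values of alpha concentrate the weights in a narrower band around the class boundary, which is one way to read the "width of the band" parameter mentioned above.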
We carry out an experimental study of the new discriminant criteria with a wide spectrum of synthetic and real data sets. The results indicate that our criteria yield better class separation than the widely used discriminant criteria based on scatter matrices (discussed by Fukunaga, Intr. to Stat. Patt. Rec. (1990); Siedlecki, Siedlecka and Sklansky, Pattern Recognition, vol. 21 (1988)).
In cluster analysis, multivariate data are visualized by projecting them onto a one-, two- or three-dimensional space with minimal distortion of the data structure. The most popular projections are the principal component (PC) projection and multidimensional scaling (MS - known in the pattern recognition literature as Sammon's mapping). The PC projection is appropriate only for simple data structures, and the principal disadvantages of the MS are its high computational complexity (see ) and its lack of an analytical expression that ties the original features of the observations to the coordinates of their projections. In  we proposed a projection criterion based on a kernel estimate of the gradient of the probability density function of the multivariate data. By maximizing this criterion we obtain a projection intended to compress the clusters. An experiment with the classical Iris data shows that the new projection separates the clusters of the various species better than principal component mapping. The separation obtained was comparable with the cluster separation of multidimensional scaling. The principal advantage of the proposed method over multidimensional scaling is the analytical expression of the obtained data projection.
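The building block of such a criterion, a kernel estimate of the density gradient, can be sketched as below. The example assumes a Gaussian kernel with a single bandwidth h, for which the gradient of the density estimate at x is the kernel-weighted mean of (x_j - x) divided by h^2 and by the kernel normalizing constant. The kernel choice, the bandwidth value and the function names are assumptions; the projection criterion built on top of this estimate in the cited work is not reproduced here.

```python
import numpy as np

def kde(x, X, h):
    """Gaussian kernel density estimate at point x from samples X (rows)."""
    d = X.shape[1]
    sq = np.sum((X - x) ** 2, axis=1)
    return np.mean(np.exp(-sq / (2 * h**2))) / (2 * np.pi * h**2) ** (d / 2)

def kde_gradient(x, X, h):
    """Gradient of the Gaussian kernel density estimate at point x."""
    d = X.shape[1]
    diff = X - x  # vectors pointing from x to each sample
    k = np.exp(-np.sum(diff**2, axis=1) / (2 * h**2))
    norm = h**2 * (2 * np.pi * h**2) ** (d / 2)
    return (k[:, None] * diff).mean(axis=0) / norm

# Synthetic cluster centered at (1, 1) (assumed data).
rng = np.random.default_rng(2)
X = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(200, 2))
x0 = np.array([0.0, 0.0])
g = kde_gradient(x0, X, h=0.5)
print(g)  # the estimated gradient points from x0 toward the cluster
```

Because the estimated gradient points toward regions of higher density, moving samples along it pulls them toward cluster centers, which is consistent with the stated aim of compressing the clusters.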