21. M. Aladjem (1997), "Linear discriminant analysis for two classes via removal of classification structure", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 187-191.
22. M. Aladjem (1998), "Nonparametric discriminant analysis via recursive optimization of Patrick-Fisher distance", IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 28, no. 2, pp. 292-299.

In [21] we proposed a new method for identifying two
discriminant directions by successive optimization of discriminant
criteria, each of which determines a single one-dimensional linear subspace of the
sample space. We discuss two classes of discriminant criteria,
namely, the extended Fisher criterion previously proposed by us [15,16]
and the nonparametric criterion proposed by Fukunaga (Introduction to Statistical
Pattern Recognition, 1990). In the past, two methods of successive optimization have been applied.
The first method uses orthogonal constraints on the discriminant vectors
(called method ORTH). The second method does not so constrain the discriminant
vectors (called method FREE). Our experience shows that method FREE does
not always create new discriminant information along the second discriminant
vector. In order to include new information in it, method ORTH was used,
but it is known that orthogonal directions do not always suffice; directions
containing information for class separation may be oblique to each other.
We proposed a new method that improves on both FREE and ORTH: it is free
to search for discriminant directions oblique to each other, yet ensures
that the informative directions already found will not be chosen again
at a later stage.
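The two constraint styles can be illustrated with a minimal sketch. This is not the authors' implementation: the extended Fisher criterion of [15,16] is replaced here by the classical two-class Fisher criterion, and ORTH is realized by deflating the data along the first direction before re-optimizing.

```python
import numpy as np

def fisher_direction(X1, X2):
    # Classical two-class Fisher direction: Sw^{-1} (m1 - m2),
    # computed with a pseudo-inverse so it also works on deflated data.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    w = np.linalg.pinv(Sw) @ (m1 - m2)
    return w / np.linalg.norm(w)

def second_direction_orth(X1, X2, w1):
    # ORTH: force the second vector into the orthogonal complement of w1
    # by removing the w1-component of the data before re-optimizing.
    P = np.eye(len(w1)) - np.outer(w1, w1)
    return fisher_direction(X1 @ P, X2 @ P)

def second_direction_free(X1, X2, w1):
    # FREE: re-optimize with no constraint; on unchanged data this simply
    # returns the first direction again -- no new discriminant information,
    # which is exactly the failure mode described in the text.
    return fisher_direction(X1, X2)
```

The sketch makes the trade-off visible: FREE can collapse onto the first direction, while ORTH guarantees novelty only in the orthogonal complement, which excludes oblique informative directions.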
The new method, called REM, "removal of classification
structure", consists of obtaining a discriminant vector, transforming
the data along it into data without classification structure, and obtaining
the second discriminant vector. The proposed method was inspired by an
idea of Friedman ("Exploratory projection pursuit", JASA, vol. 82, 1987)
called "structure removal", which is oriented toward cluster analysis.
The aim of this work was to employ "structure removal" in discriminant
analysis.
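The idea of removing classification structure along a direction can be sketched as follows. This is an illustrative stand-in, not the transform of [21]: it uses a rank-based mapping of each class's coordinates along the found vector onto a common reference marginal, so the classes fully overlap along that direction while the rest of the data is untouched.

```python
import numpy as np

def remove_structure(X, labels, v):
    # Replace each class's coordinate along the discriminant vector v with a
    # rank-based mapping onto the SAME reference marginal (quantiles of the
    # pooled projections).  The classes then coincide in distribution along
    # v -- no classification structure remains there -- while the component
    # of the data orthogonal to v is left exactly as it was.
    v = v / np.linalg.norm(v)
    t = X @ v                                  # coordinates along v
    ref = np.sort(t)                           # common reference marginal
    t_new = np.empty_like(t)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        ranks = np.argsort(np.argsort(t[idx]))
        u = (ranks + 0.5) / len(idx)           # uniform scores in (0, 1)
        t_new[idx] = np.quantile(ref, u)       # same marginal for every class
    return X + np.outer(t_new - t, v)          # change only the v-component
```

After this transform, re-optimizing the discriminant criterion cannot rediscover the old direction, because no class separation is left along it.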
We compared the discrimination qualities of the plots spanned
by the discriminant vectors obtained by the methods ORTH, FREE and the
new method REM. We studied a wide spectrum of situations in terms of the
size of the design sample drawn from synthetic and real data. The synthetic
data sets cover normal class-conditional distributions, non-normal unimodal
distributions, and multimodal distributions. The real data concern the
medical diagnosis of the neurological disease cerebrovascular accident
(CVA). We compared the discrimination qualities of the plots through
the differences of the resulting test error rates. The simulation
studies and the real data experiment indicate that the proposed method
REM increases the quality of the discriminant plots obtained by optimization
of two criteria, namely our extended Fisher criterion [15,16] and the nonparametric
criterion proposed by Fukunaga (1990). There appears to be no loss in applying
REM instead of both ORTH and FREE. It seems that the new method REM provides
a more effective constraint on the discriminant vectors than does the orthogonality
constraint usually applied in statistical pattern recognition.
In [22] we proposed a new method for the optimization of a discriminant criterion which has several local maxima. In our previous work [20] we studied discriminant criteria that have unique maxima with respect to the directional vector, the optimal solution being obtained analytically (by solving certain eigenvalue problems). Here we discussed the optimization of a criterion, the Patrick-Fisher (PF) distance, which is a highly nonlinear function with several local maxima. For this criterion an iterative optimization has to be carried out. In most past studies, the vector which maximizes the PF distance is searched for along the gradient of this distance, in the hope that with a good starting point the procedure will converge to the global maximum, or at least to a practical one. With such naive optimization, the observed maximum of the PF distance can be merely a local maximum.

We proposed a method of recursive optimization for finding the directions corresponding to several large local maxima of the PF distance. Starting from some directional vector, we run a local maximizer of the PF distance to convergence. The directional vector after convergence of the maximizer is called the PF discriminant vector. We then transform the data along the obtained PF vector into data with a deflated maximum of the PF distance (greater overlap of the classes) and iterate to obtain the next PF discriminant vector. In order to deflate the maximum of the PF distance one could use our method REM (proposed in [21]), which results in full overlap of the classes along the previously defined discriminant vector. This certainly eliminates the local maximum of the PF distance, but in some applications discussed in our paper it causes large changes in the distributions of the transformed data.
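The projected PF distance and its naive local maximization can be sketched as follows. This is a hedged illustration, not the paper's algorithm: with Gaussian Parzen kernels the integrated squared difference of the two projected class densities has a closed form, and the maximizer here is a simple numerical-gradient ascent on the unit sphere; the kernel width h and the step rule are arbitrary choices.

```python
import numpy as np

def pf_distance(w, X1, X2, h=0.5):
    # Squared Patrick-Fisher distance between Parzen (Gaussian-kernel)
    # density estimates of the two classes projected onto w.  The integral
    # of a product of two Gaussian kernels is itself a Gaussian in the
    # difference of their centres, with variance 2*h**2.
    w = w / np.linalg.norm(w)
    t1, t2 = X1 @ w, X2 @ w
    def cross(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-d ** 2 / (4 * h ** 2)).mean() / np.sqrt(4 * np.pi * h ** 2)
    return cross(t1, t1) - 2 * cross(t1, t2) + cross(t2, t2)

def maximize_pf(X1, X2, w0, steps=100, lr=1.0, eps=1e-5):
    # Naive local maximizer: numerical-gradient ascent on the unit sphere,
    # halving the step whenever it fails to improve (monotone ascent).
    # Which local maximum it reaches depends entirely on the start w0.
    w = w0 / np.linalg.norm(w0)
    val = pf_distance(w, X1, X2)
    for _ in range(steps):
        g = np.array([(pf_distance(w + eps * e, X1, X2)
                       - pf_distance(w - eps * e, X1, X2)) / (2 * eps)
                      for e in np.eye(len(w))])
        w_try = w + lr * g
        w_try /= np.linalg.norm(w_try)
        val_try = pf_distance(w_try, X1, X2)
        if val_try > val:
            w, val = w_try, val_try
        else:
            lr *= 0.5
    return w
```

The dependence on the starting point is exactly why the recursive scheme above deflates each found maximum before restarting the local optimizer.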
In order to overcome this problem we developed a method for "reduction of the class separation", which directs the local optimizer to a new local maximum while keeping the structure of the transformed data as close as possible to the original. We carried out experiments with synthetic and real data sets which show that our method succeeds in finding a sequence of directions carrying significant discriminant information. Our method was more effective than the initializations of the local optimizer of the PF distance used in the past (initialization by Fisher's discriminant analysis and by principal component analysis). Like the stochastic smoothing algorithms, our method may miss some large local maxima, including the global one. We view this as acceptable, and do not think of our proposal as a global optimization method, but rather as a simple tool which detects directions with "interesting" discriminant information.
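The contrast between full structure removal and the milder reduction of class separation can be illustrated with a sketch. It rests on a simplifying assumption that is not from the paper: the separation being reduced is taken to be the class-mean difference along the found PF vector, shrunk by a tunable factor.

```python
import numpy as np

def reduce_separation(X, labels, v, alpha=0.5):
    # Shrink the class-mean separation along v by the factor (1 - alpha):
    # each class is shifted along v so its mean moves a fraction alpha of
    # the way toward the pooled mean.  alpha = 1 would force the class
    # means to coincide (as full structure removal does along v); alpha < 1
    # keeps the transformed data closer to the original, which is the point
    # of the "reduction of the class separation" idea.
    v = v / np.linalg.norm(v)
    t = X @ v
    pooled = t.mean()
    shift = np.zeros(len(X))
    for c in np.unique(labels):
        idx = labels == c
        shift[idx] = alpha * (pooled - t[idx].mean())
    return X + np.outer(shift, v)           # only the v-component changes
```

A partial reduction of this kind deflates the local maximum of the criterion along v enough to move the optimizer on, without the large distributional changes that full overlap can cause.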
