Method for estimating the significance of the control parameters of projection procedures

Procedures for dimensionality reduction with control parameters have been proposed and studied by many authors (see references in [18]; Fukunaga, Introduction to Statistical Pattern Recognition (1990); Siedlecki, Siedlecka and Sklansky, Pattern Recognition, vol. 21 (1988); Fukunaga and Mantock, IEEE Trans. on PAMI (1983); Gelsema and Eden, Pattern Recognition (1980); Huan Zhen-hua et al., Conf. on SMC (1984)). By setting the values of the control parameters appropriately, the projections can be adapted to various data structures. However, there is no well-defined relationship between these values and the class separation obtained in the projected space, so a great number of trials must be carried out. To reduce the number of trials, the parameters that are less significant for class separation can be restricted to a small number of values.

In [18] we proposed a statistical technique for evaluating the degree of significance of the control parameters of projection procedures oriented to classifier design. It provides a strategy for the objective evaluation of the significance of control parameters, as opposed to the estimation by experience that is typically used by many authors (see references in [18]; Fukunaga (1990); Fukunaga and Mantock (1983); Gelsema and Eden (1980); Siedlecki, Siedlecka and Sklansky (1988); Huan Zhen-hua et al. (1984)). We propose to carry out the projection using available data sets. For each data set, projection experiments are run over the full combination of the prespecified values of the control parameters, and for each projection the probability of misclassification of the projected observations is evaluated. Suitable values of the control parameters correspond to low error rates. That is why we propose that the significance of a control parameter be evaluated in terms of measures of association between the data sets and the values of the control parameter that correspond to low error rates. A measure of association is a numerical index that describes the strength or magnitude of a relationship. A high value of this measure implies that each data set (data structure) is uniquely associated with a value of the control parameter that leads to a low projection error rate. This is the case of high significance of a control parameter. A low value of this measure implies that the class separation in the projected space is independent of the variations of the control parameter. No single measure is best in every circumstance, which is why several measures are used in combination. This makes it possible to look at a relationship from several points of view, as each measure rests on a slightly different definition of association.
In the paper we explained the choice of the following measures of association: the chi-square test for independence, Cramér's V, Goodman and Kruskal's λ and Goodman and Kruskal's τ.
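
As an illustration of the four measures named above, the following is a minimal sketch, not the implementation used in [18]. It assumes a contingency table whose rows are data sets, whose columns are the discrete values of one control parameter, and whose cells count the projections that reached a low error rate; this layout and all function names are assumptions made for the example. It also assumes every row and column total is positive.

```python
import numpy as np

def chi_square(table):
    """Pearson chi-square statistic for independence of rows and columns."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    # Expected counts under independence: (row total * column total) / n.
    expected = np.outer(t.sum(axis=1), t.sum(axis=0)) / n
    return float(((t - expected) ** 2 / expected).sum())

def cramers_v(table):
    """Cramer's V: chi-square rescaled to the range [0, 1]."""
    t = np.asarray(table, dtype=float)
    k = min(t.shape) - 1
    return float(np.sqrt(chi_square(t) / (t.sum() * k)))

def gk_lambda(table):
    """Goodman and Kruskal's lambda, predicting the column from the row."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    best_col = t.sum(axis=0).max()       # best guess without knowing the row
    best_per_row = t.max(axis=1).sum()   # best guess when the row is known
    return float((best_per_row - best_col) / (n - best_col))

def gk_tau(table):
    """Goodman and Kruskal's tau, predicting the column from the row."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    row = t.sum(axis=1)
    col = t.sum(axis=0)
    v_total = 1.0 - ((col / n) ** 2).sum()                   # variation ignoring rows
    v_within = 1.0 - ((t ** 2) / row[:, None]).sum() / n     # variation within rows
    return float((v_total - v_within) / v_total)

# Hypothetical table: each data set favours a different parameter value.
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
# Hypothetical table: low error rates occur equally often for every value.
independent = [[5, 5], [5, 5]]
```

For the hypothetical `perfect` table, V, λ and τ all reach their maximum of 1, which corresponds to the high-significance case described above; for `independent` they are all 0.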

In [19] we applied these measures of association to evaluate the significance of the control parameters of the projection procedure that we proposed in [15,16]. Three artificially constructed data sets and two real data sets were used. The artificial data sets were constructed using time-sampled waveforms with random parameters. The real data sets concern the medical diagnosis of neurological and cardiological diseases. The experiment was performed over the full combination of selected discrete values of the control parameters. A priori class information about the data sets was used to estimate the classification error of the projected samples. The leave-one-out method of error estimation, based on a nearest neighbor error counting rule, was adopted. Two variants of the analysis were carried out. In the first variant all data sets were used. In the second variant the analysis was done separately for the artificial and the real data sets. We found that the ratings of the significance of the control parameters based on the measures of association (Cramér's V, Goodman and Kruskal's λ, Goodman and Kruskal's τ) were consistent in the two variants of the analysis. This allows us to rate the significant control parameters objectively. Taking into account the significance of the parameters, we determined the range and the guidelines for varying the control parameters.
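
The error estimation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in [19]: it assumes Euclidean distance and a single nearest neighbor, and the names `loo_1nn_error`, `grid_errors` and the `project` callable are hypothetical stand-ins (the actual projection procedure of [15,16] is not reproduced here).

```python
from itertools import product

import numpy as np

def loo_1nn_error(points, labels):
    """Leave-one-out error rate of a 1-nearest-neighbor rule."""
    x = np.asarray(points, dtype=float)
    y = np.asarray(labels)
    # Pairwise squared Euclidean distances between all projected samples.
    diff = x[:, None, :] - x[None, :, :]
    d = (diff ** 2).sum(axis=2)
    # A held-out sample must not be its own neighbor.
    np.fill_diagonal(d, np.inf)
    nearest = d.argmin(axis=1)
    # Count a held-out sample as an error when its neighbor's class differs.
    return float((y[nearest] != y).mean())

def grid_errors(project, points, labels, grid):
    """Error rate for every combination of the listed parameter values.

    `project` is a hypothetical callable: project(points, **params) -> array.
    `grid` maps each parameter name to its list of discrete values.
    """
    names = sorted(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = loo_1nn_error(project(points, **params), labels)
    return results

# Toy data: two well-separated pairs of points.
toy_points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
toy_labels = [0, 0, 1, 1]

def toy_project(points, scale):
    # Placeholder projection for the example: uniform scaling only.
    return np.asarray(points, dtype=float) * scale

toy_results = grid_errors(toy_project, toy_points, toy_labels, {"scale": [1.0, 2.0]})
```

The dictionary returned by `grid_errors` is keyed by tuples of parameter values (in sorted parameter-name order), so the combinations with low error rates can be tallied per data set afterwards.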

Department of Electrical & Computer Engineering
Prof. MAYER ALADJEM