18. M. Aladjem (1993), "A statistical technique for evaluating the significance of control parameters of mapping procedures", Pattern Recognition Letters, vol. 14, no. 8, 631-636.
19. M. Aladjem (1993), "Significance of control parameters of a nonparametric linear mapping procedure", Pattern Recognition Letters, vol. 14, no. 8, 637-645.
Procedures for dimensionality reduction with control parameters
have been proposed and studied by many authors (see the references in [18];
Fukunaga, Introduction to Statistical Pattern Recognition (1990); Siedlecki, Siedlecka and Sklansky,
Pattern Recognition, vol. 21 (1988); Fukunaga and Mantock, IEEE Trans. on
PAMI (1983); Gelsema and Eden, Pattern Recognition (1980); Huan Zhen-hua
et al., Conf. on SMC (1984)). By setting the control parameters appropriately,
the projections can be adapted to various data structures. However, there is
no well-defined relationship between the parameter values and the class
separation obtained in the projected space, so a large number of trials must
be carried out. To reduce the number of trials, the parameters that are less
significant for class separation could be restricted to a small number of values.
In [18] we proposed a statistical technique for evaluating the degree of
significance of the control parameters of projection procedures oriented to
classifier design. It provides an objective evaluation of the significance of
control parameters, as opposed to the estimation by experience that is
typically used by many authors (see the references in [18]; Fukunaga (1990);
Fukunaga and Mantock (1983); Gelsema and Eden (1980); Siedlecki, Siedlecka
and Sklansky (1988); Huan Zhen-hua et al. (1984)). We propose to carry out
the projection on the available data sets. For each data set, every
combination of the prespecified values of the control parameters is used in a
projection experiment, and for each projection the probability of
misclassification of the projected observations is evaluated. Suitable values
of the control parameters correspond to low error rates. We therefore propose
that the significance of a control parameter be evaluated in terms of measures
of association between the data sets and the values of the control parameter
that correspond to low error rates. A measure of association is a numerical
index that describes the strength or magnitude of a relationship. A high value
of this measure implies that each data set (data structure) is uniquely
associated with a value of the control parameter that leads to a low projection
error rate; in this case the control parameter is highly significant. A low
value implies that the class separation in the projected space is independent
of the variations of the control parameter. It is known that no single measure
is best in every circumstance, which is why measures are used in combination.
This makes it possible to look at a relationship from several points of view,
as each measure rests on a slightly different definition of association. In
the paper we explained the choice of the following measures of association:
the chi-square test for independence, Cramér's V, Goodman and Kruskal's λ,
and Goodman and Kruskal's τ.
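The following Python sketch shows how the four measures named above can be computed from a contingency table. It assumes (an illustrative choice, not necessarily the exact construction used in [18]) that rows are data sets, columns are the prespecified values of one control parameter, and each cell counts the projections of that data set with that parameter value whose error rate was judged "low". λ and τ are computed in their asymmetric form, predicting the parameter value from the data set; the function name `association_measures` is hypothetical, and returning 0.0 when a measure is undefined is a convention chosen here.

```python
import math


def association_measures(table):
    """Association between rows (data sets) and columns (control-parameter values).

    table[i][j] is a non-negative count. Lambda and tau are the asymmetric
    Goodman-Kruskal forms for predicting the column from the row.
    Undefined measures are reported as 0.0 (a convention of this sketch).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)

    # Pearson chi-square statistic for independence; cells whose expected
    # count is zero (empty row or column) contribute nothing.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)

    # Cramer's V rescales chi-square to the range [0, 1].
    k = min(len(row_tot), len(col_tot)) - 1
    cramer_v = math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

    # Goodman-Kruskal lambda: proportional reduction in errors when the
    # modal column is predicted within each row instead of overall.
    best_col = max(col_tot)
    lam_den = n - best_col
    lam = (sum(max(row) for row in table) - best_col) / lam_den if lam_den > 0 else 0.0

    # Goodman-Kruskal tau: proportional reduction in errors under
    # prediction proportional to the row (versus marginal) distribution.
    sum_col_sq = sum(c * c for c in col_tot)
    within = sum(sum(x * x for x in row) / row_tot[i]
                 for i, row in enumerate(table) if row_tot[i] > 0)
    tau_den = n * n - sum_col_sq
    tau = (n * within - sum_col_sq) / tau_den if tau_den > 0 else 0.0

    return {"chi2": chi2, "dof": dof, "cramer_v": cramer_v,
            "lambda": lam, "tau": tau}
```

Computing one table per control parameter and comparing the resulting values gives a rating of the parameters. For the chi-square test itself, the statistic would be compared against a chi-square distribution with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(chi2, dof)` if SciPy is available.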
In [19] we apply these measures of association to evaluate the significance of the control parameters of the projection procedure we proposed in [15,16]. Three artificially constructed and two real data sets were used. The artificial data sets were constructed from time-sampled waveforms with random parameters; the real data sets concern the medical diagnosis of neurological and cardiological diseases. The experiment used every combination of selected discrete values of the control parameters. A priori class information about the data sets was used to estimate the classification error of the projected samples, using the leave-one-out method of error estimation based on a nearest-neighbor error-counting rule. Two variants of the analysis were carried out: in the first, all data sets were used together; in the second, the artificial and real data sets were analyzed separately. We found that the ratings of the significance of the control parameters based on the measures of association (Cramér's V, Goodman and Kruskal's λ, Goodman and Kruskal's τ) were consistent across the two variants of the analysis. This allows us to rate the significance of the control parameters objectively. Taking the significance of the parameters into account, we determined the ranges of, and guidelines for, the control parameter variations.
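A minimal Python sketch of the experimental loop described above: every combination of discrete control-parameter values is tried, and each projection is scored by the leave-one-out error of the 1-nearest-neighbor rule. The projection procedure of [15,16] is not reproduced here; `sweep`, `toy_projection` (projection onto the coordinate `axis`, multiplied by `scale`) and the data are hypothetical stand-ins, chosen so that one parameter matters and the other does not.

```python
import itertools
import math


def loo_nn_error(points, labels):
    """Leave-one-out error rate of the 1-nearest-neighbor rule (Euclidean distance)."""
    n = len(points)
    errors = 0
    for i in range(n):
        # Nearest neighbor of point i among all other points (first one wins ties).
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: math.dist(points[i], points[j]))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / n


def sweep(project, samples, labels, grid):
    """Evaluate every combination of control-parameter values in `grid`.

    grid maps a parameter name to its list of prespecified values;
    project(samples, **params) must return the projected points.
    """
    names = list(grid)
    errors = {}
    for values in itertools.product(*(grid[name] for name in names)):
        projected = project(samples, **dict(zip(names, values)))
        errors[values] = loo_nn_error(projected, labels)
    return names, errors


# Hypothetical stand-in for a projection procedure with two control parameters.
def toy_projection(samples, axis, scale):
    return [(scale * x[axis],) for x in samples]


# Hypothetical two-class data: classes separate along axis 0 but interleave along axis 1.
samples = [(0.0, 0.0), (0.1, 2.2), (0.2, 4.9), (1.0, 1.0), (1.1, 3.5), (1.2, 6.4)]
labels = [0, 0, 0, 1, 1, 1]
names, errors = sweep(toy_projection, samples, labels,
                      {"axis": [0, 1], "scale": [1.0, 2.0]})
for values, err in errors.items():
    print(dict(zip(names, values)), err)
```

In this toy case the error rate depends only on `axis` (0.0 for axis 0, 1.0 for axis 1) and not on `scale`, which is the kind of pattern the measures of association are meant to detect when the low-error settings are tabulated per data set.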