With the advent of high-throughput genomic and proteomic technologies, in conjunction with the difficulty of obtaining even moderately sized samples, small-sample classifier design has become a major issue in the biological and medical communities. With small samples, training-data error estimation becomes mandatory, yet none of the popular error estimation techniques has been rigorously designed via statistical inference and optimization. In this investigation, we place classifier error estimation in the framework of optimal mean-square error (MSE) signal estimation in the presence of uncertainty. This results in a Bayesian approach to error estimation based on a parameterized family of feature-label distributions, with a prior distribution on the parameters governing the choice of feature-label distribution. These Bayesian error estimators are optimal when averaged over a given family of distributions, unbiased when averaged over a given family and all samples, and analytically address the trade-off between robustness (modeling assumptions) and accuracy (minimum mean-square error). In this paper, Part I of a two-part study, we define the minimum mean-square error (MMSE) error estimator, discuss its basic properties, provide closed-form analytic representations of the estimator for discrete classifiers with both non-informative and informative prior distributions, and examine the performance and robustness of the MMSE error estimator via simulations. In Part II, appearing in this same issue of the IEEE Transactions on Signal Processing, we address the same issues for linear classification in the Gaussian model with known and unknown covariance matrices, again providing closed-form representations. In both the discrete and Gaussian cases, the MMSE error estimator performs especially well for distributions having moderate true errors.
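To illustrate the flavor of the discrete-classifier result, the following is a minimal sketch of a Bayesian expected-error computation under simplifying assumptions not taken from the abstract: the class priors are assumed known, the bin probabilities of each class receive independent Dirichlet priors (uniform by default), and the expected true error of a fixed classifier is obtained from the posterior mean bin probabilities via Dirichlet-multinomial conjugacy. The function name and interface are hypothetical; the paper's actual closed forms are more general.

```python
import numpy as np

def bayes_discrete_error(counts0, counts1, classifier, c=0.5,
                         alpha0=None, alpha1=None):
    """Hypothetical sketch of a Bayesian expected-error estimate for a
    fixed discrete classifier (assumes known class priors and
    independent Dirichlet priors on each class's bin probabilities).

    counts0[i], counts1[i]: training counts of class 0 / class 1 in bin i
    classifier[i]: label (0 or 1) the classifier assigns to bin i
    c: known prior probability of class 0 (an assumption of this sketch)
    alpha0, alpha1: Dirichlet hyperparameters; uniform (all ones) if None
    """
    counts0 = np.asarray(counts0, float)
    counts1 = np.asarray(counts1, float)
    b = len(counts0)
    a0 = np.ones(b) if alpha0 is None else np.asarray(alpha0, float)
    a1 = np.ones(b) if alpha1 is None else np.asarray(alpha1, float)
    # Posterior mean of each bin probability under Dirichlet-multinomial
    # conjugacy: (alpha_i + count_i) / (sum(alpha) + n)
    p = (a0 + counts0) / (a0.sum() + counts0.sum())
    q = (a1 + counts1) / (a1.sum() + counts1.sum())
    lab = np.asarray(classifier)
    # Expected true error: posterior class-0 mass in bins labeled 1,
    # plus posterior class-1 mass in bins labeled 0
    return c * p[lab == 1].sum() + (1 - c) * q[lab == 0].sum()
```

For example, with two bins, counts `[3, 1]` and `[1, 3]` for the two classes, and a classifier assigning bin 0 to class 0 and bin 1 to class 1, a uniform prior yields an expected error of 1/3. The posterior-mean structure is what makes the estimator unbiased when averaged over the assumed family and all samples.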
Date of Publication: Jan. 2011