Structural risk minimization over data-dependent hierarchies
Shawe-Taylor, J.
Bartlett, P.L.
Williamson, R.C.
Anthony, M.
Dept. of Comput. Sci., London Univ.
This paper appears in: IEEE Transactions on Information Theory
Publication Date: Sep 1998
Volume: 44
Issue: 5
On page(s): 1926-1940
ISSN: 0018-9448
References Cited: 50
CODEN: IETTAW
INSPEC Accession Number: 6013596
DOI: 10.1109/18.705570
Posted online: 2002-08-06
Abstract
The paper introduces some generalizations of Vapnik's (1982)
method of structural risk minimization (SRM). As well as making explicit
some of the details of SRM, it provides a result that allows one to
trade off errors on the training sample against improved generalization
performance. It then considers the more general case when the hierarchy
of classes is chosen in response to the data. A result is presented on
the generalization performance of classifiers with a “large
margin”. This theoretically explains the impressive generalization
performance of the maximal margin hyperplane algorithm of Vapnik and
co-workers (which is the basis for their support vector machines). The
paper concludes with a more general result in terms of
“luckiness” functions, which provides a quite general way
of exploiting serendipitous simplicity in observed data to obtain
better prediction accuracy from small training sets. Four examples are
given of such functions, including the Vapnik-Chervonenkis (1971)
dimension measured on the sample.
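The basic SRM recipe that the abstract generalizes can be sketched in a few lines: fix a nested hierarchy of classes H_1, H_2, ... with growing capacity, spread the confidence budget delta over the levels, and pick the level whose training error plus complexity penalty is smallest. The sketch below is illustrative only and makes several assumptions: the names Candidate, vc_penalty and srm_select are hypothetical; the penalty is a generic VC-style confidence term whose constants are chosen for illustration and are not the bound proved in the paper; and the weights p_k = 1/(k(k+1)) are one arbitrary choice that sums to 1. The data-dependent hierarchies, margin bounds and luckiness functions of the paper are not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    level: int         # index k of class H_k in the nested hierarchy (1-based)
    vc_dim: int        # VC dimension d_k of H_k (assumed >= 1)
    train_errors: int  # training mistakes of the hypothesis chosen from H_k


def vc_penalty(vc_dim: int, m: int, delta: float) -> float:
    """Illustrative VC-style confidence term (constants are an assumption,
    not the bound proved in the paper)."""
    return math.sqrt(
        (vc_dim * (math.log(2 * m / vc_dim) + 1) + math.log(4 / delta)) / m
    )


def srm_select(candidates: list[Candidate], m: int, delta: float = 0.05):
    """Return the candidate minimizing empirical error + penalty, and its bound."""
    best, best_bound = None, math.inf
    for c in candidates:
        # Assumed weighting p_k = 1/(k(k+1)); these sum to 1 over k >= 1, so a
        # union bound makes every level's bound hold simultaneously w.p. >= 1 - delta.
        p_k = 1.0 / (c.level * (c.level + 1))
        bound = c.train_errors / m + vc_penalty(c.vc_dim, m, p_k * delta)
        if bound < best_bound:
            best, best_bound = c, bound
    return best, best_bound


if __name__ == "__main__":
    m = 1000  # hypothetical training-set size
    candidates = [
        Candidate(level=1, vc_dim=3, train_errors=200),
        Candidate(level=2, vc_dim=10, train_errors=40),
        Candidate(level=3, vc_dim=50, train_errors=10),
        Candidate(level=4, vc_dim=200, train_errors=0),
    ]
    chosen, bound = srm_select(candidates, m)
    print(chosen.level, round(bound, 3))
```

With these made-up numbers the richest class fits the sample perfectly yet loses, because its penalty dominates. Level 2 is selected even though it makes 40 training errors, which is the trade-off between training error and generalization that the abstract refers to.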