Structural risk minimization over data-dependent hierarchies
Shawe-Taylor, J.; Bartlett, P.L.; Williamson, R.C.; Anthony, M.
Information Theory, IEEE Transactions on
Volume 44, Issue 5, Sep 1998 Page(s):1926 - 1940
Digital Object Identifier 10.1109/18.705570
Summary: The paper introduces some generalizations of Vapnik's (1982)
method of structural risk minimization (SRM). As well as making explicit
some of the details of SRM, it provides a result that allows one to
trade off errors on the training sample against improved generalization
performance. It then considers the more general case when the hierarchy
of classes is chosen in response to the data. A result is presented on
the generalization performance of classifiers with a “large
margin”. This theoretically explains the impressive generalization
performance of the maximal margin hyperplane algorithm of Vapnik and
co-workers (which is the basis for their support vector machines). The
paper concludes with a more general result in terms of
“luckiness” functions, which provides a quite general way
for exploiting serendipitous simplicity in observed data to obtain
better prediction accuracy from small training sets. Four examples are
given of such functions, including the Vapnik-Chervonenkis (1971)
dimension measured on the sample.