Analyzing Data Complexity Using Metafeatures for Classification Algorithm Selection | IEEE Conference Publication | IEEE Xplore

Abstract:

Supervised machine learning is defined as the task of seeking algorithms that, based on reasoning from historical data, produce a general hypothesis that can then be used to infer knowledge about new data. Classification, an instance of supervised learning, is statistically defined as the problem of identifying to which particular subpopulation a new observation belongs. Given the large number of available classification algorithms, their combinations (e.g., ensembles), and their possible parameter settings, finding the best method to analyze a new data set has become an increasingly challenging task. Typically, finding the best classifier for a given data set involves empirically iterating through all candidate classifiers and choosing the one that provides the best classification accuracy. Clearly, this task is computationally very expensive, and the computation cost increases with each new candidate algorithm. The problem is compounded by the fact that these classification methods do not generalize adequately across data sets from different domains. For example, the performance a classifier achieves on medical imaging data may not hold for financial data, even when the same classifier is used for a similar classification task. How, then, does one efficiently choose a classifier that will outperform the others? In this context, this study aims to streamline algorithm selection for classification using the meta-learning framework. We propose a methodology to empirically analyze a set of measures of data complexity, known as metafeatures, and to investigate their influence on the classification performance of several widely used classifiers. Doing so allows a map of a performance metric to be generated over the metafeature space of data sets. This map is partitioned into regions where some classifiers perform better than others.
Once implemented, a new data set can be located in...
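The abstract describes the pipeline only at a high level, so the following is a minimal Python (3.8+) sketch under stated assumptions, not the authors' method. A data set is assumed to be a list of `(feature_vector, label)` pairs. The metafeatures (log size, dimensionality, number of classes, normalized class entropy, maximum Fisher's discriminant ratio) and the three toy candidate classifiers are illustrative choices. The "map" over metafeature space is approximated by a 1-nearest-neighbour rule, a swapped-in technique whose Voronoi cells act as regions where one classifier performed best. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def _mean(xs):
    return sum(xs) / len(xs)


def _var(xs):  # population variance
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def metafeatures(data):
    """Assumed, illustrative data-complexity measures (not necessarily the paper's)."""
    n, d = len(data), len(data[0][0])
    counts = Counter(y for _, y in data)
    k = len(counts)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    norm_entropy = entropy / math.log2(k) if k > 1 else 0.0
    fisher = 0.0  # max Fisher's discriminant ratio between the two most frequent classes
    if k > 1:
        (a, _), (b, _) = counts.most_common(2)
        for j in range(d):
            xa = [x[j] for x, y in data if y == a]
            xb = [x[j] for x, y in data if y == b]
            spread = _var(xa) + _var(xb)
            if spread > 0:  # skip features with zero within-class variance
                fisher = max(fisher, (_mean(xa) - _mean(xb)) ** 2 / spread)
    return [math.log10(n), d, k, norm_entropy, fisher]


# Toy candidate classifiers: each takes training pairs and returns a predict function.
def majority(train):
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_centroid(train):
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: [_mean(col) for col in zip(*xs)] for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))


def one_nn(train):
    return lambda x: min(train, key=lambda p: math.dist(x, p[0]))[1]


CANDIDATES = {"majority": majority, "nearest_centroid": nearest_centroid, "one_nn": one_nn}


def cv_accuracy(fit, data, folds=4, seed=0):
    """Fraction of points classified correctly under k-fold cross-validation (folds <= n)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_ids = set(idx[f::folds])
        predict = fit([data[i] for i in idx if i not in test_ids])
        correct += sum(predict(data[i][0]) == data[i][1] for i in test_ids)
    return correct / len(data)


def best_classifier(data, candidates=CANDIDATES):
    """Brute-force baseline: cross-validate every candidate; cost grows with each one added."""
    scores = {name: cv_accuracy(fit, data) for name, fit in candidates.items()}
    return max(scores, key=scores.get)


def build_meta_dataset(datasets):
    """Offline step: pair each historical data set's metafeatures with its best classifier."""
    return [(metafeatures(d), best_classifier(d)) for d in datasets]


def recommend(meta, new_data):
    """Return the best classifier of the nearest historical data set in metafeature space."""
    cols = list(zip(*(mf for mf, _ in meta)))
    mus = [_mean(c) for c in cols]
    sds = [math.sqrt(_var(c)) or 1.0 for c in cols]  # guard constant metafeatures

    def z(v):  # standardize so no single metafeature dominates the distance
        return [(a - m) / s for a, m, s in zip(v, mus, sds)]

    q = z(metafeatures(new_data))
    return min(meta, key=lambda pair: math.dist(q, z(pair[0])))[1]
```

In this sketch the expensive step, cross-validating every candidate on every historical data set, runs once inside `build_meta_dataset`; placing a new data set then costs one metafeature computation and a nearest-neighbour lookup rather than a full sweep over all candidates.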
Date of Conference: 26-28 October 2018
Date Added to IEEE Xplore: 06 January 2019
Conference Location: Chongqing, China