Analyzing Data Complexity Using Metafeatures for Classification Algorithm Selection

Abstract:

Supervised machine learning is defined as the task of seeking algorithms which, based on reasoning gained from historical data, produce a general hypothesis that can then be used to infer knowledge about new data in the future. Classification, an instance of supervised learning, is statistically defined as the problem of identifying to which particular subpopulation a new observation of data belongs. Given the large number of available classification algorithms, their combinations (e.g., ensembles) and possible parameter settings, finding the best method to analyze a new data set has become an ever more challenging task. Typically, finding the best classifier for a given data set involves empirically iterating through all candidate classifiers and choosing the one which provides the best classification accuracy. Clearly, this task is computationally very expensive, and the computation cost increases with the addition of each new candidate algorithm. This problem is compounded by the fact that these classification methods do not generalize adequately across data sets from different domains. For example, classifier performance obtained with medical imaging data may not hold for financial data when trying to achieve a similar classification task using the same classifier. How, then, does one efficiently choose a classifier which will provide better classification performance than others? In this context, this study aims to streamline the task of algorithm selection for classification using the meta-learning framework. We propose a methodology to empirically analyze a set of measures of data complexity, known as metafeatures, and investigate their influence on the classification performance of several widely used classifiers. Doing so allows a map of a performance metric to be generated over the metafeature space of data sets. This map is partitioned into regions where some classifiers perform better than others.
Once implemented, a new data set can be located in...
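
The idea of locating a new data set in metafeature space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the metafeatures used, so three simple ones are chosen here (log sample count, log feature count, class entropy), and a nearest-neighbour lookup stands in for the partitioned performance map described above. The function names `metafeatures` and `recommend` and the classifier labels are hypothetical.

```python
import numpy as np


def metafeatures(X, y):
    """Compute a small, illustrative metafeature vector for a labelled data set.

    Assumption: these three measures are placeholders, not the paper's set.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    # Class entropy in bits: 0 for a single class, log2(k) for k balanced classes.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    class_entropy = float(-(p * np.log2(p)).sum())
    return np.array([np.log10(n_samples), np.log10(n_features), class_entropy])


def recommend(new_mf, known_mfs, best_classifiers):
    """Return the best-known classifier of the nearest data set in metafeature space.

    Assumption: a nearest-neighbour lookup approximates finding the region of
    the performance map that contains the new data set.
    """
    known = np.asarray(known_mfs, dtype=float)
    new_mf = np.asarray(new_mf, dtype=float)
    # Standardize each metafeature so that no single one dominates the distance.
    mu = known.mean(axis=0)
    sigma = known.std(axis=0)
    sigma[sigma == 0] = 1.0
    distances = np.linalg.norm((known - mu) / sigma - (new_mf - mu) / sigma, axis=1)
    return best_classifiers[int(np.argmin(distances))]


if __name__ == "__main__":
    # Hypothetical meta-knowledge: metafeatures of two previously studied
    # data sets and the classifier that performed best on each.
    known_mfs = [[2.0, 1.0, 1.0], [4.0, 2.0, 0.5]]
    best = ["knn", "random_forest"]

    X_new = np.zeros((120, 10))
    y_new = np.array([0] * 60 + [1] * 60)
    print(recommend(metafeatures(X_new, y_new), known_mfs, best))
```

In this sketch the expensive step (evaluating every candidate classifier) happens once, offline, on the known data sets; a new data set then only requires computing its metafeatures and a distance lookup.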

Published in: 2018 Prognostics and System Health Management Conference (PHM-Chongqing)

Date of Conference: 26-28 October 2018

Date Added to IEEE Xplore: 06 January 2019

DOI: 10.1109/PHM-Chongqing.2018.00224

Conference Location: Chongqing, China
