Abstract:
For the practical application of a classifier, it is necessary to select an optimal output probability threshold to obtain the best classification results. There are many criteria one may employ to select a threshold. However, selecting a threshold will often involve trading off performance in terms of one metric for performance in terms of another. In our literature review of studies that select thresholds to optimize classification of imbalanced data, we find there is an opportunity to expand on previous work with an in-depth study of threshold selection. Our contribution is to present a systematic method for selecting the best threshold value for a given classification task and its desired performance constraints. Just as a machine learning algorithm is optimized on some training data set, we demonstrate how a user-defined set of performance metrics can be utilized to optimize the classification threshold. In this study, we use four popular metrics to optimize thresholds: precision, Matthews' Correlation Coefficient, F-measure, and the geometric mean of true positive rate and true negative rate. Moreover, we compare classification results for thresholds optimized for these metrics with the commonly used default threshold of 0.5 and the prior probability of the positive class (also known as the minority-to-majority class ratio). Our results show that other thresholds handily outperform the default threshold of 0.5. Moreover, we show that the positive class prior probability is a good benchmark for finding classification thresholds that perform well in terms of multiple metrics.
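The sketch below is an illustrative example of the kind of threshold optimization the abstract describes (not the authors' code): candidate thresholds are swept over held-out predicted probabilities and the value maximizing a chosen metric is kept. It assumes scikit-learn and a synthetic imbalanced data set; helper names such as pick_threshold and g_mean are hypothetical.

```python
# Hedged sketch: metric-driven threshold selection for an imbalanced binary task.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (confusion_matrix, f1_score,
                             matthews_corrcoef, precision_score)
from sklearn.model_selection import train_test_split


def g_mean(y_true, y_pred):
    """Geometric mean of true positive rate and true negative rate."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return np.sqrt(tpr * tnr)


def pick_threshold(y_true, scores, metric):
    """Return the candidate threshold that maximizes `metric` on held-out data."""
    candidates = np.linspace(0.01, 0.99, 99)
    values = [metric(y_true, (scores >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(values))]


# Synthetic imbalanced data: roughly 10% positive (minority) class.
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_val)[:, 1]  # output probabilities for the positive class

# Optimize the threshold separately for each of the four metrics in the abstract.
for name, metric in [("precision", precision_score),
                     ("MCC", matthews_corrcoef),
                     ("F-measure", f1_score),
                     ("G-mean", g_mean)]:
    t = pick_threshold(y_val, scores, metric)
    print(f"best threshold for {name}: {t:.2f}")

# Benchmarks discussed in the abstract: the default 0.5 and the positive class prior.
print(f"positive class prior: {y_train.mean():.2f}")
```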
Date of Conference: 14-16 December 2022
Date Added to IEEE Xplore: 13 March 2023