Abstract:
Tag classification is essential in Stack Overflow. Instead of combing through pages or replies of irrelevant information, users can quickly and easily pinpoint relevant posts and answers using tags. Since user-submitted posts can carry multiple tags, classifying tags in Stack Overflow is a multi-label problem, which leads to an imbalance between labels across the whole label set. Pre-trained deep-learning models fine-tuned on small datasets can improve tag classification accuracy, and common multi-label resampling techniques combined with machine-learning classifiers can also mitigate this issue. Still, few studies have explored which resampling technique improves the performance of pre-trained deep models for predicting tags. To address this gap, we conducted an experiment to evaluate the effectiveness of ELECTRA, a powerful pre-trained deep-learning model, combined with various multi-label resampling techniques in reducing the imbalance that induces mislabeling of Stack Overflow posts. We compared seven resampling techniques, namely LP-ROS, ML-ROS, MLSMOTE, MLeNN, MLTL, ML-SOL, and REMEDIAL, to find the best method for mitigating the imbalance and improving tag prediction accuracy. Our results show that MLTL is the most effective choice for tackling the imbalance in multi-label classification on our Stack Overflow data under deep-learning scenarios. MLTL achieved 0.517, 0.804, 0.467, and 0.98 on Precision@1, Recall@5, F1-score@1, and AUC, respectively, whereas MLeNN reached only 0.323, 0.648, 0.277, and 0.95 on the same metrics.
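The paper's evaluation code is not reproduced here; as a point of reference, the ranking metrics reported in the abstract (Precision@1, Recall@5) are commonly computed per post and averaged. Below is a minimal, hypothetical NumPy sketch of such a computation for multi-label tag prediction, assuming `y_true` is a binary post-by-tag matrix and `y_score` holds per-tag scores from the classifier; it is an illustration of the metrics, not the authors' implementation.

```python
import numpy as np

def precision_at_k(y_true, y_score, k):
    """Fraction of the top-k predicted tags that are true tags, averaged over posts."""
    top_k = np.argsort(-y_score, axis=1)[:, :k]        # indices of the k highest-scoring tags
    hits = np.take_along_axis(y_true, top_k, axis=1)   # 1 where a top-k tag is a true tag
    return hits.sum(axis=1).mean() / k

def recall_at_k(y_true, y_score, k):
    """Fraction of a post's true tags recovered in its top-k predictions, averaged over posts."""
    top_k = np.argsort(-y_score, axis=1)[:, :k]
    hits = np.take_along_axis(y_true, top_k, axis=1)
    denom = np.clip(y_true.sum(axis=1), 1, None)       # guard against posts with no tags
    return (hits.sum(axis=1) / denom).mean()

# Toy example: 2 posts, 4 candidate tags
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]])
y_score = np.array([[0.9, 0.1, 0.4, 0.2],
                    [0.3, 0.8, 0.1, 0.5]])
print(precision_at_k(y_true, y_score, k=1))  # 1.0: the top-1 tag is correct for both posts
print(recall_at_k(y_true, y_score, k=5))     # 1.0: with k >= number of tags, all true tags are covered
```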
Published in: 2023 20th International Joint Conference on Computer Science and Software Engineering (JCSSE)
Date of Conference: 28 June 2023 - 01 July 2023
Date Added to IEEE Xplore: 10 August 2023