Abstract:
Data imbalance in Machine Learning refers to an unequal distribution of classes within a dataset. This issue is encountered mostly in classification tasks, in which the distribution of classes or labels in a given dataset is not uniform. The most straightforward way to address this problem is resampling: adding records to the minority class or deleting records from the majority class. In this paper, we have experimented with two widely adopted resampling techniques: oversampling and undersampling. To explore both techniques, we chose a public imbalanced dataset from the Kaggle website, Santander Customer Transaction Prediction, and applied a group of well-known machine learning algorithms, using for each the hyperparameters that gave the best results under both resampling techniques. One of the key findings of this paper is that oversampling performs better than undersampling across different classifiers and obtains higher scores on different evaluation metrics.
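The two resampling strategies described above can be illustrated with a minimal sketch. It assumes the simplest variants, random oversampling (duplicating minority records, drawn with replacement, until every class matches the largest one) and random undersampling (randomly discarding majority records until every class matches the smallest one). The abstract does not state which exact variant the paper used, and the function names `random_oversample` and `random_undersample` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = List[float]


def _group_by_label(X: Sequence[Row], y: Sequence[Hashable]) -> Dict[Hashable, List[Row]]:
    """Collect the records of each class, preserving first-seen label order."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_oversample(X: Sequence[Row], y: Sequence[Hashable], seed: int = 0) -> Tuple[List[Row], List[Hashable]]:
    """Add randomly duplicated records to smaller classes until all classes match the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out: List[Row] = []
    y_out: List[Hashable] = []
    for label, rows in groups.items():
        # Draw (target - len(rows)) extra copies with replacement from this class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_undersample(X: Sequence[Row], y: Sequence[Hashable], seed: int = 0) -> Tuple[List[Row], List[Hashable]]:
    """Randomly delete records from larger classes until all classes match the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_label(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out: List[Row] = []
    y_out: List[Hashable] = []
    for label, rows in groups.items():
        # Keep a random subset of size `target` without replacement.
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Toy imbalanced dataset: nine records of class 0, one record of class 1.
    X = [[float(i)] for i in range(10)]
    y = [0] * 9 + [1]
    _, y_over = random_oversample(X, y)
    _, y_under = random_undersample(X, y)
    print(Counter(y_over))   # Counter({0: 9, 1: 9})
    print(Counter(y_under))  # Counter({0: 1, 1: 1})
```

Oversampling keeps every original record but repeats minority rows, while undersampling discards majority rows, which is one plausible reason the two can score differently. In practice resampling is applied only to the training split so that evaluation metrics are computed on the original class distribution; the `imbalanced-learn` library offers ready-made `RandomOverSampler` and `RandomUnderSampler` classes for the same purpose.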
Date of Conference: 07-09 April 2020
Date Added to IEEE Xplore: 27 April 2020