An Unbalanced Data Hybrid-Sampling Algorithm Based on Multi-Information Fusion | IEEE Conference Publication | IEEE Xplore


Abstract:

The emergence of big data brings new issues and challenges for the data imbalance problem. Therefore, unbalanced data sampling technology has been a hot research topic in the field of big data. However, the existing sampling methods cannot accurately define the harmful and useless samples contained in the original dataset. That is, based on the single information of the dataset, a large number of actually harmful samples are being used for sampling, which results in a sharp decline in the identifiable performance of the sampled data. In order to overcome the problems caused by only using one kind of information, an unbalanced data hybrid-sampling algorithm based on multi-information fusion (MIFS) is presented in this paper. The MIFS combines the feature information learned by the boosting model with the position information of the data to define the sample, and then divides the samples into different subsets by the information contained. According to the definition of samples, the algorithm performs corresponding under-sampling and over-sampling on these subsets. Experiments show that the MIFS method can improve the performance of sampling operations and produce a high F-score and AUC against both minority and majority classes in the classification of balanced data.
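
The abstract does not give the rules MIFS uses, so the following is only a rough sketch of the general idea it describes: fuse a model-based signal with a position-based signal, split the samples into subsets, then under-sample and over-sample accordingly. Everything specific here is an assumption made for illustration and is not taken from the paper: the per-sample `model_error` input (assumed to come from some boosting model), the k-nearest-neighbour disagreement used as position information, the 0.5 threshold, the subset names, and the interpolation-based over-sampling.

```python
import math
import random


def neighbor_disagreement(X, y, k=3):
    """Position information (assumed form): fraction of each sample's
    k nearest neighbours that carry a different label."""
    scores = []
    for i, xi in enumerate(X):
        others = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        nearest = [j for _, j in others[:k]]
        scores.append(sum(y[j] != y[i] for j in nearest) / len(nearest))
    return scores


def partition(model_error, disagreement, threshold=0.5):
    """Fuse the two signals into three hypothetical subsets."""
    subsets = []
    for e, d in zip(model_error, disagreement):
        if e > threshold and d > threshold:
            subsets.append("harmful")      # both signals flag the sample
        elif e <= threshold and d <= threshold:
            subsets.append("safe")         # neither signal flags it
        else:
            subsets.append("borderline")   # the signals disagree
    return subsets


def hybrid_sample(X, y, model_error, minority=1, k=3, seed=0):
    """Drop harmful majority samples, then synthesise minority samples
    by interpolation until both classes have the same size."""
    rng = random.Random(seed)
    subsets = partition(model_error, neighbor_disagreement(X, y, k))

    # Under-sampling: remove majority samples flagged as harmful.
    kept = [(x, c) for x, c, s in zip(X, y, subsets)
            if not (c != minority and s == "harmful")]

    # Over-sampling: interpolate between non-harmful minority samples.
    seeds = [x for x, c, s in zip(X, y, subsets)
             if c == minority and s != "harmful"]
    n_min = sum(c == minority for _, c in kept)
    n_maj = len(kept) - n_min
    while seeds and n_min < n_maj:
        a, b = rng.choice(seeds), rng.choice(seeds)
        t = rng.random()
        kept.append((tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)), minority))
        n_min += 1

    return [x for x, _ in kept], [c for _, c in kept]
```

Called as `hybrid_sample(X, y, model_error)` with `X` a list of numeric tuples and `y` a list of class labels, the function returns a resampled pair of lists. Taking `model_error` as an input keeps the sketch independent of any particular boosting library.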
Date of Conference: 04-08 December 2017
Date Added to IEEE Xplore: 15 January 2018
Conference Location: Singapore

