Abstract:
The emergence of big data brings new issues and challenges for the data imbalance problem. Unbalanced data sampling has therefore become a hot research topic in the field of big data. However, existing sampling methods cannot accurately identify the harmful and useless samples contained in the original dataset. Because they rely on a single kind of information about the dataset, a large number of samples that are actually harmful are used for sampling, which sharply degrades the recognition performance on the sampled data. To overcome the problems caused by using only one kind of information, this paper presents an unbalanced data hybrid-sampling algorithm based on multi-information fusion (MIFS). MIFS combines the feature information learned by a boosting model with the position information of the data to define each sample, and then divides the samples into different subsets according to the information they contain. Based on these sample definitions, the algorithm performs the corresponding under-sampling and over-sampling on the subsets. Experiments show that MIFS improves the performance of sampling operations and produces a high F-score and AUC for both the minority and majority classes in the classification of balanced data.
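The pipeline summarized in the abstract (fuse two information sources, partition the samples, then under-sample and over-sample per subset) can be illustrated with a minimal sketch. The abstract does not give the paper's actual definitions, so everything below is an assumption made for illustration: `scores` stands in for the minority-class probability produced by some boosting model, the "position information" is approximated by the share of minority labels among each sample's k nearest neighbours, and the subset names (`safe`, `borderline`, `harmful`), the 0.5 threshold, and the interpolation-based over-sampling are hypothetical choices, not the MIFS method itself.

```python
import math
import random


def knn_minority_ratio(points, labels, k, minority=1):
    """Position information (assumed form): for each sample, the fraction
    of its k nearest neighbours that carry the minority label."""
    ratios = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        ratios.append(sum(labels[j] == minority for j in neighbours) / k)
    return ratios


def partition(labels, scores, ratios, minority=1, threshold=0.5):
    """Fuse both sources into hypothetical subsets.

    scores[i] is assumed to be a boosting model's minority-class
    probability for sample i; it is supplied by the caller here.
    """
    subsets = {"safe": [], "borderline": [], "harmful": []}
    for i, (y, s, r) in enumerate(zip(labels, scores, ratios)):
        # Does each information source agree with the sample's own label?
        model_ok = (s >= threshold) == (y == minority)
        same_class_ratio = r if y == minority else 1 - r
        position_ok = same_class_ratio >= threshold
        if model_ok and position_ok:
            subsets["safe"].append(i)
        elif model_ok or position_ok:
            subsets["borderline"].append(i)
        else:
            subsets["harmful"].append(i)
    return subsets


def hybrid_sample(points, labels, scores, k=3, minority=1, seed=0):
    """Under-sample harmful majority samples, then over-sample the minority
    class by interpolating between non-harmful minority samples."""
    rng = random.Random(seed)
    ratios = knn_minority_ratio(points, labels, k, minority)
    harmful = set(partition(labels, scores, ratios, minority)["harmful"])

    # Under-sampling: drop majority samples that both sources contradict.
    keep = [
        i for i in range(len(points))
        if not (i in harmful and labels[i] != minority)
    ]
    new_points = [points[i] for i in keep]
    new_labels = [labels[i] for i in keep]

    # Over-sampling: synthesize minority points until the classes balance.
    seeds = [i for i in keep if labels[i] == minority and i not in harmful]
    n_min = sum(y == minority for y in new_labels)
    n_maj = len(new_labels) - n_min
    while seeds and n_min < n_maj:
        a = points[rng.choice(seeds)]
        b = points[rng.choice(seeds)]
        t = rng.random()
        new_points.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
        new_labels.append(minority)
        n_min += 1
    return new_points, new_labels
```

In this sketch, a majority sample is removed only when both the assumed model score and its neighbourhood contradict its label, which mirrors the abstract's point that relying on a single kind of information can misjudge which samples are harmful.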
Date of Conference: 04-08 December 2017
Date Added to IEEE Xplore: 15 January 2018