This paper presents an iterative bootstrapping-based data over-sampling strategy, combined with an adaptive neural-fuzzy inference system (ANFIS), for modelling severely imbalanced data. Because real industrial data sets are often very large, containing hundreds of process variables and a huge number of data records, selecting a compact set of input variables is critical to successful modelling and analysis. Considerable effort is devoted to identifying the most relevant input variables through correlation analysis and neural-network-based forward input selection. An optimal majority-to-minority class data ratio, which controls the level of data imbalance used for model training, is then determined through the iterative bootstrapping process so that the combined sensitivity and specificity performance is optimised. The strategy is applied to a real industrial case study on rail quality classification, using data provided by Tata Steel Europe. Preliminary results show good overall performance for the iterative bootstrapping over-sampling ANFIS model.
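The ratio search described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a plain logistic regression stands in for ANFIS, the "combined" criterion is taken to be the sum of sensitivity and specificity on a held-out validation set, the data are synthetic Gaussian clusters, and the candidate ratios and all function names (`bootstrap_oversample`, `select_ratio`, and so on) are hypothetical.

```python
import numpy as np

def bootstrap_oversample(X, y, ratio, rng):
    """Resample the minority class (label 1) with replacement until
    majority:minority is approximately `ratio`:1."""
    maj = np.flatnonzero(y == 0)
    mino = np.flatnonzero(y == 1)
    target = max(len(mino), int(round(len(maj) / ratio)))
    extra = rng.choice(mino, size=target - len(mino), replace=True)
    idx = np.concatenate([maj, mino, extra])
    return X[idx], y[idx]

def fit_logistic(X, y, steps=500, lr=0.1):
    """Gradient-descent logistic regression (assumed stand-in for ANFIS)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

def sens_spec(y_true, y_pred):
    """Sensitivity (minority recall) and specificity (majority recall)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

def select_ratio(X_tr, y_tr, X_val, y_val, ratios, seed=0):
    """Try each candidate ratio and keep the one that maximises
    sensitivity + specificity (assumed combined criterion)."""
    rng = np.random.default_rng(seed)
    results = {}
    for r in ratios:
        Xs, ys = bootstrap_oversample(X_tr, y_tr, r, rng)
        w = fit_logistic(Xs, ys)
        results[r] = sens_spec(y_val, predict(w, X_val))
    best = max(results, key=lambda r: sum(results[r]))
    return best, results

# Synthetic imbalanced data (19:1), purely illustrative.
rng = np.random.default_rng(42)

def make(n_maj, n_min):
    X = np.vstack([rng.normal(0.0, 1.0, (n_maj, 2)),
                   rng.normal(1.5, 1.0, (n_min, 2))])
    y = np.concatenate([np.zeros(n_maj, int), np.ones(n_min, int)])
    return X, y

X_tr, y_tr = make(950, 50)
X_val, y_val = make(475, 25)
ratios = [19, 10, 5, 2, 1]
best, results = select_ratio(X_tr, y_tr, X_val, y_val, ratios)
```

Here `results` maps each candidate majority-to-minority ratio to its validation (sensitivity, specificity) pair, and `best` is the ratio with the largest sum. Only the training set is over-sampled; the validation set keeps its original imbalance so that the scores reflect the real class distribution.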