Supervised learning is well known and widely applied in domains such as bioinformatics, cheminformatics, and financial forecasting. However, interference from irrelevant features can degrade classifier accuracy. As a popular feature selection model, GA-SVM is well suited to filtering out irrelevant features and thereby improving learning performance. However, its high computational cost strongly discourages its application to large-scale datasets. In this paper, an HPC-enabled GA-SVM (HGA-SVM) is proposed that integrates data parallelization, multithreading, and heuristic techniques, with the ultimate goals of robustness and low computational cost. The proposed model comprises four improvement strategies: 1) GA parallelization, 2) SVM parallelization, 3) neighbor search, and 4) evaluation caching. All four strategies improve different aspects of the feature selection model and contribute collectively to higher computational throughput.
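To illustrate one of the strategies named above, the sketch below shows how evaluation caching can cut cost in a GA-based feature-selection loop: chromosomes (feature masks) that recur across generations are scored once and then looked up. This is a minimal toy, not the paper's implementation; the fitness function is a hypothetical stand-in for an expensive SVM cross-validation score, and all names and parameters are illustrative assumptions.

```python
import random

random.seed(0)

N_FEATURES = 12
RELEVANT = {1, 3, 5, 7}  # hypothetical "true" informative features for the toy fitness

def fitness(mask):
    """Stand-in for an expensive SVM evaluation: reward selecting
    relevant features, penalize selecting irrelevant ones."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & RELEVANT) - 0.25 * len(chosen - RELEVANT)

cache = {}       # evaluation cache keyed by the (hashable) feature mask
cache_hits = 0

def cached_fitness(mask):
    """Return a cached score if this mask was already evaluated."""
    global cache_hits
    if mask in cache:
        cache_hits += 1
        return cache[mask]
    score = fitness(mask)  # in the real model: train and score an SVM here
    cache[mask] = score
    return score

def mutate(mask, rate=0.1):
    """Flip each bit of the feature mask with a small probability."""
    return tuple(bit ^ (random.random() < rate) for bit in mask)

# Minimal elitist GA loop: elite chromosomes carried across generations
# are re-scored via the cache instead of being re-evaluated.
population = [tuple(random.randint(0, 1) for _ in range(N_FEATURES))
              for _ in range(20)]
for generation in range(30):
    scored = sorted(population, key=cached_fitness, reverse=True)
    elite = scored[:10]
    population = elite + [mutate(random.choice(elite)) for _ in range(10)]

best = max(population, key=cached_fitness)
print("best score:", cached_fitness(best))
print("cache hits:", cache_hits)
```

In the full HGA-SVM setting, the cached evaluations would be SVM training runs, which is where caching yields its largest savings; the GA and SVM parallelization strategies would then distribute the remaining uncached evaluations across workers.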