I. Introduction
Feature selection, which aims to find a subset of the most relevant and informative features for predicting class labels from the original feature set, is a critical step in data preprocessing for many classification tasks [1], [2], [3], [4]. When these relevant and informative features are used as inputs for classification, the performance of the learned models is expected to improve. Meanwhile, feature selection reduces the dimensionality of a dataset by removing redundant and irrelevant features, which can speed up the learning process and help avoid overfitting [5]. Feature selection is inherently a bi-objective optimization problem: it seeks to simultaneously minimize the number of selected features and the classification error rate. Balancing these two often conflicting objectives is a key challenge in practice.
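To make the bi-objective view concrete, the following is a minimal illustrative sketch (not from the source): each candidate feature subset, encoded as a boolean mask, is scored on the two objectives named above, the number of selected features and the classification error. The data, the leave-one-out 1-nearest-neighbour classifier, and all function names here are illustrative assumptions, chosen only to show how removing noisy features can shrink the subset without hurting (and often improving) accuracy.

```python
import random

def knn_error(X, y, mask):
    """Leave-one-out 1-NN classification error using only the
    features where mask is True (illustrative evaluator)."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 1.0  # empty subset: treat as maximal error
    errors = 0
    for i in range(len(X)):
        best_d, best_lab = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue  # leave the query point out
            d = sum((X[i][j] - X[k][j]) ** 2 for j in idx)
            if d < best_d:
                best_d, best_lab = d, y[k]
        errors += best_lab != y[i]
    return errors / len(X)

random.seed(0)
# Synthetic data: feature 0 is informative (class means 0 vs 1),
# features 1-3 are pure noise.
y = [i % 2 for i in range(30)]
X = [[lab + random.gauss(0, 0.2)] + [random.gauss(0, 1) for _ in range(3)]
     for lab in y]

# Score two candidate subsets on both objectives.
for mask in [(True, True, True, True), (True, False, False, False)]:
    print(sum(mask), "features -> error", round(knn_error(X, y, mask), 3))
```

A feature selection method searches over such masks, trading off the two printed quantities; with many features the search space grows as 2^n, which is why heuristic and evolutionary search strategies are commonly applied.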