Abstract:
Feature selection is a crucial data preprocessing step that is widely applied in machine learning. It eliminates redundant or irrelevant features from data, thereby improving performance and reducing runtime. For high-dimensional data with label ambiguity, the uncertain nature of the labels poses unique challenges, and feature selection in this setting remains an open problem; in particular, the structural information of the data has not been fully utilized. In this article, we fully exploit the structural information of the data, including the relevancy between labels and features, the redundancy among features, and positive regions, and establish a novel label ambiguity feature selection model based on weighted label-fuzzy relevancy and redundancy. Specifically, we first transform nonlabel distribution annotations into label distribution annotations by using a label enhancement model. Second, we use a fuzzy similarity relation to quantify how similar samples are in the label space. Third, we construct a general label-fuzzy rough set model and then define a novel feature evaluation measure based on weighted label-fuzzy relevancy and redundancy. In this model, general label-fuzzy rough sets are employed to handle label ambiguity, and the label-fuzzy relevancy and redundancy are weighted by feature significance, with a focus on the positive region. Finally, we propose a feature selection algorithm for label ambiguity based on weighted label-fuzzy relevancy and redundancy. Extensive experiments are conducted on 12 label distribution annotation datasets and eight multilabel annotation datasets. The results indicate the advantages of the proposed algorithm over state-of-the-art algorithms.
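To make the pipeline above more concrete, the following is a minimal Python sketch of a relevancy-minus-redundancy selection driven by fuzzy similarity relations. It is not the model defined in the article: the abstract gives no formulas, so the label-space similarity (one minus half the L1 distance between label distributions), the per-feature similarity, the Kleene-Dienes lower approximation used as relevancy, the Jaccard-style redundancy, and every function name here are illustrative assumptions. The label enhancement step and the feature-significance weighting are omitted.

```python
import numpy as np


def label_similarity(D):
    """Fuzzy similarity of samples in label space.

    Assumption: each row of D is a label distribution summing to 1, so
    1 - 0.5 * (L1 distance) lies in [0, 1].
    """
    D = np.asarray(D, dtype=float)
    dist = np.abs(D[:, None, :] - D[None, :, :]).sum(axis=2)
    return 1.0 - 0.5 * dist


def feature_similarity(x):
    """Fuzzy similarity relation induced by one min-max scaled feature."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # A constant feature cannot distinguish any pair of samples.
        return np.ones((x.size, x.size))
    z = (x - x.min()) / span
    return 1.0 - np.abs(z[:, None] - z[None, :])


def relevancy(R_f, R_L):
    """Mean lower-approximation membership (a fuzzy positive-region proxy).

    For sample i: inf_j max(1 - R_f[i, j], R_L[i, j]).
    """
    lower = np.maximum(1.0 - R_f, R_L).min(axis=1)
    return float(lower.mean())


def redundancy(R_f, R_g):
    """Jaccard-style overlap between two fuzzy relations, in [0, 1]."""
    return float(np.minimum(R_f, R_g).sum() / np.maximum(R_f, R_g).sum())


def select_features(X, D, k):
    """Greedily pick k features by relevancy minus mean redundancy."""
    X = np.asarray(X, dtype=float)
    R_L = label_similarity(D)
    rels = [feature_similarity(X[:, j]) for j in range(X.shape[1])]
    rel = [relevancy(R, R_L) for R in rels]
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return rel[j]
            red = np.mean([redundancy(rels[j], rels[s]) for s in selected])
            return rel[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

As a usage example, with `X = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]` and `D = [[1, 0], [1, 0], [0, 1], [0, 1]]`, the first two columns separate the two label groups exactly and the third does not, so `select_features(X, D, 1)` returns `[0]`. Under these assumptions the sketch only penalizes redundancy against already selected features; it does not reproduce the positive-region weighting described in the abstract.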
Published in: IEEE Transactions on Fuzzy Systems (Volume: 32, Issue: 8, August 2024)