Rethinking Deep CNN Training: A Novel Approach for Quality-Aware Dataset Optimization


Overall workflow of the proposed quality-aware dataset optimization method

Abstract:

The informativeness of data has always been of great interest within the machine learning community. Nowadays, with the skyrocketing advancement of artificial intelligence and massive volumes of noisy data, it becomes even more essential to develop robust and effective methods for training data optimization. Existing approaches are mostly based on empirical trial and error, with either stochastic or deterministic data reduction strategies. The key limitation of such solutions is that they do not consider the overall informativeness of the resulting training dataset. In this paper, a novel approach for quality-aware dataset optimization by initial assessment of its informativeness is proposed. As a metric of informativeness, entropy values are calculated over the target dataset. To alleviate the computational complexity, an initial clustering of the dataset is performed, and the entropy of each cluster is calculated independently. The dataset is then optimized by dynamic programming to find a sequence of subsets with low overall entropy according to imposed size limitations. The experimental evaluation shows that the proposed approach improves over current best alternatives in terms of accuracy, precision, recall, and F1-score metrics. Moreover, the proposed approach provides excellent interclass discrimination even for a large number of classes.
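The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features the entropy is computed over, how the clusters are formed, or the exact form of the dynamic program, so everything below is an assumption: clusters are taken as already computed, each sample is reduced to a discrete value, entropy is the Shannon entropy of each cluster's empirical distribution, and the size limitation is modeled as a total sample count within `[min_size, max_size]`. The function names `shannon_entropy` and `select_clusters` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    # p * log2(1/p) summed over the observed values, with p = c / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def select_clusters(sizes, entropies, min_size, max_size):
    """Knapsack-style dynamic program (an assumed formulation).

    Picks a subset of clusters whose total size lies in
    [min_size, max_size] and whose summed entropy is minimal.
    Returns (chosen cluster indices, total size, total entropy),
    or None when no subset satisfies the size limits.
    """
    inf = float("inf")
    # best[s] = (minimal summed entropy, chosen indices) for exact total size s
    best = [(inf, [])] * (max_size + 1)
    best[0] = (0.0, [])
    for i, (size, h) in enumerate(zip(sizes, entropies)):
        # Iterate sizes downward so each cluster is used at most once
        for s in range(max_size, size - 1, -1):
            prev_h, prev_sel = best[s - size]
            if prev_h + h < best[s][0]:
                best[s] = (prev_h + h, prev_sel + [i])
    feasible = [
        (best[s][0], s, best[s][1])
        for s in range(min_size, max_size + 1)
        if best[s][0] < inf
    ]
    if not feasible:
        return None
    total_h, total_size, chosen = min(feasible)
    return chosen, total_size, total_h


# Toy clusters of discrete sample values (illustrative data only)
clusters = [
    ["a", "a", "a", "a"],
    ["a", "b", "a", "b"],
    ["a", "b", "c", "d"],
    ["b", "b", "b"],
]
sizes = [len(c) for c in clusters]
entropies = [shannon_entropy(c) for c in clusters]
selection = select_clusters(sizes, entropies, min_size=7, max_size=8)
```

Computing entropy per cluster keeps each estimate cheap, and the table over total sizes turns the size limitation into a standard 0/1 knapsack-style recurrence. The paper's actual objective and constraints may differ from this simplification.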
Published in: IEEE Access (Volume: 12)
Page(s): 137427 - 137438
Date of Publication: 14 June 2024
Electronic ISSN: 2169-3536
