Concept-Based Lesion Aware Transformer for Interpretable Retinal Disease Diagnosis


Abstract:

Existing deep learning methods have achieved remarkable results in diagnosing retinal diseases, showcasing the potential of advanced AI in ophthalmology. However, the black-box nature of these methods obscures the decision-making process, compromising their trustworthiness and acceptability. Inspired by concept-based approaches and recognizing the intrinsic correlation between retinal lesions and diseases, we regard retinal lesions as concepts and propose an inherently interpretable framework designed to enhance both the performance and explainability of diagnostic models. Leveraging the transformer architecture, known for its proficiency in capturing long-range dependencies, our model effectively identifies lesion features. By integrating image-level annotations, it aligns lesion concepts with human cognition under the guidance of a retinal foundation model. Furthermore, to attain interpretability without losing lesion-specific information, our method employs a classifier built on a cross-attention mechanism for disease diagnosis and explanation, where explanations are grounded in the contributions of human-understandable lesion concepts and their visual localization. Notably, owing to the structure and inherent interpretability of our model, clinicians can intervene at the concept level to correct diagnostic errors by simply adjusting erroneous lesion predictions. Experiments conducted on four fundus image datasets demonstrate that our method achieves favorable performance against state-of-the-art methods while providing faithful explanations and enabling concept-level interventions. Our code is publicly available at https://github.com/Sorades/CLAT.
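
To make the cross-attention diagnosis head and the concept-level intervention concrete, the following is a minimal PyTorch sketch of one way such a head can be built. It is an illustration under assumptions, not the authors' released implementation (see the linked repository for that): learnable lesion-concept embeddings act as queries attending over image patch tokens, each concept receives a presence score, and the disease logits are a linear function of those scores, so a clinician can overwrite an erroneous concept score before classification. All module and dimension names are hypothetical.

import torch
import torch.nn as nn

class ConceptCrossAttentionHead(nn.Module):
    """Hypothetical sketch: lesion-concept queries attend over patch tokens."""
    def __init__(self, num_concepts, num_classes, dim, num_heads=8):
        super().__init__()
        # One learnable query embedding per lesion concept.
        self.concept_queries = nn.Parameter(torch.randn(num_concepts, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Per-concept presence score: the interpretable bottleneck.
        self.concept_score = nn.Linear(dim, 1)
        # Disease logits as a linear combination of concept activations.
        self.classifier = nn.Linear(num_concepts, num_classes)

    def forward(self, patch_tokens, intervention=None):
        # patch_tokens: (B, N, dim) from a ViT-style backbone.
        B = patch_tokens.size(0)
        queries = self.concept_queries.unsqueeze(0).expand(B, -1, -1)
        # attn_weights (B, num_concepts, N) localize each concept on the image.
        concept_feats, attn_weights = self.cross_attn(queries, patch_tokens, patch_tokens)
        concepts = torch.sigmoid(self.concept_score(concept_feats)).squeeze(-1)  # (B, num_concepts)
        if intervention is not None:
            # Concept-level intervention: overwrite erroneous concept predictions.
            mask, values = intervention  # both (B, num_concepts), mask is boolean
            concepts = torch.where(mask, values, concepts)
        logits = self.classifier(concepts)
        return logits, concepts, attn_weights

# Example: head = ConceptCrossAttentionHead(num_concepts=8, num_classes=5, dim=768)
# logits, concepts, attn = head(backbone_patch_tokens)  # tokens: (B, N, 768)

Because the prediction depends on the image only through the concept scores, each concept's contribution to a disease logit is simply its score times the corresponding classifier weight, and the cross-attention weights provide the visual localization of each concept.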
Published in: IEEE Transactions on Medical Imaging ( Volume: 44, Issue: 1, January 2025)
Page(s): 57 - 68
Date of Publication: 16 July 2024

PubMed ID: 39012729

I. Introduction

Fundus images are indispensable in diagnosing retinal diseases such as diabetic retinopathy (DR), age-related macular degeneration (AMD), and pathological myopia (PM). With the advancement of deep learning, deep neural networks (DNNs) have been widely applied to assist ophthalmologists in diagnosing these retinal diseases automatically, learning discriminative features from fundus images through large-scale datasets and sophisticated architectures such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). These models have demonstrated remarkable accuracy and performance in various retinal disease diagnosis tasks [1], [2], [3], [4], [5], [6], [7], [8]. However, these DNN-based approaches are rarely applied in clinical practice, mainly due to the black-box nature of deep learning. Given that medical diagnosis demands a profound understanding of domain knowledge, decisions that are not comprehensible to human experts are unacceptable, and this lack of transparency undermines clinicians' trust [9]. To address this issue, various strategies [10], [11] have been proposed to help people understand the decision-making process of deep neural networks. Currently, the explanation of medical image analysis models relies mainly on saliency maps, which highlight the regions of an image that the model deems important for its predictions [12], [13]. Despite being widely used, several studies [14], [15], [16] have revealed that the saliency maps generated by these methods can be inconsistent for misclassified samples.
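
For reference, saliency maps of the kind discussed above are commonly produced with gradient-based methods such as Grad-CAM. The following is a minimal Grad-CAM-style sketch in PyTorch, using a generic ResNet-50 backbone for illustration; the model, layer choice, and preprocessing are assumptions, not specifics of the cited works. It weights the feature maps of the last convolutional stage by the gradients of the target class score and keeps only the positive evidence.

import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()  # in practice, load trained weights here

feats, grads = {}, {}
target_layer = model.layer4  # last convolutional stage of ResNet-50

target_layer.register_forward_hook(lambda m, inp, out: feats.update(a=out))
target_layer.register_full_backward_hook(lambda m, gin, gout: grads.update(a=gout[0]))

def grad_cam(image, target_class):
    """image: (1, 3, H, W) preprocessed fundus image; returns a (1, 1, H, W) map."""
    logits = model(image)
    model.zero_grad()
    logits[0, target_class].backward()
    acts, g = feats["a"], grads["a"]                 # both (1, C, h, w)
    weights = g.mean(dim=(2, 3), keepdim=True)       # gradients pooled over space
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))  # positive evidence only
    cam = cam / (cam.max() + 1e-8)                   # normalize to [0, 1]
    return F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)

# Example: explain the predicted class of a single fundus image.
# heatmap = grad_cam(image, target_class=model(image).argmax(1).item())

Overlaying the upsampled map on the fundus image yields the familiar heatmap explanation; as the studies cited above note, such post-hoc maps can disagree with the model's actual evidence, which motivates the inherently interpretable design pursued in this work.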

