Abstract:
Medical visual question answering is a prominent research area that presents significant challenges within the broader domain of visual question answering. In traditional medical visual question answering, the first step is commonly to employ a Convolutional Neural Network (CNN) to extract image information; bilinear attention mechanisms are then used to merge textual question features with visual image features. However, extracting visual attributes through convolutional neural networks often overlooks the global context within the image, which is crucial for answering questions accurately. This paper therefore introduces an Additive Attention Network (AANet) to capture comprehensive image features. Specifically, a CNN is employed to obtain local visual features of images, while an additive attention mechanism acquires their global contextual features. These components complement each other, enriching the representation of visual features and strengthening the model’s global contextual awareness. The proposed method demonstrates superior performance on the VQA-RAD dataset, achieving an overall accuracy of 72.5%, and in particular an accuracy of 81.9% on closed questions.
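The abstract does not give AANet's exact formulation, but the idea of complementing local CNN features with an additive (Bahdanau-style) attention mechanism over the feature map can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the weight matrices `W1`, `W2`, the scoring vector `v`, the mean-pooled query, and the concatenation fusion are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(features, W1, W2, v):
    """Bahdanau-style additive attention over CNN patch features.

    features: (N, d) array -- a flattened CNN feature map, one row per
    spatial location. The query is a mean-pooled summary of the same
    features (a simplifying assumption, not from the paper).
    """
    query = features.mean(axis=0)                     # (d,)
    scores = np.tanh(features @ W1 + query @ W2) @ v  # (N,) additive scoring
    weights = softmax(scores)                         # attention distribution
    global_ctx = weights @ features                   # (d,) global context vector
    return global_ctx, weights

d, N = 64, 49  # e.g. a 7x7 CNN feature map with 64 channels
features = rng.standard_normal((N, d))
W1 = rng.standard_normal((d, d)) * 0.1
W2 = rng.standard_normal((d, d)) * 0.1
v = rng.standard_normal(d) * 0.1

global_ctx, weights = additive_attention(features, W1, W2, v)

# Fuse local (pooled CNN) and global (attention) features; concatenation
# is one simple choice of fusion, assumed here for illustration.
local_feat = features.mean(axis=0)
fused = np.concatenate([local_feat, global_ctx])
print(fused.shape)
```

The attention weights sum to one over the spatial locations, so `global_ctx` is a convex combination of patch features, giving the model a learned summary of the whole image alongside the purely local CNN responses.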
Published in: 2023 5th International Conference on Frontiers Technology of Information and Computer (ICFTIC)
Date of Conference: 17-19 November 2023
Date Added to IEEE Xplore: 13 March 2024