Abstract:
Semantic information is crucial to human awareness. The ability to extract such information interactively from brain activity using non-invasive technologies such as functional Magnetic Resonance Imaging (fMRI) is valuable for medical assistive technologies. However, research in this domain remains limited. To address this gap, we propose BrainChat, an interactive framework designed to decode semantic information from fMRI. BrainChat leverages a large-scale vision-language model and operates through fMRI-based captioning and, optionally, question answering. First, an fMRI encoder-decoder pair is trained to map fMRI data into a latent representation using Masked Brain Modeling, a self-supervised approach. In the second stage, a projector is added to align these fMRI representations with both pretrained image and text embeddings, yielding a unified representation. A text decoder is also added at this stage; it applies cross-attention to the unified fMRI representation to guide the generation of semantic information. During this stage, the fMRI encoder, the projector, and the text decoder are trained jointly by minimizing a combined contrastive and captioning loss. BrainChat achieves state-of-the-art performance in fMRI captioning and implements fMRI question answering, enabling interactive clinical applications. The code is available on GitHub.
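The second training stage described above, joint optimization of a contrastive alignment term and a captioning term, can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the module names (fmri_encoder, projector, text_decoder), the symmetric InfoNCE form of the contrastive term, the teacher-forced cross-entropy caption term, and the weighting alpha are all assumptions introduced here.

import torch
import torch.nn.functional as F

def contrastive_loss(fmri_emb, other_emb, temperature=0.07):
    # Symmetric InfoNCE between L2-normalized fMRI embeddings and
    # pretrained image or text embeddings (CLIP-style alignment).
    fmri_emb = F.normalize(fmri_emb, dim=-1)
    other_emb = F.normalize(other_emb, dim=-1)
    logits = fmri_emb @ other_emb.t() / temperature        # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def stage_two_loss(fmri, image_emb, text_emb, caption_ids,
                   fmri_encoder, projector, text_decoder, alpha=1.0):
    # 1. Encode fMRI into the latent space learned by Masked Brain Modeling.
    latent = fmri_encoder(fmri)
    # 2. Project into the shared space of the pretrained image/text embeddings.
    unified = projector(latent)                            # (B, D)
    # 3. Contrastive alignment against both modalities.
    l_con = (contrastive_loss(unified, image_emb) +
             contrastive_loss(unified, text_emb))
    # 4. Caption loss: the text decoder cross-attends to the unified fMRI
    #    representation and is trained with teacher forcing. The signature
    #    text_decoder(tokens, memory=...) is hypothetical.
    caption_logits = text_decoder(caption_ids[:, :-1], memory=unified)  # (B, T-1, V)
    l_cap = F.cross_entropy(
        caption_logits.reshape(-1, caption_logits.size(-1)),
        caption_ids[:, 1:].reshape(-1))
    return l_con + alpha * l_cap

Normalizing the embeddings and using a symmetric InfoNCE term mirrors common CLIP-style alignment objectives; the actual temperature, loss weighting, and decoder interface in BrainChat may differ from this sketch.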
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025