Abstract:
Functional Magnetic Resonance Imaging (fMRI) presents challenges due to limited paired samples and low signal-to-noise ratios, particularly in tasks that involve reconstructing natural images or decoding their semantic content. To address these challenges, we introduce BrainCLIP, an fMRI-based brain decoding model that, for the first time, leverages the cross-modal generalization ability of Contrastive Language-Image Pre-training (CLIP) to bridge brain activity, images, and text. Our experiments demonstrate CLIP's effectiveness across diverse brain decoding tasks, including zero-shot visual category decoding, fMRI-image/text alignment, and fMRI-to-image generation. The core objective of BrainCLIP is to train a mapping network that translates fMRI patterns into a unified CLIP embedding space by integrating visual and textual supervision. Our experiments show that this combined supervision significantly enhances performance on tasks such as fMRI-text alignment and fMRI-based image generation. Notably, BrainCLIP surpasses BraVL, a recent multi-modal method, in zero-shot visual category decoding. Moreover, BrainCLIP reconstructs visual stimuli with high semantic fidelity, competing favorably with state-of-the-art methods in capturing high-level semantic features during fMRI-based natural image reconstruction.
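To make the core idea concrete, below is a minimal sketch of the kind of mapping network the abstract describes: a small network projecting fMRI voxel patterns into CLIP's embedding space, trained with a symmetric contrastive (InfoNCE) loss against precomputed CLIP image and text embeddings. All names, dimensions, and architectural choices here are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): map fMRI patterns into a
# CLIP-style embedding space and align them with image/text features
# using a symmetric contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMRIMapper(nn.Module):
    """Projects flattened fMRI voxel activity into the CLIP embedding space."""
    def __init__(self, n_voxels: int, clip_dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, hidden),
            nn.GELU(),
            nn.Linear(hidden, clip_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-normalize, matching how CLIP embeddings are compared.
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(fmri_emb, clip_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning fMRI embeddings with CLIP embeddings."""
    logits = fmri_emb @ clip_emb.t() / temperature
    targets = torch.arange(len(fmri_emb), device=fmri_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random tensors standing in for real fMRI data and
# precomputed CLIP features (voxel count and batch size are arbitrary).
mapper = FMRIMapper(n_voxels=4000)
fmri = torch.randn(8, 4000)                          # batch of fMRI patterns
img_emb = F.normalize(torch.randn(8, 512), dim=-1)   # CLIP image features
txt_emb = F.normalize(torch.randn(8, 512), dim=-1)   # CLIP text features

pred = mapper(fmri)
# Combine visual and textual supervision, as the abstract describes.
loss = contrastive_loss(pred, img_emb) + contrastive_loss(pred, txt_emb)
loss.backward()
```

Once trained, such a mapper would let fMRI patterns be compared directly against CLIP image or text embeddings, which is what enables the zero-shot category decoding and fMRI-to-image generation tasks mentioned above.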
Published in: IEEE Transactions on Medical Imaging (Early Access)