{"title":"MindGPT: Interpreting What You See With Non-Invasive Brain Recordings","authors":"Jiaxuan Chen;Yu Qi;Yueming Wang;Gang Pan","doi":"10.1109/TIP.2025.3572784","DOIUrl":null,"url":null,"abstract":"Decoding of seen visual contents with non-invasive brain recordings has important scientific and practical values. Efforts have been made to recover the seen images from brain signals. However, most existing approaches cannot faithfully reflect the visual contents due to insufficient image quality or semantic mismatches. Compared with reconstructing pixel-level visual images, speaking is a more efficient and effective way to explain visual information. Here we introduce a non-invasive neural decoder, termed MindGPT, which interprets perceived visual stimuli into natural languages from functional Magnetic Resonance Imaging (fMRI) signals in an end-to-end manner. Specifically, our model builds upon a visually guided neural encoder with a cross-attention mechanism. By the collaborative use of data augmentation techniques, this architecture permits us to guide latent neural representations towards a desired language semantic direction in a self-supervised fashion. Through doing so, we found that the neural representations of the MindGPT are explainable, which can be used to evaluate the contributions of visual properties to language semantics. Our experiments show that the generated word sequences truthfully represented the visual information (with essential details) conveyed in the seen stimuli. The results also suggested that with respect to language decoding tasks, the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and using only the HVC can recover most of the semantic information. The source code for the MindGPT model is publicly available at <uri>https://github.com/JxuanC/MindGPT</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3281-3293"},"PeriodicalIF":13.7000,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11018227/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
Decoding seen visual content from non-invasive brain recordings has important scientific and practical value. Efforts have been made to recover seen images from brain signals. However, most existing approaches cannot faithfully reflect the visual content due to insufficient image quality or semantic mismatches. Compared with reconstructing pixel-level visual images, speaking is a more efficient and effective way to explain visual information. Here we introduce a non-invasive neural decoder, termed MindGPT, which interprets perceived visual stimuli into natural language from functional Magnetic Resonance Imaging (fMRI) signals in an end-to-end manner. Specifically, our model builds upon a visually guided neural encoder with a cross-attention mechanism. Combined with data augmentation techniques, this architecture permits us to guide latent neural representations towards a desired language-semantic direction in a self-supervised fashion. In doing so, we found that the neural representations of MindGPT are explainable and can be used to evaluate the contributions of visual properties to language semantics. Our experiments show that the generated word sequences truthfully represent the visual information, including essential details, conveyed by the seen stimuli. The results also suggest that, with respect to language decoding tasks, the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and that using only the HVC can recover most of the semantic information. The source code for the MindGPT model is publicly available at https://github.com/JxuanC/MindGPT.
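To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of an fMRI-to-language decoder with a cross-attention mechanism, in the spirit of the architecture the abstract outlines: an fMRI encoder produces latent "neural tokens", learnable queries cross-attend over visual guidance features (e.g. image embeddings of the seen stimulus), and a GPT-style decoder generates the word sequence. All module names, dimensions, and design details here are assumptions for illustration only; the authors' actual implementation is in the repository linked above.

```python
import torch
import torch.nn as nn


class FMRIToTextSketch(nn.Module):
    """Illustrative sketch (not the official MindGPT code): fMRI -> cross-attention
    with visual guidance -> autoregressive language decoding."""

    def __init__(self, n_voxels=4500, d_model=768, n_queries=8, vocab_size=50257):
        super().__init__()
        # Map the flattened voxel pattern to a small set of latent "neural tokens".
        self.fmri_encoder = nn.Sequential(nn.Linear(n_voxels, d_model * n_queries), nn.GELU())
        self.n_queries, self.d_model = n_queries, d_model
        # Learnable queries attend to visual-feature guidance (e.g. image embeddings),
        # pulling the neural latents towards a language-semantic direction.
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        # Stand-in for a GPT-style language decoder and output head.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
        )
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, fmri, visual_feats, token_ids):
        B = fmri.size(0)
        neural = self.fmri_encoder(fmri).view(B, self.n_queries, self.d_model)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        # Cross-attention over visual guidance; fuse the result with the neural tokens.
        guided, _ = self.cross_attn(q, visual_feats, visual_feats)
        memory = neural + guided
        hidden = self.decoder(self.token_emb(token_ids), memory)
        return self.lm_head(hidden)  # next-token logits over the vocabulary


if __name__ == "__main__":
    model = FMRIToTextSketch()
    fmri = torch.randn(2, 4500)          # flattened voxel responses (hypothetical size)
    visual = torch.randn(2, 50, 768)     # e.g. patch embeddings of the seen image
    tokens = torch.randint(0, 50257, (2, 12))
    print(model(fmri, visual, tokens).shape)  # torch.Size([2, 12, 50257])
```

In a sketch like this, the visual features would only be needed as guidance during training; at test time the trained neural tokens alone would condition the language decoder on the fMRI signal.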