{"title":"MindGPT: Interpreting What You See With Non-Invasive Brain Recordings","authors":"Jiaxuan Chen;Yu Qi;Yueming Wang;Gang Pan","doi":"10.1109/TIP.2025.3572784","DOIUrl":null,"url":null,"abstract":"Decoding of seen visual contents with non-invasive brain recordings has important scientific and practical values. Efforts have been made to recover the seen images from brain signals. However, most existing approaches cannot faithfully reflect the visual contents due to insufficient image quality or semantic mismatches. Compared with reconstructing pixel-level visual images, speaking is a more efficient and effective way to explain visual information. Here we introduce a non-invasive neural decoder, termed MindGPT, which interprets perceived visual stimuli into natural languages from functional Magnetic Resonance Imaging (fMRI) signals in an end-to-end manner. Specifically, our model builds upon a visually guided neural encoder with a cross-attention mechanism. By the collaborative use of data augmentation techniques, this architecture permits us to guide latent neural representations towards a desired language semantic direction in a self-supervised fashion. Through doing so, we found that the neural representations of the MindGPT are explainable, which can be used to evaluate the contributions of visual properties to language semantics. Our experiments show that the generated word sequences truthfully represented the visual information (with essential details) conveyed in the seen stimuli. The results also suggested that with respect to language decoding tasks, the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and using only the HVC can recover most of the semantic information. The source code for the MindGPT model is publicly available at <uri>https://github.com/JxuanC/MindGPT</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3281-3293"},"PeriodicalIF":13.7000,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11018227/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
Decoding seen visual content from non-invasive brain recordings has important scientific and practical value. Efforts have been made to recover seen images from brain signals. However, most existing approaches cannot faithfully reflect the visual content due to insufficient image quality or semantic mismatches. Compared with reconstructing pixel-level visual images, speaking is a more efficient and effective way to explain visual information. Here we introduce a non-invasive neural decoder, termed MindGPT, which interprets perceived visual stimuli into natural language from functional Magnetic Resonance Imaging (fMRI) signals in an end-to-end manner. Specifically, our model builds upon a visually guided neural encoder with a cross-attention mechanism. Combined with data augmentation techniques, this architecture permits us to guide latent neural representations towards a desired language-semantic direction in a self-supervised fashion. In doing so, we found that the neural representations of MindGPT are explainable and can be used to evaluate the contributions of visual properties to language semantics. Our experiments show that the generated word sequences truthfully represent the visual information, including essential details, conveyed by the seen stimuli. The results also suggest that, with respect to language decoding tasks, the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and that using only the HVC can recover most of the semantic information. The source code for the MindGPT model is publicly available at https://github.com/JxuanC/MindGPT.
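To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of an fMRI-to-language decoder with a cross-attention mechanism, in the spirit of the architecture the abstract outlines: an fMRI encoder produces latent "neural tokens", learnable queries cross-attend over visual guidance features (e.g. image embeddings of the seen stimulus), and a GPT-style decoder generates the word sequence. All module names, dimensions, and design details here are assumptions for illustration only; the authors' actual implementation is in the repository linked above.

```python
import torch
import torch.nn as nn


class FMRIToTextSketch(nn.Module):
    """Illustrative sketch (not the official MindGPT code): fMRI -> cross-attention
    with visual guidance -> autoregressive language decoding."""

    def __init__(self, n_voxels=4500, d_model=768, n_queries=8, vocab_size=50257):
        super().__init__()
        # Map the flattened voxel pattern to a small set of latent "neural tokens".
        self.fmri_encoder = nn.Sequential(nn.Linear(n_voxels, d_model * n_queries), nn.GELU())
        self.n_queries, self.d_model = n_queries, d_model
        # Learnable queries attend to visual-feature guidance (e.g. image embeddings),
        # pulling the neural latents towards a language-semantic direction.
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        # Stand-in for a GPT-style language decoder and output head.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
        )
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, fmri, visual_feats, token_ids):
        B = fmri.size(0)
        neural = self.fmri_encoder(fmri).view(B, self.n_queries, self.d_model)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        # Cross-attention over visual guidance; fuse the result with the neural tokens.
        guided, _ = self.cross_attn(q, visual_feats, visual_feats)
        memory = neural + guided
        hidden = self.decoder(self.token_emb(token_ids), memory)
        return self.lm_head(hidden)  # next-token logits over the vocabulary


if __name__ == "__main__":
    model = FMRIToTextSketch()
    fmri = torch.randn(2, 4500)          # flattened voxel responses (hypothetical size)
    visual = torch.randn(2, 50, 768)     # e.g. patch embeddings of the seen image
    tokens = torch.randint(0, 50257, (2, 12))
    print(model(fmri, visual, tokens).shape)  # torch.Size([2, 12, 50257])
```

In a sketch like this, the visual features would only be needed as guidance during training; at test time the trained neural tokens alone would condition the language decoder on the fMRI signal.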