Spurious reconstruction from brain activity

Impact factor: 6.0. Zone 1 (Computer Science). JCR Q1 (Computer Science, Artificial Intelligence).

Neural Networks. Pub Date: 2025-05-27. DOI: 10.1016/j.neunet.2025.107515

Ken Shirakawa, Yoshihiro Nagano, Misato Tanaka, Shuntaro C. Aoki, Yusuke Muraki, Kei Majima, Yukiyasu Kamitani

Volume 190, Article 107515. Journal Article. https://www.sciencedirect.com/science/article/pii/S0893608025003946


Abstract

Advances in brain decoding, particularly in visual image reconstruction, have sparked discussions about the societal implications and ethical considerations of neurotechnology. As reconstruction methods aim to recover visual experiences from brain activity and achieve prediction beyond training samples (zero-shot prediction), it is crucial to assess their capabilities and limitations to inform public expectations and regulations. Our case study of recent text-guided reconstruction methods, which leverage a large-scale dataset (Natural Scenes Dataset, NSD) and text-to-image diffusion models, reveals critical limitations in their generalizability, demonstrated by poor reconstructions on a different dataset. UMAP visualization of the text features from NSD images shows limited diversity with overlapping semantic and visual clusters between training and test sets. We identify that clustered training samples can lead to “output dimension collapse,” restricting predictable output feature dimensions. While diverse training data improves generalization over the entire feature space without requiring exponential scaling, text features alone prove insufficient for mapping to the visual space. Our findings suggest that the apparent realism in current text-guided reconstructions stems from a combination of classification into trained categories and inauthentic image generation (hallucination) through diffusion models, rather than genuine visual reconstruction. We argue that careful selection of datasets and target features, coupled with rigorous evaluation methods, is essential for achieving authentic visual image reconstruction. These insights underscore the importance of grounding interdisciplinary discussions in a thorough understanding of the technology’s current capabilities and limitations to ensure responsible development.
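The "output dimension collapse" described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes a linear encoding from features to voxels, a closed-form ridge decoder, and hypothetical sizes and names (`D`, `V`, `K`, `simulate_brain`, `fit_ridge`, `effective_dim`). It trains one decoder on targets that sit on a few cluster centers and another on targets spread over the whole feature space, then compares how many output dimensions each decoder can actually predict on held-out samples drawn from the whole space.

```python
import numpy as np

rng = np.random.default_rng(0)
D, V, K = 20, 50, 5          # feature dims, voxels, training clusters (all hypothetical)
N_TRAIN, N_TEST = 500, 200

W = rng.normal(size=(D, V))  # toy linear "encoding" from features to voxels


def simulate_brain(Y):
    """Toy brain responses: linear encoding plus unit-variance noise."""
    return Y @ W + rng.normal(size=(Y.shape[0], V))


def fit_ridge(X, Y, lam=1.0):
    """Closed-form ridge decoder mapping voxels to features."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)


def effective_dim(P, tol=1e-8):
    """Number of singular values above a relative threshold."""
    s = np.linalg.svd(P, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


# Clustered training targets: every sample sits exactly on one of K centers.
centers = rng.normal(size=(K, D))
Y_clustered = centers[rng.integers(0, K, size=N_TRAIN)]
# Diverse training targets: samples spread over the whole feature space.
Y_diverse = rng.normal(size=(N_TRAIN, D))

# Held-out targets drawn from the whole feature space.
Y_test = rng.normal(size=(N_TEST, D))
X_test = simulate_brain(Y_test)

results = {}
for name, Y_train in [("clustered", Y_clustered), ("diverse", Y_diverse)]:
    B = fit_ridge(simulate_brain(Y_train), Y_train)
    pred = X_test @ B
    # Store (predictable output dimensions, mean squared error on held-out data).
    results[name] = (effective_dim(pred), float(np.mean((pred - Y_test) ** 2)))
    print(name, results[name])
```

In this toy setting the decoder weights are a linear combination of the training targets, so predictions from the clustered decoder cannot leave the subspace spanned by the `K` centers, however varied the test inputs are. Real features and decoders are not this clean, so the example only conveys the mechanism, not its size in practice.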

