End-to-End Image Reconstruction of Image from Human Functional Magnetic Resonance Imaging Based on the "Language" of Visual Cortex

Ziya Yu, Kai Qiao, Chi Zhang, Linyuan Wang, Bin Yan
{"title":"End-to-End Image Reconstruction of Image from Human Functional Magnetic Resonance Imaging Based on the \"Language\" of Visual Cortex","authors":"Ziya Yu, Kai Qiao, Chi Zhang, Linyuan Wang, Bin Yan","doi":"10.1145/3404555.3404593","DOIUrl":null,"url":null,"abstract":"In recent years, with the development of deep learning, the integration between neuroscience and computer vision has been deepened. In computer vision, it has been possible to generate images from text as well as semantic understanding from images based on deep learning. Here, text refers to human language, and the language that a computer can understand typically requires text to be encoded. In human brain visual expression, it also produces \"descriptions\" of visual stimuli, that is, the \"language\" that generates from the brain itself. Reconstruction of visual information is the process of reconstructing visual stimuli from the understanding of human brain, which is the most difficult to achieve in visual decoding. And based on the existing research of visual mechanisms, it is still difficult to understand the \"language\" of human brain. Inspired by generating images from text, we regarded voxel responses as the \"language\" of brain in order to reconstruct visual stimuli and built an end-to-end visual decoding model under the condition of small number of samples. We simply retrained a generative adversarial network (GAN) used to generate images from text on 1200 training data (including natural image stimuli and corresponding voxel responses). We regarded voxel responses as semantic information of brain, and sent them to GAN as prior information. The results showed that the decoding model we trained can reconstruct the natural images successfully. It also suggested the feasibility of reconstructing visual stimuli from \"brain language\", and the end-to-end model was more likely to learn the direct mapping between brain activity and visual perception. Moreover, it further indicated the great potential of combining neuroscience and computer vision.","PeriodicalId":220526,"journal":{"name":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","volume":"7 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 6th International Conference on Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3404555.3404593","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, the development of deep learning has deepened the integration of neuroscience and computer vision. In computer vision, deep learning has made it possible both to generate images from text and to extract semantic understanding from images. Here, text refers to human language, which a computer can typically understand only after it has been encoded. The human brain, in its visual processing, likewise produces "descriptions" of visual stimuli: a "language" generated by the brain itself. Reconstruction of visual information, the process of recovering visual stimuli from the brain's understanding of them, is the most difficult goal in visual decoding, and existing research on visual mechanisms still falls short of explaining this "language" of the brain. Inspired by text-to-image generation, we treated voxel responses as the brain's "language" for reconstructing visual stimuli and built an end-to-end visual decoding model under a small-sample condition. We simply retrained a generative adversarial network (GAN) originally used for text-to-image generation on 1200 training pairs of natural image stimuli and the corresponding voxel responses. We regarded the voxel responses as the brain's semantic information and fed them to the GAN as prior information. The results show that the trained decoding model can successfully reconstruct natural images. This suggests the feasibility of reconstructing visual stimuli from the "language of the brain" and indicates that an end-to-end model is more likely to learn a direct mapping between brain activity and visual perception. It further demonstrates the great potential of combining neuroscience and computer vision.
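To make the approach concrete, below is a minimal sketch of the idea the abstract describes, not the authors' actual architecture: a conditional GAN generator that receives the voxel-response vector in place of a text embedding, concatenates it with noise, and decodes the result into an image. All names and sizes here (VoxelConditionedGenerator, N_VOXELS, Z_DIM, COND_DIM, the 64x64 output) are illustrative assumptions.

```python
# Illustrative sketch only: a conditional GAN generator where fMRI voxel
# responses replace the text embedding of a text-to-image GAN. Layer sizes
# and names are assumptions, not the published architecture.
import torch
import torch.nn as nn

N_VOXELS = 4000   # assumed length of the fMRI voxel-response vector
Z_DIM = 100       # noise dimension, as in a standard conditional GAN
COND_DIM = 128    # compact "semantic" embedding of the voxel responses

class VoxelConditionedGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Embed voxel responses into a conditioning vector, playing the
        # role of the sentence-embedding branch in a text-to-image GAN.
        self.condition = nn.Sequential(
            nn.Linear(N_VOXELS, COND_DIM),
            nn.LeakyReLU(0.2),
        )
        # Upsample (noise + condition) to a 64x64 RGB image.
        self.net = nn.Sequential(
            nn.ConvTranspose2d(Z_DIM + COND_DIM, 256, 4, 1, 0),  # -> 4x4
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),               # -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),                # -> 16x16
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1),                 # -> 32x32
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1),                  # -> 64x64
            nn.Tanh(),
        )

    def forward(self, z, voxels):
        cond = self.condition(voxels)        # (B, COND_DIM)
        x = torch.cat([z, cond], dim=1)      # (B, Z_DIM + COND_DIM)
        # Reshape to a 1x1 spatial map before the transposed convolutions.
        return self.net(x.unsqueeze(-1).unsqueeze(-1))

# Usage: one reconstruction from a (noise, voxel-response) pair.
g = VoxelConditionedGenerator()
z = torch.randn(1, Z_DIM)
voxels = torch.randn(1, N_VOXELS)   # stands in for one real fMRI sample
image = g(z, voxels)                # -> (1, 3, 64, 64)
```

In this framing the voxel embedding is the "prior information" the abstract refers to: it occupies exactly the slot the text embedding occupies in a text-to-image GAN, so retraining end-to-end on the 1200 image/response pairs lets the model learn the direct mapping from brain activity to the perceived image.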