Improved image reconstruction from brain activity through automatic image captioning.

IF 3.9 | Zone 2 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Reports Pub Date : 2025-02-10 DOI:10.1038/s41598-025-89242-3

Fatemeh Kalantari, Karim Faez, Hamidreza Amindavar, Soheila Nazari

Scientific Reports, vol. 15, no. 1, 4907 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811215/pdf/ Article page: https://doi.org/10.1038/s41598-025-89242-3

Citations: 0

Abstract


Significant progress has been made in image reconstruction from functional magnetic resonance imaging (fMRI) data. Some studies reconstructed images using only visual information decoded from brain signals, which yielded insufficient accuracy and quality. Incorporating semantic information into the reconstruction has been recommended to improve performance, but doing so still faces numerous difficulties. To address these problems, we propose an approach that combines semantically complex details with visual details for reconstruction. Our method consists of two main modules: visual reconstruction and semantic reconstruction. In the visual reconstruction module, visual information is decoded from brain data using a decoder. This module employs a deep generator network (DGN) to produce images and a VGG19 network to extract visual features from the generated images. The image is optimized iteratively to minimize the error between the features decoded from brain data and the features extracted from the generated image. The semantic reconstruction module employs two models, BLIP and LDM. Using BLIP, we generate 10 captions for each training image. The semantic features extracted from these captions, together with the brain data obtained from the training sessions, are used to train a decoder, which is then used to decode semantic features from human brain activity. Finally, the image reconstructed by the visual reconstruction module is used as input to the LDM, while the semantic features decoded from brain activity are provided as the conditional input for semantic reconstruction. Including the decoded semantic features improves reconstruction quality, as confirmed by our ablation study. Our strategy outperforms Shen et al.'s method, which uses a similar dataset, both qualitatively and quantitatively. Our method achieved accuracies of 0.812 and 0.815 on the Inception and contrastive language-image pre-training (CLIP) metrics, respectively, which are excellent for the quantitative evaluation of semantic content. We achieved an accuracy of 0.328 on the structural similarity index measure (SSIM), indicating superior performance on this low-level metric. Moreover, our approach to the semantic reconstruction of artificial shapes and imagined images achieved acceptable success, attaining accuracies of 0.566 and 0.627 on the CLIP metric and 0.671 and 0.565 on the SSIM metric, respectively.
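The two decoding steps described in the abstract (training a decoder from brain data to features, then iteratively optimizing an image so that its features match the decoded ones) can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the random matrices `G` and `F` stand in for the DGN and VGG19, the "brain data" is simulated, ridge regression is one plausible choice of decoder, and all sizes, names, and hyperparameters are hypothetical. The LDM stage is not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (the real pipeline uses fMRI voxels, DGN latents, VGG19 features).
n_train, n_voxels, n_latent, n_pixels, n_feat = 200, 50, 8, 32, 16

# Stand-ins (assumptions): fixed random maps instead of the real DGN and VGG19.
G = rng.normal(size=(n_pixels, n_latent)) / np.sqrt(n_latent)  # "generator" weights
F = rng.normal(size=(n_feat, n_pixels)) / np.sqrt(n_pixels)    # "feature extractor" weights


def generate(z):
    """Toy generator: latent code -> image with pixel values in (-1, 1)."""
    return np.tanh(z @ G.T)


def extract(img):
    """Toy feature extractor: image -> feature vector."""
    return img @ F.T


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def reconstruct(target_feat, steps=1000, lr=0.01):
    """Gradient descent on the latent code so that the features of the
    generated image approach the target (decoded) features."""
    z = np.zeros(n_latent)
    losses = []
    for _ in range(steps):
        img = generate(z)
        err = extract(img) - target_feat
        losses.append(float(err @ err))          # squared feature error
        grad_img = 2.0 * err @ F                 # dL/d(image)
        grad_pre = grad_img * (1.0 - img ** 2)   # back through tanh
        z = z - lr * (grad_pre @ G)              # dL/dz, then one descent step
    return generate(z), losses


# 1) Simulate training stimuli, their features, and noisy "brain" responses.
Z_train = rng.normal(size=(n_train, n_latent))
Y_train = extract(generate(Z_train))
A = rng.normal(size=(n_feat, n_voxels))          # hypothetical feature-to-voxel mapping
X_train = Y_train @ A + 0.1 * rng.normal(size=(n_train, n_voxels))

# 2) Train the decoder: brain data -> features.
W = fit_ridge(X_train, Y_train)

# 3) Decode features for a held-out stimulus and reconstruct an image from them.
z_test = rng.normal(size=n_latent)
x_test = extract(generate(z_test)) @ A + 0.1 * rng.normal(size=n_voxels)
decoded = x_test @ W
recon, losses = reconstruct(decoded)
print(f"feature error: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The optimization runs over the generator's latent code rather than raw pixels, which keeps the result on the generator's output manifold. In a real setting, automatic differentiation through the actual networks would replace the hand-written gradient.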


Source journal

Scientific Reports (Natural Science Disciplines)

CiteScore: 7.50

Self-citation rate: 4.30%

Articles published: 19567

Review time: 3.9 months

Journal description: We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections. Scientific Reports has a 2-year impact factor of 4.380 (2021) and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).

• Engineering: Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.

• Physical sciences: Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature, often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.

• Earth and environmental sciences: Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. It also considers the interactions between humans and these systems.

• Biological sciences: Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms, from microorganisms and animals to plants.

• Health sciences: The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.