MFA-NRM: A novel framework for multimodal fusion and semantic alignment in visual neural decoding

IF 15.5 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wei Huang, Hengjiang Li, Fan Qin, Jingpeng Li, Sizhuo Wang, Pengfei Yang, Luan Zhang, Yunshuang Fan, Jing Guo, Kaiwen Cheng, Huafu Chen
{"title":"MFA-NRM:视觉神经解码中多模态融合和语义对齐的新框架","authors":"Wei Huang ,&nbsp;Hengjiang Li ,&nbsp;Fan Qin ,&nbsp;Jingpeng Li ,&nbsp;Sizhuo Wang ,&nbsp;Pengfei Yang ,&nbsp;Luan Zhang ,&nbsp;Yunshuang Fan ,&nbsp;Jing Guo ,&nbsp;Kaiwen Cheng ,&nbsp;Huafu Chen","doi":"10.1016/j.inffus.2025.103717","DOIUrl":null,"url":null,"abstract":"<div><div>Integrating multimodal semantic features, such as images and text, to enhance visual neural representations has proven to be an effective strategy in brain visual decoding. However, previous studies have either focused solely on unimodal enhancement techniques or have inadequately addressed the alignment ambiguity between different modalities, leading to an underutilization of the complementary benefits of multimodal features or a reduction in the semantic richness of the resulting neural representations. To address these limitations, we propose a Multimodal Fusion Alignment Neural Representation Model (MFA-NRM), which enhances visual neural decoding by integrating multimodal semantic features from images and text. The MFA-NRM incorporates a fusion module that utilizes a Variational Autoencoder (VAE) and a self-attention mechanism to integrate multimodal features into a unified latent space, thereby facilitating robust semantic alignment with neural activity. Additionally, we introduce prompt techniques that adapt neural representations to individual differences, improving cross-subject generalization. Our approach also leverages the semantic knowledge from ten large pre-trained models to further enhance performance. Experimental results on the Natural Scenes Dataset (NSD) show that, compared to unimodal alignment methods, our method improves recognition tasks by 18.8 % and classification tasks by 4.30 %, compared to other multimodal alignment methods without the fusion module, our approach improves recognition tasks by 33.59 % and classification tasks by 4.26 %. These findings indicate that the MFA-NRM effectively resolves the problem of alignment ambiguity and enables richer semantic extraction from brain responses to multimodal visual stimuli, offering new perspectives for visual neural decoding.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103717"},"PeriodicalIF":15.5000,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MFA-NRM: A novel framework for multimodal fusion and semantic alignment in visual neural decoding\",\"authors\":\"Wei Huang ,&nbsp;Hengjiang Li ,&nbsp;Fan Qin ,&nbsp;Jingpeng Li ,&nbsp;Sizhuo Wang ,&nbsp;Pengfei Yang ,&nbsp;Luan Zhang ,&nbsp;Yunshuang Fan ,&nbsp;Jing Guo ,&nbsp;Kaiwen Cheng ,&nbsp;Huafu Chen\",\"doi\":\"10.1016/j.inffus.2025.103717\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Integrating multimodal semantic features, such as images and text, to enhance visual neural representations has proven to be an effective strategy in brain visual decoding. However, previous studies have either focused solely on unimodal enhancement techniques or have inadequately addressed the alignment ambiguity between different modalities, leading to an underutilization of the complementary benefits of multimodal features or a reduction in the semantic richness of the resulting neural representations. 
To address these limitations, we propose a Multimodal Fusion Alignment Neural Representation Model (MFA-NRM), which enhances visual neural decoding by integrating multimodal semantic features from images and text. The MFA-NRM incorporates a fusion module that utilizes a Variational Autoencoder (VAE) and a self-attention mechanism to integrate multimodal features into a unified latent space, thereby facilitating robust semantic alignment with neural activity. Additionally, we introduce prompt techniques that adapt neural representations to individual differences, improving cross-subject generalization. Our approach also leverages the semantic knowledge from ten large pre-trained models to further enhance performance. Experimental results on the Natural Scenes Dataset (NSD) show that, compared to unimodal alignment methods, our method improves recognition tasks by 18.8 % and classification tasks by 4.30 %, compared to other multimodal alignment methods without the fusion module, our approach improves recognition tasks by 33.59 % and classification tasks by 4.26 %. These findings indicate that the MFA-NRM effectively resolves the problem of alignment ambiguity and enables richer semantic extraction from brain responses to multimodal visual stimuli, offering new perspectives for visual neural decoding.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"127 \",\"pages\":\"Article 103717\"},\"PeriodicalIF\":15.5000,\"publicationDate\":\"2025-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525007766\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525007766","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Integrating multimodal semantic features, such as images and text, to enhance visual neural representations has proven to be an effective strategy in brain visual decoding. However, previous studies have either focused solely on unimodal enhancement techniques or have inadequately addressed the alignment ambiguity between different modalities, leading to underutilization of the complementary benefits of multimodal features or a reduction in the semantic richness of the resulting neural representations. To address these limitations, we propose a Multimodal Fusion Alignment Neural Representation Model (MFA-NRM), which enhances visual neural decoding by integrating multimodal semantic features from images and text. The MFA-NRM incorporates a fusion module that uses a Variational Autoencoder (VAE) and a self-attention mechanism to integrate multimodal features into a unified latent space, thereby enabling robust semantic alignment with neural activity. Additionally, we introduce prompt techniques that adapt neural representations to individual differences, improving cross-subject generalization. Our approach also leverages the semantic knowledge of ten large pre-trained models to further enhance performance. Experimental results on the Natural Scenes Dataset (NSD) show that, compared to unimodal alignment methods, our method improves performance on recognition tasks by 18.8% and on classification tasks by 4.30%; compared to other multimodal alignment methods without the fusion module, it improves recognition tasks by 33.59% and classification tasks by 4.26%. These findings indicate that the MFA-NRM effectively resolves the problem of alignment ambiguity and enables richer semantic extraction from brain responses to multimodal visual stimuli, offering new perspectives for visual neural decoding.
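The abstract describes the fusion module only at a high level. Below is a minimal sketch, using standard PyTorch components, of how a VAE-plus-self-attention fusion over paired image and text embeddings could be organized. All module names, dimensions, and the mean-pooling choice are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a VAE + self-attention fusion module in the spirit
# of MFA-NRM. Names, dimensions, and loss choices are assumptions for
# illustration only; they are not taken from the paper's code.
import torch
import torch.nn as nn

class FusionVAE(nn.Module):
    def __init__(self, feat_dim=512, latent_dim=256, n_heads=4):
        super().__init__()
        # Self-attention lets the image and text tokens exchange information
        # before being compressed into a shared latent space.
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, feat_dim)

    def forward(self, img_feat, txt_feat):
        # Stack the two modality embeddings as a 2-token sequence: (B, 2, D).
        tokens = torch.stack([img_feat, txt_feat], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        pooled = fused.mean(dim=1)  # (B, D) joint summary of both modalities
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        recon = self.decoder(z)  # reconstruction term regularizes the latent
        return z, mu, logvar, recon

model = FusionVAE()
img = torch.randn(8, 512)  # e.g. image embeddings from a pre-trained model (assumed)
txt = torch.randn(8, 512)  # e.g. text embeddings from a pre-trained model (assumed)
z, mu, logvar, recon = model(img, txt)
```

In the paper's pipeline, the fused latent z would then be aligned with embeddings of neural activity; the abstract does not specify the alignment loss, so a contrastive or regression objective over z is only one plausible reading.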
Source journal: Information Fusion (Engineering & Technology - Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Average review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.