Asymmetric simulation-enhanced flow reconstruction for incomplete multimodal learning

Impact Factor 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jiacheng Yao, Jing Zhang, Yixiao Wang, Li Zhuo
{"title":"Asymmetric simulation-enhanced flow reconstruction for incomplete multimodal learning","authors":"Jiacheng Yao ,&nbsp;Jing Zhang ,&nbsp;Yixiao Wang ,&nbsp;Li Zhuo","doi":"10.1016/j.patcog.2025.112413","DOIUrl":null,"url":null,"abstract":"<div><div>Incomplete multimodal learning addresses the common real-world challenge of missing modalities, which undermines the performance of standard multimodal methods. Existing solutions struggle with distribution mismatches between reconstructed and observed data, asymmetric cross-modal structures, and insufficient cross-modal knowledge sharing. To tackle these issues, we propose an asymmetric simulation-enhanced flow reconstruction (ASE-FR) framework, which contains following contributions: (1) Distribution-consistent flow reconstruction module that align available and missing modality distributions by normalizing flows; (2) Asymmetric simulation module that perturbs and randomly masks features to mimic real-world modality absence and improve robustness; (3) Modal-shared knowledge distillation that transfers shared representations from teacher encoders to a student encoder through contrastive learning. This framework is applicable to a range of real-world scenarios, such as multi-sensor networks in smart manufacturing, medical diagnostic systems combining imaging and electronic health records, and autonomous driving platforms that integrate camera and LiDAR data. The experimental results show that our ASE-FR method achieves 94.71 %, 41.85 % and 81.90 % accuracy on Audiovision-MNIST, MM-IMDb and IEMOCAP datasets, as well as 1.1376 error rate on CMU-MOSI dataset, which exhibits competitive performance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112413"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S003132032501074X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Incomplete multimodal learning addresses the common real-world challenge of missing modalities, which undermines the performance of standard multimodal methods. Existing solutions struggle with distribution mismatches between reconstructed and observed data, asymmetric cross-modal structures, and insufficient cross-modal knowledge sharing. To tackle these issues, we propose an asymmetric simulation-enhanced flow reconstruction (ASE-FR) framework, which makes the following contributions: (1) a distribution-consistent flow reconstruction module that aligns the distributions of available and missing modalities via normalizing flows; (2) an asymmetric simulation module that perturbs and randomly masks features to mimic real-world modality absence and improve robustness; (3) modal-shared knowledge distillation that transfers shared representations from teacher encoders to a student encoder through contrastive learning. This framework is applicable to a range of real-world scenarios, such as multi-sensor networks in smart manufacturing, medical diagnostic systems combining imaging and electronic health records, and autonomous driving platforms that integrate camera and LiDAR data. The experimental results show that our ASE-FR method achieves 94.71%, 41.85%, and 81.90% accuracy on the Audiovision-MNIST, MM-IMDb, and IEMOCAP datasets, as well as a 1.1376 error rate on the CMU-MOSI dataset, demonstrating competitive performance.
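Two of the mechanisms named in the abstract can be sketched in isolation: randomly masking and perturbing features to simulate missing modalities, and an invertible normalizing-flow block (here a single affine coupling layer) of the kind used to keep reconstructed and observed modality distributions consistent. The PyTorch fragment below is a minimal illustrative sketch under those assumptions; the module names, feature dimensions, and hyperparameters are hypothetical and are not taken from the paper's implementation.

```python
# Illustrative sketch only: simulating modality absence and a simple affine-coupling
# flow block. Names, dimensions, and hyperparameters are assumptions, not ASE-FR code.
import torch
import torch.nn as nn


def simulate_missing(features: torch.Tensor, mask_prob: float = 0.3,
                     noise_std: float = 0.1) -> torch.Tensor:
    """Randomly zero out whole samples and add small Gaussian perturbations,
    mimicking real-world modality absence during training."""
    keep = (torch.rand(features.size(0), 1) > mask_prob).float()
    return keep * (features + noise_std * torch.randn_like(features))


class AffineCoupling(nn.Module):
    """One affine-coupling block: half of the dimensions predict a scale/shift
    applied to the other half, giving an invertible map with a cheap log-det."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))  # outputs [log_s, t]

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(log_s) + t          # invertible transform of x2
        return torch.cat([x1, y2], dim=-1), log_s.sum(-1)


# Usage: push (possibly masked) observed-modality features through the flow so a
# likelihood/distribution-matching objective can be placed on the transformed space.
feats = torch.randn(8, 64)                      # dummy audio features (batch=8, dim=64)
flow = AffineCoupling(dim=64)
z, log_det = flow(simulate_missing(feats))
print(z.shape, log_det.shape)                   # torch.Size([8, 64]) torch.Size([8])
```

The affine coupling layer is used here only because it is the simplest invertible transform with a tractable log-determinant; the paper's actual flow architecture and training objectives may differ.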
Source journal

Pattern Recognition (Engineering & Technology: Electrical & Electronic Engineering)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.