Asymmetric simulation-enhanced flow reconstruction for incomplete multimodal learning

Impact Factor 7.6 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jiacheng Yao, Jing Zhang, Yixiao Wang, Li Zhuo
{"title":"Asymmetric simulation-enhanced flow reconstruction for incomplete multimodal learning","authors":"Jiacheng Yao ,&nbsp;Jing Zhang ,&nbsp;Yixiao Wang ,&nbsp;Li Zhuo","doi":"10.1016/j.patcog.2025.112413","DOIUrl":null,"url":null,"abstract":"<div><div>Incomplete multimodal learning addresses the common real-world challenge of missing modalities, which undermines the performance of standard multimodal methods. Existing solutions struggle with distribution mismatches between reconstructed and observed data, asymmetric cross-modal structures, and insufficient cross-modal knowledge sharing. To tackle these issues, we propose an asymmetric simulation-enhanced flow reconstruction (ASE-FR) framework, which contains following contributions: (1) Distribution-consistent flow reconstruction module that align available and missing modality distributions by normalizing flows; (2) Asymmetric simulation module that perturbs and randomly masks features to mimic real-world modality absence and improve robustness; (3) Modal-shared knowledge distillation that transfers shared representations from teacher encoders to a student encoder through contrastive learning. This framework is applicable to a range of real-world scenarios, such as multi-sensor networks in smart manufacturing, medical diagnostic systems combining imaging and electronic health records, and autonomous driving platforms that integrate camera and LiDAR data. The experimental results show that our ASE-FR method achieves 94.71 %, 41.85 % and 81.90 % accuracy on Audiovision-MNIST, MM-IMDb and IEMOCAP datasets, as well as 1.1376 error rate on CMU-MOSI dataset, which exhibits competitive performance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112413"},"PeriodicalIF":7.6000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S003132032501074X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Incomplete multimodal learning addresses the common real-world challenge of missing modalities, which undermines the performance of standard multimodal methods. Existing solutions struggle with distribution mismatches between reconstructed and observed data, asymmetric cross-modal structures, and insufficient cross-modal knowledge sharing. To tackle these issues, we propose an asymmetric simulation-enhanced flow reconstruction (ASE-FR) framework, which makes the following contributions: (1) a distribution-consistent flow reconstruction module that aligns the distributions of available and missing modalities via normalizing flows; (2) an asymmetric simulation module that perturbs and randomly masks features to mimic real-world modality absence and improve robustness; (3) modal-shared knowledge distillation that transfers shared representations from teacher encoders to a student encoder through contrastive learning. This framework is applicable to a range of real-world scenarios, such as multi-sensor networks in smart manufacturing, medical diagnostic systems combining imaging and electronic health records, and autonomous driving platforms that integrate camera and LiDAR data. The experimental results show that our ASE-FR method achieves 94.71%, 41.85%, and 81.90% accuracy on the Audiovision-MNIST, MM-IMDb, and IEMOCAP datasets, as well as a 1.1376 error rate on the CMU-MOSI dataset, demonstrating competitive performance.
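Two of the mechanisms named in the abstract can be sketched in isolation: randomly masking and perturbing features to simulate missing modalities, and an invertible normalizing-flow block (here a single affine coupling layer) of the kind used to keep reconstructed and observed modality distributions consistent. The PyTorch fragment below is a minimal illustrative sketch under those assumptions; the module names, feature dimensions, and hyperparameters are hypothetical and are not taken from the paper's implementation.

```python
# Illustrative sketch only: simulating modality absence and a simple affine-coupling
# flow block. Names, dimensions, and hyperparameters are assumptions, not ASE-FR code.
import torch
import torch.nn as nn


def simulate_missing(features: torch.Tensor, mask_prob: float = 0.3,
                     noise_std: float = 0.1) -> torch.Tensor:
    """Randomly zero out whole samples and add small Gaussian perturbations,
    mimicking real-world modality absence during training."""
    keep = (torch.rand(features.size(0), 1) > mask_prob).float()
    return keep * (features + noise_std * torch.randn_like(features))


class AffineCoupling(nn.Module):
    """One affine-coupling block: half of the dimensions predict a scale/shift
    applied to the other half, giving an invertible map with a cheap log-det."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))  # outputs [log_s, t]

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(log_s) + t          # invertible transform of x2
        return torch.cat([x1, y2], dim=-1), log_s.sum(-1)


# Usage: push (possibly masked) observed-modality features through the flow so a
# likelihood/distribution-matching objective can be placed on the transformed space.
feats = torch.randn(8, 64)                      # dummy audio features (batch=8, dim=64)
flow = AffineCoupling(dim=64)
z, log_det = flow(simulate_missing(feats))
print(z.shape, log_det.shape)                   # torch.Size([8, 64]) torch.Size([8])
```

The affine coupling layer is used here only because it is the simplest invertible transform with a tractable log-determinant; the paper's actual flow architecture and training objectives may differ.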
Source journal

Pattern Recognition (Engineering & Technology: Electrical & Electronic Engineering)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.