Multimodal emotion recognition method in complex dynamic scenes

Long Liu, Qingquan Luo, Wenbo Zhang, Mengxuan Zhang, Bowen Zhai

Journal of Information and Intelligence, Volume 3, Issue 3 (May 2025), Pages 257-274
DOI: 10.1016/j.jiixd.2025.02.004
URL: https://www.sciencedirect.com/science/article/pii/S2949715925000046
Multimodal emotion recognition technology leverages the power of deep learning to address advanced visual and emotional tasks. While generic deep networks can handle simple emotion recognition tasks, their generalization capability in complex and noisy environments, such as multi-scene outdoor settings, remains limited. To overcome these challenges, this paper proposes a novel multimodal emotion recognition framework. First, we develop a robust network architecture based on the T5-small model, designed for dynamic-static fusion in complex scenarios, effectively mitigating the impact of noise. Second, we introduce a dynamic-static cross fusion network (D-SCFN) to enhance the integration and extraction of dynamic and static information, embedding it seamlessly within the T5 framework. Finally, we design and evaluate three distinct multi-task analysis frameworks to explore dependencies among tasks. The experimental results demonstrate that our model significantly outperforms other existing models, showcasing exceptional stability and remarkable adaptability to complex and dynamic scenarios.
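The abstract does not specify how the dynamic-static cross fusion network (D-SCFN) is built, so the following PyTorch sketch is only one plausible reading: a bidirectional cross-attention block that fuses a static feature sequence (e.g., appearance tokens) with a dynamic one (e.g., per-frame motion tokens) and projects the result to T5-small's 512-dimensional hidden size so it can be fed to the encoder. All module names, feature dimensions, and the pooling scheme are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a dynamic-static cross fusion block. The paper's
# D-SCFN details are not given in the abstract; shapes and names below are
# assumptions chosen so the fused output matches T5-small (d_model = 512).
import torch
import torch.nn as nn


class DynamicStaticCrossFusion(nn.Module):
    """Fuses static tokens (B, Ls, static_dim) with dynamic tokens
    (B, Ld, dynamic_dim) via bidirectional cross-attention."""

    def __init__(self, static_dim=2048, dynamic_dim=1024,
                 d_model=512, n_heads=8):
        super().__init__()
        # Project both modalities into a shared width before attending.
        self.static_proj = nn.Linear(static_dim, d_model)
        self.dynamic_proj = nn.Linear(dynamic_dim, d_model)
        # Cross-attention in both directions: static queries attend to
        # dynamic keys/values, and vice versa.
        self.stat2dyn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.dyn2stat = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(d_model)
        self.norm_d = nn.LayerNorm(d_model)
        # Concatenate the two streams channel-wise, then project back to
        # d_model so the output matches T5-small's embedding width.
        self.out_proj = nn.Linear(2 * d_model, d_model)

    def forward(self, static_feats, dynamic_feats):
        s = self.static_proj(static_feats)   # (B, Ls, d_model)
        d = self.dynamic_proj(dynamic_feats) # (B, Ld, d_model)
        s_attn, _ = self.stat2dyn(query=s, key=d, value=d)
        d_attn, _ = self.dyn2stat(query=d, key=s, value=s)
        s = self.norm_s(s + s_attn)
        d = self.norm_d(d + d_attn)
        # Mean-pool the dynamic stream over time and broadcast to the static
        # length before concatenating (one of many possible fusion choices).
        d_pooled = d.mean(dim=1, keepdim=True).expand_as(s)
        fused = self.out_proj(torch.cat([s, d_pooled], dim=-1))
        return fused  # (B, Ls, 512)


if __name__ == "__main__":
    fusion = DynamicStaticCrossFusion()
    static = torch.randn(2, 16, 2048)   # e.g., 16 region tokens per sample
    dynamic = torch.randn(2, 32, 1024)  # e.g., 32 frame tokens per clip
    print(fusion(static, dynamic).shape)  # torch.Size([2, 16, 512])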