Multimodal speech emotion recognition via dynamic multilevel contrastive loss under local enhancement network
Weiquan Fan, Xiangmin Xu, Fang Liu, Xiaofen Xing
Expert Systems with Applications, Volume 281, Article 127669 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.5)
Published: 2025-04-17
DOI: 10.1016/j.eswa.2025.127669
URL: https://www.sciencedirect.com/science/article/pii/S0957417425012916
Abstract
Multimodal speech emotion recognition is crucial for advancing human–computer interaction. Contrastive learning, with its strong representation ability, is increasingly applied to emotion recognition. Existing algorithms usually treat only samples of the same emotion as positive pairs and ignore that different positive pairs often lie at different distances. To address this issue, this paper designs a novel dynamic multilevel contrastive loss (DMCL), which imposes adaptive distance constraints through dynamic multilevel similarity. It generalizes positive pairs into different cases, assigns each case a different distance, and dynamically adjusts the corresponding labels during modeling. Building on DMCL, the paper further proposes a local enhancement attention mechanism (LEA) that enhances local information token by token on a global basis, which improves the model's robustness to abrupt emotional changes. By integrating the advantages of LEA and DMCL, the paper constructs an end-to-end multimodal speech emotion recognition network (LEDMCN). Experimental results on the IEMOCAP and LSSED datasets validate the effectiveness of the proposed method, which achieves state-of-the-art performance.
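The abstract does not give the DMCL formula, so the following is only a rough sketch of the general idea of graded positive pairs in a cross-modal contrastive loss, not the authors' method. Everything in it is an assumption made for illustration: the audio and text modalities, the function names, the two similarity levels (1.0 for the same sample, 0.5 for the same emotion), and the temperature. The dynamic label adjustment described above is not modeled.

```python
import numpy as np

def multilevel_targets(labels, same_sample=1.0, same_emotion=0.5):
    """Build graded pair targets (hypothetical levels, not values from the paper).

    Pairs from the same sample get `same_sample`, other pairs sharing an
    emotion label get `same_emotion`, and all remaining pairs get 0.
    """
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    targets = same * same_emotion
    np.fill_diagonal(targets, same_sample)
    return targets

def multilevel_contrastive_loss(audio, text, targets, temperature=0.1):
    """Cross-entropy between softmax similarities and normalized graded targets."""
    # L2-normalize so the dot product is a cosine similarity.
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = a @ t.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Each row of targets becomes a distribution over candidate pairs.
    weights = targets / targets.sum(axis=1, keepdims=True)
    return float(-(weights * log_prob).sum(axis=1).mean())

if __name__ == "__main__":
    labels = [0, 0, 1]  # toy emotion labels for three utterances
    targets = multilevel_targets(labels)
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(3, 8))  # stand-in audio embeddings
    text = rng.normal(size=(3, 8))   # stand-in text embeddings
    print(multilevel_contrastive_loss(audio, text, targets))
```

With uniform targets on same-emotion pairs this reduces to an ordinary supervised contrastive objective; the graded levels are what let different positive pairs be pulled together with different strength.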
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.