{"title":"情绪逆向推理树和主导情态在会话情绪识别中的应用","authors":"Shidan Wei, Xianying Huang, Chengyang Zhang","doi":"10.1016/j.knosys.2025.114035","DOIUrl":null,"url":null,"abstract":"<div><div>Emotion recognition in conversation (ERC) is crucial to advancing human–computer interaction. However, current methods often ignore the importance of keywords in emotional expression, neglecting both the emotional information these keywords convey and their dynamic variations. In addition, previous studies have not deeply considered the characteristics and commonalities of heterogeneous modalities before fusion, leading to noise accumulation and weakened intermodal interactions. During multimodal fusion, these methods have not effectively accounted for strength differences between modalities, particularly underestimating the notable influence of the text modality on ERC. Moreover, traditional research has made only limited attempts to enhance modality representation capabilities. To address these issues, we propose the Emotional Inverse Reasoning Trees and Dominant Modal Focus model (EIRT-DMF) for ERC. The model leverages commonsense knowledge to extract keywords from utterances and introduces an innovative emotional inverse reasoning tree structure to enhance textual semantic representation and strengthen the transmission of emotional cues. Meanwhile, we design a modality optimization module to handle intra-modality associations and cross-modality interactions. In the fusion phase, the text modality is employed as the dominant modality to gain a collaborative understanding of intermodal semantics. In addition, we introduce a hybrid knowledge-distillation mechanism that employs multilevel learning to generate higher-quality multimodal representations. Experiments on the IEMOCAP and MELD datasets indicate that EIRT-DMF achieved state-of-the-art (SOTA) performance compared to all baselines.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"326 ","pages":"Article 114035"},"PeriodicalIF":7.6000,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Emotional inverse reasoning trees and dominant modality focus for emotion recognition in conversations\",\"authors\":\"Shidan Wei, Xianying Huang, Chengyang Zhang\",\"doi\":\"10.1016/j.knosys.2025.114035\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Emotion recognition in conversation (ERC) is crucial to advancing human–computer interaction. However, current methods often ignore the importance of keywords in emotional expression, neglecting both the emotional information these keywords convey and their dynamic variations. In addition, previous studies have not deeply considered the characteristics and commonalities of heterogeneous modalities before fusion, leading to noise accumulation and weakened intermodal interactions. During multimodal fusion, these methods have not effectively accounted for strength differences between modalities, particularly underestimating the notable influence of the text modality on ERC. Moreover, traditional research has made only limited attempts to enhance modality representation capabilities. To address these issues, we propose the Emotional Inverse Reasoning Trees and Dominant Modal Focus model (EIRT-DMF) for ERC. 
The model leverages commonsense knowledge to extract keywords from utterances and introduces an innovative emotional inverse reasoning tree structure to enhance textual semantic representation and strengthen the transmission of emotional cues. Meanwhile, we design a modality optimization module to handle intra-modality associations and cross-modality interactions. In the fusion phase, the text modality is employed as the dominant modality to gain a collaborative understanding of intermodal semantics. In addition, we introduce a hybrid knowledge-distillation mechanism that employs multilevel learning to generate higher-quality multimodal representations. Experiments on the IEMOCAP and MELD datasets indicate that EIRT-DMF achieved state-of-the-art (SOTA) performance compared to all baselines.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"326 \",\"pages\":\"Article 114035\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125010809\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125010809","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Emotional inverse reasoning trees and dominant modality focus for emotion recognition in conversations
Abstract:
Emotion recognition in conversation (ERC) is crucial to advancing human–computer interaction. However, current methods often overlook the importance of keywords in emotional expression, neglecting both the emotional information these keywords convey and their dynamic variation. Moreover, previous studies have not deeply considered the characteristics and commonalities of heterogeneous modalities before fusion, leading to noise accumulation and weakened intermodal interaction. During multimodal fusion, these methods also fail to account for strength differences between modalities, in particular underestimating the strong influence of the text modality on ERC, and they have made only limited attempts to enhance modality representation capabilities. To address these issues, we propose the Emotional Inverse Reasoning Trees and Dominant Modality Focus (EIRT-DMF) model for ERC. The model leverages commonsense knowledge to extract keywords from utterances and introduces a novel emotional inverse reasoning tree structure to enrich textual semantic representation and strengthen the transmission of emotional cues. We also design a modality optimization module to handle intra-modality associations and cross-modality interactions. In the fusion phase, the text modality serves as the dominant modality, guiding a collaborative understanding of intermodal semantics. Finally, we introduce a hybrid knowledge-distillation mechanism that employs multilevel learning to produce higher-quality multimodal representations. Experiments on the IEMOCAP and MELD datasets show that EIRT-DMF achieves state-of-the-art (SOTA) performance, outperforming all baselines.
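The abstract gives no implementation detail, but the fusion-phase idea of treating text as the dominant modality is commonly realized with cross-attention in which textual features supply the queries, so the audio and visual streams are read through the text's view. The following is a minimal PyTorch sketch under assumed module names and dimensions, not the authors' implementation:

```python
# Hypothetical sketch of text-dominant cross-modal fusion; all names,
# dimensions, and the single-block design are illustrative assumptions.
import torch
import torch.nn as nn

class TextDominantFusion(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # One cross-attention block per auxiliary modality: text supplies the
        # queries, so audio/visual content is selected from the text's view.
        self.text_to_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.text_to_visual = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(3 * d_model, d_model)

    def forward(self, text, audio, visual):
        # text, audio, visual: (batch, num_utterances, d_model)
        t2a, _ = self.text_to_audio(text, audio, audio)     # text attends to audio
        t2v, _ = self.text_to_visual(text, visual, visual)  # text attends to visual
        # Keep the raw text stream alongside both text-guided views, then project.
        fused = torch.cat([text, t2a, t2v], dim=-1)
        return self.proj(fused)

# Usage: 8 conversations, 20 utterances each, 256-d features per modality.
text, audio, visual = (torch.randn(8, 20, 256) for _ in range(3))
fused = TextDominantFusion()(text, audio, visual)  # -> (8, 20, 256)
```

Because the attention weights are computed from the textual representation, text steers what is taken from the other modalities, which operationalizes its dominant role without discarding the audio and visual streams.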
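Similarly, the hybrid knowledge-distillation mechanism is named but not specified in the abstract. As a reference point only, a standard distillation objective mixes a temperature-softened KL term with cross-entropy on the gold labels; the paper's hybrid, multilevel variant would extend something like this hypothetical sketch, where `T`, `alpha`, and the six-class label space are assumptions:

```python
# Generic soft-target distillation loss (Hinton-style), shown only as a
# reference point; T, alpha, and the class count are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across T.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the gold emotion labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage: a batch of 16 utterances over 6 emotion classes.
student = torch.randn(16, 6)
teacher = torch.randn(16, 6)
labels = torch.randint(0, 6, (16,))
loss = distillation_loss(student, teacher, labels)
```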
Journal introduction:
Knowledge-Based Systems is an international, interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research results in the field. It focuses on systems built with knowledge-based and other artificial-intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.