Fine-Grained Emotion Comprehension: Semisupervised Multimodal Emotion and Intensity Recognition

IF 4.5 | CAS Region 2, Computer Science | JCR Q1, COMPUTER SCIENCE, CYBERNETICS
Zheng Fang;Zhen Liu;Tingting Liu;Chih-Chieh Hung
{"title":"细粒度情绪理解:半监督多模态情绪和强度识别","authors":"Zheng Fang;Zhen Liu;Tingting Liu;Chih-Chieh Hung","doi":"10.1109/TCSS.2024.3475511","DOIUrl":null,"url":null,"abstract":"The rapid advancement of deep learning and the exponential growth of multimodal data have led to increased attention on multimodal emotion analysis and comprehension in affect computing. While existing multimodal works have achieved notable results in emotion recognition, several challenges remain. First, the scarcity of public large-scale multimodal emotion datasets is attributed to the high cost of manual annotation and the subjectivity of handcrafted labels. Second, most approaches only focus on learning emotion category information, disregarding the crucial evaluation indicator of emotion intensity, which hampers the development of fine-grained emotion recognition. Third, a significant emotion semantic discrepancy exists in different modalities, and current methodologies struggle to bridge the cross-modal gap and effectively utilize a vast amount of unlabeled emotion data, hindering the production of high-quality pseudolabels and superior classification performance. To address these challenges, based on the multitask learning architecture, we propose a novel semisupervised fine-grained emotion recognition model SMEIR-net for multimodal emotion and intensity recognition. Concretely, in semisupervised learning (SSL) phase, we design multistage self-training and consistency regularization paradigm to generate high-quality pseudolabels. Then, in supervised learning phase, we leverage multimodal transformer fusion and adversarial learning to eliminate the cross-modal semantic discrepancy. Extensive experiments are conducted on three benchmark datasets, namely RAVDESS, eNTERFACE, and Lombard-GRID, to evaluate the proposed model. The series sets of experimental results demonstrate that our SSL model successfully utilizes multimodal data and available labels to transfer emotion and intensity information from labeled to unlabeled datasets. Moreover, the corresponding evaluation metrics demonstrate that the utilize high-quality pseudolabels can achieve superior emotion and intensity classification performance, which outperforms other state-of-the-art baselines under the same condition.","PeriodicalId":13044,"journal":{"name":"IEEE Transactions on Computational Social Systems","volume":"12 3","pages":"1145-1163"},"PeriodicalIF":4.5000,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fine-Grained Emotion Comprehension: Semisupervised Multimodal Emotion and Intensity Recognition\",\"authors\":\"Zheng Fang;Zhen Liu;Tingting Liu;Chih-Chieh Hung\",\"doi\":\"10.1109/TCSS.2024.3475511\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rapid advancement of deep learning and the exponential growth of multimodal data have led to increased attention on multimodal emotion analysis and comprehension in affect computing. While existing multimodal works have achieved notable results in emotion recognition, several challenges remain. First, the scarcity of public large-scale multimodal emotion datasets is attributed to the high cost of manual annotation and the subjectivity of handcrafted labels. Second, most approaches only focus on learning emotion category information, disregarding the crucial evaluation indicator of emotion intensity, which hampers the development of fine-grained emotion recognition. 
Third, a significant emotion semantic discrepancy exists in different modalities, and current methodologies struggle to bridge the cross-modal gap and effectively utilize a vast amount of unlabeled emotion data, hindering the production of high-quality pseudolabels and superior classification performance. To address these challenges, based on the multitask learning architecture, we propose a novel semisupervised fine-grained emotion recognition model SMEIR-net for multimodal emotion and intensity recognition. Concretely, in semisupervised learning (SSL) phase, we design multistage self-training and consistency regularization paradigm to generate high-quality pseudolabels. Then, in supervised learning phase, we leverage multimodal transformer fusion and adversarial learning to eliminate the cross-modal semantic discrepancy. Extensive experiments are conducted on three benchmark datasets, namely RAVDESS, eNTERFACE, and Lombard-GRID, to evaluate the proposed model. The series sets of experimental results demonstrate that our SSL model successfully utilizes multimodal data and available labels to transfer emotion and intensity information from labeled to unlabeled datasets. Moreover, the corresponding evaluation metrics demonstrate that the utilize high-quality pseudolabels can achieve superior emotion and intensity classification performance, which outperforms other state-of-the-art baselines under the same condition.\",\"PeriodicalId\":13044,\"journal\":{\"name\":\"IEEE Transactions on Computational Social Systems\",\"volume\":\"12 3\",\"pages\":\"1145-1163\"},\"PeriodicalIF\":4.5000,\"publicationDate\":\"2024-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computational Social Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10737896/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, CYBERNETICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Social Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10737896/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, CYBERNETICS","Score":null,"Total":0}
Citations: 0

Abstract

The rapid advancement of deep learning and the exponential growth of multimodal data have led to increasing attention on multimodal emotion analysis and comprehension in affective computing. While existing multimodal works have achieved notable results in emotion recognition, several challenges remain. First, public large-scale multimodal emotion datasets are scarce, owing to the high cost of manual annotation and the subjectivity of handcrafted labels. Second, most approaches focus only on learning emotion category information, disregarding emotion intensity, a crucial evaluation indicator, which hampers the development of fine-grained emotion recognition. Third, a significant emotion semantic discrepancy exists across modalities, and current methodologies struggle to bridge this cross-modal gap and to effectively utilize the vast amount of unlabeled emotion data, hindering the production of high-quality pseudolabels and superior classification performance. To address these challenges, we propose SMEIR-net, a novel semisupervised fine-grained emotion recognition model for multimodal emotion and intensity recognition, built on a multitask learning architecture. Concretely, in the semisupervised learning (SSL) phase, we design a multistage self-training and consistency regularization paradigm to generate high-quality pseudolabels. Then, in the supervised learning phase, we leverage multimodal transformer fusion and adversarial learning to eliminate the cross-modal semantic discrepancy. Extensive experiments are conducted on three benchmark datasets, namely RAVDESS, eNTERFACE, and Lombard-GRID, to evaluate the proposed model. The experimental results demonstrate that our SSL model successfully utilizes multimodal data and available labels to transfer emotion and intensity information from labeled to unlabeled datasets. Moreover, the corresponding evaluation metrics show that utilizing high-quality pseudolabels achieves superior emotion and intensity classification performance, outperforming other state-of-the-art baselines under the same conditions.
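The paper's code is not included on this page; as a rough, generic illustration of the SSL paradigm the abstract names (self-training with consistency regularization, keeping only high-confidence pseudolabels), a minimal PyTorch sketch might look as follows. Every name here, including `model`, the augmentation functions, and the 0.95 confidence threshold, is an assumption for illustration, not the authors' SMEIR-net implementation.

```python
# Minimal sketch of self-training with consistency regularization
# (FixMatch-style). NOT the authors' SMEIR-net code; all names and the
# confidence threshold are illustrative assumptions.
import torch
import torch.nn.functional as F

def consistency_pseudolabel_loss(model, unlabeled, weak_aug, strong_aug,
                                 threshold=0.95):
    """Pseudolabel a weakly augmented view, then train the model to
    reproduce those labels on a strongly augmented view."""
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(unlabeled)), dim=-1)
        conf, pseudo = probs.max(dim=-1)       # confidence and hard label
        mask = (conf >= threshold).float()     # keep high-quality labels only

    logits = model(strong_aug(unlabeled))
    per_sample = F.cross_entropy(logits, pseudo, reduction="none")
    return (per_sample * mask).mean()          # masked consistency loss
```

In a multistage variant, high-confidence pseudolabeled samples would be promoted into the labeled pool between stages; the abstract does not specify the authors' exact schedule.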
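Likewise, a generic sketch of the supervised phase's two named ingredients, transformer fusion over concatenated modality tokens and adversarial learning via a gradient-reversal modality discriminator (one common way to reduce cross-modal semantic discrepancy), is given below. The dimensions, the audio/visual token inputs, and the RAVDESS-style label spaces (8 emotions, 2 intensities) are assumptions, not the paper's disclosed design.

```python
# Generic sketch of cross-modal transformer fusion with an adversarial
# modality discriminator. Illustrative assumptions only; the abstract does
# not disclose SMEIR-net's actual architecture or dimensions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass,
    so fused features are trained to fool the modality discriminator."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad

class FusionSketch(nn.Module):
    def __init__(self, dim=128, n_emotions=8, n_intensities=2, n_modalities=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.emotion_head = nn.Linear(dim, n_emotions)       # multitask head 1
        self.intensity_head = nn.Linear(dim, n_intensities)  # multitask head 2
        self.modality_disc = nn.Linear(dim, n_modalities)    # adversary

    def forward(self, audio_tokens, visual_tokens):
        # Concatenate modality token sequences and fuse with self-attention.
        n_a = audio_tokens.size(1)
        fused = self.fusion(torch.cat([audio_tokens, visual_tokens], dim=1))
        audio_feat = fused[:, :n_a].mean(dim=1)    # pooled audio features
        visual_feat = fused[:, n_a:].mean(dim=1)   # pooled visual features
        joint = (audio_feat + visual_feat) / 2
        # The discriminator tries to tell the modalities apart; reversed
        # gradients push both toward a shared, modality-invariant space.
        disc_in = GradReverse.apply(torch.stack([audio_feat, visual_feat], 1))
        modality_logits = self.modality_disc(disc_in)  # (B, 2, n_modalities)
        return (self.emotion_head(joint),
                self.intensity_head(joint),
                modality_logits)
```

A training step would add the discriminator's cross-entropy loss on modality identity to the emotion and intensity losses; because of the gradient reversal, minimizing that loss drives the fused features toward modality invariance.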
Source journal
IEEE Transactions on Computational Social Systems
Category: Social Sciences (miscellaneous)
CiteScore: 10.00
Self-citation rate: 20.00%
Publication volume: 316
Journal description: IEEE Transactions on Computational Social Systems focuses on such topics as modeling, simulation, analysis and understanding of social systems from the quantitative and/or computational perspective. "Systems" include man-man, man-machine and machine-machine organizations and adversarial situations as well as social media structures and their dynamics. More specifically, the proposed transactions publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing socio-cultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, socio-cultural modeling and representation, and computational behavior modeling, and their applications.