Syed Aun Muhammad Zaidi;Siddique Latif;Junaid Qadir
{"title":"利用双重注意力转换器增强跨语言多模态情感识别能力","authors":"Syed Aun Muhammad Zaidi;Siddique Latif;Junaid Qadir","doi":"10.1109/OJCS.2024.3486904","DOIUrl":null,"url":null,"abstract":"Despite the recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with dual attention mechanisms including graph attention and co-attention to capture complex dependencies across different modalities and languages to achieve improved cross-language multimodal emotion recognition. In addition, our model also exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. This novel construct preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, resulting in improved performance with minimal target language data. We assess our model's performance on four publicly available emotion recognition datasets and establish its superior effectiveness compared to recent approaches and baseline models.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"5 ","pages":"684-693"},"PeriodicalIF":0.0000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10736634","citationCount":"0","resultStr":"{\"title\":\"Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers\",\"authors\":\"Syed Aun Muhammad Zaidi;Siddique Latif;Junaid Qadir\",\"doi\":\"10.1109/OJCS.2024.3486904\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Despite the recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with dual attention mechanisms including graph attention and co-attention to capture complex dependencies across different modalities and languages to achieve improved cross-language multimodal emotion recognition. In addition, our model also exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. This novel construct preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, resulting in improved performance with minimal target language data. We assess our model's performance on four publicly available emotion recognition datasets and establish its superior effectiveness compared to recent approaches and baseline models.\",\"PeriodicalId\":13205,\"journal\":{\"name\":\"IEEE Open Journal of the Computer Society\",\"volume\":\"5 \",\"pages\":\"684-693\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10736634\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Open Journal of the Computer Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10736634/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10736634/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Enhancing Cross-Language Multimodal Emotion Recognition With Dual Attention Transformers
Despite the recent progress in emotion recognition, state-of-the-art systems are unable to achieve improved performance in cross-language settings. In this article we propose a Multimodal Dual Attention Transformer (MDAT) model to improve cross-language multimodal emotion recognition. Our model utilises pre-trained models for multimodal feature extraction and is equipped with dual attention mechanisms including graph attention and co-attention to capture complex dependencies across different modalities and languages to achieve improved cross-language multimodal emotion recognition. In addition, our model also exploits a transformer encoder layer for high-level feature representation to improve emotion classification accuracy. This novel construct preserves modality-specific emotional information while enhancing cross-modality and cross-language feature generalisation, resulting in improved performance with minimal target language data. We assess our model's performance on four publicly available emotion recognition datasets and establish its superior effectiveness compared to recent approaches and baseline models.