Ao Li;Minchao Wu;Rui Ouyang;Yongming Wang;Fan Li;Zhao Lv
{"title":"一种多模态驱动的情感识别融合数据增强框架","authors":"Ao Li;Minchao Wu;Rui Ouyang;Yongming Wang;Fan Li;Zhao Lv","doi":"10.1109/TAI.2025.3537965","DOIUrl":null,"url":null,"abstract":"The pursuit of imbuing computers with emotional intelligence has driven extensive research into physiological signal analysis for emotion recognition. Deep learning techniques offer promising avenues for analyzing physiological signals in this domain. Despite numerous studies on emotion recognition using various physiological signals, challenges persist in classifying multimodal physiological signals due to data scarcity. Current research lacks focus on addressing data insufficiency for multimodal physiological signals. This article proposes an innovative method to address this issue and improve the effect of emotion recognition using multimodal physiological signal data. Our model comprises a physiological signal encoder, a multimodal data generator, and a multimodal emotion recognizer. Specifically, we introduce a customized ConvNeXt-attention fusion model (CNXAF) to fuse diverse physiological signals, generating fused multimodal data. The multimodal data generator employs a conditional self-attention generative adversarial network (c-SAGAN) to synthesize additional data across different categories, augmenting original datasets. Finally, the multimodal emotion recognizer utilizes the ConvNeXt-t classifier for emotion recognition on the extended dataset. Through extensive experimentation, our model achieves accuracies of 96.06<inline-formula><tex-math>$\\%$</tex-math></inline-formula> on the DEAP dataset and 95.70<inline-formula><tex-math>$\\%$</tex-math></inline-formula> on the WESAD dataset, demonstrating the effectiveness of our approach in accurately recognizing emotions. Experimental results underscore the superior performance of our method compared to existing approaches in multimodal emotion recognition research.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 8","pages":"2083-2097"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Multimodal-Driven Fusion Data Augmentation Framework for Emotion Recognition\",\"authors\":\"Ao Li;Minchao Wu;Rui Ouyang;Yongming Wang;Fan Li;Zhao Lv\",\"doi\":\"10.1109/TAI.2025.3537965\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The pursuit of imbuing computers with emotional intelligence has driven extensive research into physiological signal analysis for emotion recognition. Deep learning techniques offer promising avenues for analyzing physiological signals in this domain. Despite numerous studies on emotion recognition using various physiological signals, challenges persist in classifying multimodal physiological signals due to data scarcity. Current research lacks focus on addressing data insufficiency for multimodal physiological signals. This article proposes an innovative method to address this issue and improve the effect of emotion recognition using multimodal physiological signal data. Our model comprises a physiological signal encoder, a multimodal data generator, and a multimodal emotion recognizer. Specifically, we introduce a customized ConvNeXt-attention fusion model (CNXAF) to fuse diverse physiological signals, generating fused multimodal data. 
The multimodal data generator employs a conditional self-attention generative adversarial network (c-SAGAN) to synthesize additional data across different categories, augmenting original datasets. Finally, the multimodal emotion recognizer utilizes the ConvNeXt-t classifier for emotion recognition on the extended dataset. Through extensive experimentation, our model achieves accuracies of 96.06<inline-formula><tex-math>$\\\\%$</tex-math></inline-formula> on the DEAP dataset and 95.70<inline-formula><tex-math>$\\\\%$</tex-math></inline-formula> on the WESAD dataset, demonstrating the effectiveness of our approach in accurately recognizing emotions. Experimental results underscore the superior performance of our method compared to existing approaches in multimodal emotion recognition research.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 8\",\"pages\":\"2083-2097\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-02-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10896942/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10896942/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Multimodal-Driven Fusion Data Augmentation Framework for Emotion Recognition
The pursuit of imbuing computers with emotional intelligence has driven extensive research into physiological signal analysis for emotion recognition. Deep learning techniques offer promising avenues for analyzing physiological signals in this domain. Despite numerous studies on emotion recognition using various physiological signals, classifying multimodal physiological signals remains challenging due to data scarcity, and current research pays little attention to this data insufficiency. This article proposes an innovative method to address this issue and improve the performance of emotion recognition from multimodal physiological signal data. Our model comprises a physiological signal encoder, a multimodal data generator, and a multimodal emotion recognizer. Specifically, we introduce a customized ConvNeXt-attention fusion model (CNXAF) to fuse diverse physiological signals, generating fused multimodal data. The multimodal data generator employs a conditional self-attention generative adversarial network (c-SAGAN) to synthesize additional samples for each emotion category, augmenting the original datasets. Finally, the multimodal emotion recognizer uses a ConvNeXt-t classifier to perform emotion recognition on the extended dataset. Through extensive experimentation, our model achieves accuracies of 96.06% on the DEAP dataset and 95.70% on the WESAD dataset, demonstrating the effectiveness of our approach in accurately recognizing emotions. Experimental results underscore the superior performance of our method compared with existing approaches to multimodal emotion recognition.
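To make the three-stage pipeline described above concrete, the following is a minimal, hypothetical PyTorch sketch of its structure: a SAGAN-style self-attention block inside a class-conditional generator that synthesizes fused signal segments, and a small ConvNeXt-flavoured classifier trained on the augmented data. All module names, layer sizes, and signal dimensions here are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the fusion-augmentation-classification pipeline.
# Assumed shapes: fused multimodal segments of shape (batch, 1, 256).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention1d(nn.Module):
    """SAGAN-style self-attention over a 1-D feature map (channels, length)."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv1d(channels, channels // 8, 1)
        self.key = nn.Conv1d(channels, channels // 8, 1)
        self.value = nn.Conv1d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.query(x), self.key(x), self.value(x)
        attn = torch.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (B, L, L)
        out = torch.bmm(v, attn.transpose(1, 2))                        # (B, C, L)
        return self.gamma * out + x


class ConditionalGenerator(nn.Module):
    """Conditional generator: noise + emotion-class embedding -> synthetic fused segment."""

    def __init__(self, latent_dim: int = 128, n_classes: int = 4,
                 channels: int = 32, length: int = 256):
        super().__init__()
        self.length = length
        self.embed = nn.Embedding(n_classes, latent_dim)
        self.fc = nn.Linear(latent_dim * 2, channels * length)
        self.attn = SelfAttention1d(channels)
        self.out = nn.Conv1d(channels, 1, kernel_size=3, padding=1)

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        h = torch.cat([z, self.embed(labels)], dim=1)
        h = self.fc(h).view(z.size(0), -1, self.length)
        h = self.attn(F.relu(h))
        return torch.tanh(self.out(h))  # (B, 1, length)


class ConvNeXtStyleClassifier(nn.Module):
    """Small ConvNeXt-flavoured classifier: depthwise conv + pointwise MLP block."""

    def __init__(self, in_channels: int = 1, n_classes: int = 4, width: int = 64):
        super().__init__()
        self.stem = nn.Conv1d(in_channels, width, kernel_size=4, stride=4)
        self.dwconv = nn.Conv1d(width, width, kernel_size=7, padding=3, groups=width)
        self.norm = nn.LayerNorm(width)
        self.pwconv = nn.Sequential(nn.Linear(width, 4 * width), nn.GELU(),
                                    nn.Linear(4 * width, width))
        self.head = nn.Linear(width, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.stem(x)                     # (B, W, L/4)
        r = self.dwconv(h).transpose(1, 2)   # (B, L/4, W)
        h = h.transpose(1, 2) + self.pwconv(self.norm(r))
        return self.head(h.mean(dim=1))      # global average pool -> class logits


if __name__ == "__main__":
    # Augment a toy batch with class-conditioned synthetic samples, then classify.
    gen = ConditionalGenerator()
    clf = ConvNeXtStyleClassifier()
    real = torch.randn(8, 1, 256)                   # fused multimodal segments
    labels = torch.randint(0, 4, (8,))
    fake = gen(torch.randn(8, 128), labels)         # synthetic augmentation
    logits = clf(torch.cat([real, fake], dim=0))    # (16, 4) emotion logits
    print(logits.shape)
```

In this sketch the generator is conditioned on emotion labels so that synthetic samples can be added per category, mirroring the role the abstract assigns to c-SAGAN; the adversarial discriminator and the CNXAF fusion encoder that would precede it are omitted for brevity.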