{"title":"一种多模态数据驱动的短视频情感识别方法","authors":"Bingdian Yang, Qian Zhang, Zhichao Liu","doi":"10.1109/INSAI56792.2022.00014","DOIUrl":null,"url":null,"abstract":"With the fast development of artificial intelligence and short videos, emotion recognition has become one of the most important research topics in human-computer interaction. At present, most emotion recognition methods still stay in a single modality. However, in daily life, human beings will usually disguise their real emotions, which leads to the problem that the low accuracy of single modal emotion recognition. Moreover, it is not easy to distinguish similar emotions. Therefore, we propose a new approach denoted as ICANet to achieve multimodal short video emotion recognition by employing three different modalities of audio, video, and optical flow, making up for the lack of a single modality and then improving the accuracy of emotion recognition in short videos. ICANet has a better accuracy of 80.77% on the IEMOCAP benchmark. The cross-modal fusion method of short video emotion recognition established in this paper can effectively improve the accuracy of emotion recognition in human-computer interaction scenarios.","PeriodicalId":318264,"journal":{"name":"2022 2nd International Conference on Networking Systems of AI (INSAI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data\",\"authors\":\"Bingdian Yang, Qian Zhang, Zhichao Liu\",\"doi\":\"10.1109/INSAI56792.2022.00014\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the fast development of artificial intelligence and short videos, emotion recognition has become one of the most important research topics in human-computer interaction. 
At present, most emotion recognition methods still stay in a single modality. However, in daily life, human beings will usually disguise their real emotions, which leads to the problem that the low accuracy of single modal emotion recognition. Moreover, it is not easy to distinguish similar emotions. Therefore, we propose a new approach denoted as ICANet to achieve multimodal short video emotion recognition by employing three different modalities of audio, video, and optical flow, making up for the lack of a single modality and then improving the accuracy of emotion recognition in short videos. ICANet has a better accuracy of 80.77% on the IEMOCAP benchmark. The cross-modal fusion method of short video emotion recognition established in this paper can effectively improve the accuracy of emotion recognition in human-computer interaction scenarios.\",\"PeriodicalId\":318264,\"journal\":{\"name\":\"2022 2nd International Conference on Networking Systems of AI (INSAI)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 2nd International Conference on Networking Systems of AI (INSAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INSAI56792.2022.00014\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 2nd International Conference on Networking Systems of AI 
(INSAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INSAI56792.2022.00014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data
With the rapid development of artificial intelligence and short videos, emotion recognition has become one of the most important research topics in human-computer interaction. At present, most emotion recognition methods still rely on a single modality. However, in daily life people often disguise their real emotions, which leads to low accuracy in single-modal emotion recognition; similar emotions are also difficult to distinguish. We therefore propose a new approach, ICANet, that performs multimodal short-video emotion recognition using three modalities: audio, video, and optical flow. Combining the modalities compensates for the limitations of any single one and thereby improves the accuracy of emotion recognition in short videos. ICANet achieves an accuracy of 80.77% on the IEMOCAP benchmark. The cross-modal fusion method for short-video emotion recognition established in this paper can effectively improve the accuracy of emotion recognition in human-computer interaction scenarios.
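The abstract does not specify how the three modalities are combined, so as a purely illustrative sketch, one common baseline for cross-modal fusion is to concatenate per-modality feature vectors before classification. The feature dimensions, the concatenation strategy, and the class count below are all assumptions for illustration; the paper itself only names the three modalities (audio, video, and optical flow).

```python
import random

# Hypothetical per-modality feature dimensions (assumptions, not from the paper).
AUDIO_DIM, VIDEO_DIM, FLOW_DIM = 128, 512, 512

def fuse(audio, video, flow):
    """Late fusion by simple concatenation of per-modality feature vectors.

    This is one generic fusion baseline, not necessarily the fusion
    mechanism used in ICANet.
    """
    return audio + video + flow  # list concatenation

# Stand-in features; a real system would extract these with
# modality-specific encoders (e.g. a CNN per modality).
rng = random.Random(0)
audio = [rng.gauss(0, 1) for _ in range(AUDIO_DIM)]
video = [rng.gauss(0, 1) for _ in range(VIDEO_DIM)]
flow = [rng.gauss(0, 1) for _ in range(FLOW_DIM)]

fused = fuse(audio, video, flow)
print(len(fused))  # 128 + 512 + 512 = 1152
```

The fused vector would then feed a shared classifier head, which is what lets one modality compensate when another is ambiguous or deliberately disguised.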