{"title":"MMTrans-MT:一个使用多任务学习的多模态情绪识别框架","authors":"Jinrui Shen, Jiahao Zheng, Xiaoping Wang","doi":"10.1109/ICACI52617.2021.9435906","DOIUrl":null,"url":null,"abstract":"With the development of deep learning, emotion recognition tasks are more inclined to use multimodal data and adequate supervised information to improve accuracy. In this work, MMTrans-MT (Multimodal Transformer-Multitask), the framework for multimodal emotion recognition using multitask learning is proposed. It has three modules: modalities representation module, multimodal fusion module, and multitask output module. Three modalities, i.e, words, audio and video, are comprehensively utilized to carry out emotion recognition by a simple but efficient fusion model based on Transformer. As for multitask learning, the two tasks are defined as categorical emotion classification and dimensional emotion regression. Considering a potential mapping relationship between two kinds of emotion model, multitask learning is adopted to make the two tasks promote each other and improve recognition accuracy. We conduct experiments on CMU-MOSEI and IEMOCAP datasets. Comprehensive experiments show that the accuracy of recognition using multimodal information is higher than that using unimodal information. Adopting multitask learning promotes the performance of emotion recognition.","PeriodicalId":382483,"journal":{"name":"2021 13th International Conference on Advanced Computational Intelligence (ICACI)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"MMTrans-MT: A Framework for Multimodal Emotion Recognition Using Multitask Learning\",\"authors\":\"Jinrui Shen, Jiahao Zheng, Xiaoping Wang\",\"doi\":\"10.1109/ICACI52617.2021.9435906\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the development of deep learning, emotion recognition tasks are more inclined to use multimodal data and adequate supervised information to improve accuracy. In this work, MMTrans-MT (Multimodal Transformer-Multitask), the framework for multimodal emotion recognition using multitask learning is proposed. It has three modules: modalities representation module, multimodal fusion module, and multitask output module. Three modalities, i.e, words, audio and video, are comprehensively utilized to carry out emotion recognition by a simple but efficient fusion model based on Transformer. As for multitask learning, the two tasks are defined as categorical emotion classification and dimensional emotion regression. Considering a potential mapping relationship between two kinds of emotion model, multitask learning is adopted to make the two tasks promote each other and improve recognition accuracy. We conduct experiments on CMU-MOSEI and IEMOCAP datasets. Comprehensive experiments show that the accuracy of recognition using multimodal information is higher than that using unimodal information. 
Adopting multitask learning promotes the performance of emotion recognition.\",\"PeriodicalId\":382483,\"journal\":{\"name\":\"2021 13th International Conference on Advanced Computational Intelligence (ICACI)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 13th International Conference on Advanced Computational Intelligence (ICACI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICACI52617.2021.9435906\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 13th International Conference on Advanced Computational Intelligence (ICACI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICACI52617.2021.9435906","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MMTrans-MT: A Framework for Multimodal Emotion Recognition Using Multitask Learning
With the development of deep learning, emotion recognition tasks increasingly draw on multimodal data and rich supervised information to improve accuracy. In this work, MMTrans-MT (Multimodal Transformer-Multitask), a framework for multimodal emotion recognition using multitask learning, is proposed. It has three modules: a modality representation module, a multimodal fusion module, and a multitask output module. Three modalities, i.e., text, audio, and video, are jointly utilized for emotion recognition through a simple but efficient fusion model based on the Transformer. For multitask learning, the two tasks are defined as categorical emotion classification and dimensional emotion regression. Considering the potential mapping between the two kinds of emotion model, multitask learning is adopted so that the two tasks reinforce each other and improve recognition accuracy. We conduct experiments on the CMU-MOSEI and IEMOCAP datasets. Comprehensive experiments show that recognition using multimodal information is more accurate than recognition using unimodal information, and that adopting multitask learning further improves emotion recognition performance.
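To make the three-module design concrete, the following PyTorch sketch shows one plausible way to wire modality projections, a Transformer-based fusion encoder, and two task heads trained with a joint loss. It is a minimal illustration, not the paper's implementation: the feature dimensions, mean pooling, class count, loss weighting, and the shared-encoder layout are all assumptions.

```python
# Hedged sketch of an MMTrans-MT-style model; all hyperparameters are assumed.
import torch
import torch.nn as nn

class MMTransMTSketch(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, video_dim=35,
                 d_model=128, n_heads=4, n_layers=2,
                 n_classes=6, n_dims=1):
        super().__init__()
        # Modality representation module: project each modality into a
        # shared d_model-dimensional space.
        self.proj = nn.ModuleDict({
            "text":  nn.Linear(text_dim,  d_model),
            "audio": nn.Linear(audio_dim, d_model),
            "video": nn.Linear(video_dim, d_model),
        })
        # Multimodal fusion module: a Transformer encoder over the
        # concatenated modality sequences.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Multitask output module: one head per task.
        self.cls_head = nn.Linear(d_model, n_classes)  # categorical emotions
        self.reg_head = nn.Linear(d_model, n_dims)     # dimensional emotion

    def forward(self, text, audio, video):
        # Each input: (batch, seq_len_m, feat_dim_m) for modality m.
        seqs = [self.proj["text"](text),
                self.proj["audio"](audio),
                self.proj["video"](video)]
        fused = self.fusion(torch.cat(seqs, dim=1))  # (batch, total_len, d_model)
        pooled = fused.mean(dim=1)                   # simple mean pooling (assumed)
        return self.cls_head(pooled), self.reg_head(pooled)

# Joint training step: weighted sum of classification and regression losses.
model = MMTransMTSketch()
text  = torch.randn(8, 20, 300)   # dummy batch; feature sizes are assumptions
audio = torch.randn(8, 50, 74)
video = torch.randn(8, 50, 35)
logits, values = model(text, audio, video)
cls_target = torch.randint(0, 6, (8,))
reg_target = torch.randn(8, 1)
loss = (nn.CrossEntropyLoss()(logits, cls_target)
        + 0.5 * nn.MSELoss()(values, reg_target))  # 0.5 weight is assumed
loss.backward()
```

Because both heads share the fusion encoder, gradients from the categorical and dimensional tasks update the same representation, which is the mechanism by which the abstract's two tasks can "promote each other."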