Multimodal Emotion Recognition by extracting common and modality-specific information
Wei Zhang, Weixi Gu, Fei Ma, S. Ni, Lin Zhang, Shao-Lun Huang
Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems, 2018-11-04. DOI: 10.1145/3274783.3275200
Citations: 12
Abstract
Emotion recognition technologies have been widely used in numerous areas, including advertising, healthcare, and online education. Previous works usually recognize emotion from either the acoustic or the visual signal alone, yielding unsatisfactory performance and limited applications. To improve inference capability, we present EMOdal, a multimodal emotion recognition model. In addition to learning from the audio and visual data individually, EMOdal efficiently learns the common and modality-specific information underlying the two kinds of signals, thereby improving its inference ability. The model has been evaluated on our large-scale emotional dataset, and comprehensive evaluations demonstrate that it outperforms traditional approaches.
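The abstract gives no implementation details, but the idea of separating common and modality-specific information can be illustrated with a minimal sketch. The PyTorch-style model below is an assumption for illustration only; the class name EMOdalSketch, the layer sizes, and the alignment loss are hypothetical and are not the authors' architecture. Each modality is projected into a shared ("common") space and a private ("specific") space, and an alignment loss on the shared projections encourages them to carry cross-modal information while the private projections keep what is unique to each signal.

# Hypothetical sketch, not the authors' code: all dimensions and the loss choice are assumptions.
import torch
import torch.nn as nn

class EMOdalSketch(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512,
                 common_dim=64, specific_dim=64, n_emotions=7):
        super().__init__()
        # Per-modality encoders.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, 256), nn.ReLU())
        # Projections into a shared (common) space and modality-specific spaces.
        self.audio_common = nn.Linear(256, common_dim)
        self.visual_common = nn.Linear(256, common_dim)
        self.audio_specific = nn.Linear(256, specific_dim)
        self.visual_specific = nn.Linear(256, specific_dim)
        # Classifier over the fused common features plus both specific features.
        self.classifier = nn.Linear(common_dim + 2 * specific_dim, n_emotions)

    def forward(self, audio, visual):
        a = self.audio_enc(audio)
        v = self.visual_enc(visual)
        a_c, v_c = self.audio_common(a), self.visual_common(v)
        a_s, v_s = self.audio_specific(a), self.visual_specific(v)
        common = (a_c + v_c) / 2  # fuse the shared representations
        logits = self.classifier(torch.cat([common, a_s, v_s], dim=-1))
        # An alignment term (here simply MSE between the two common projections)
        # pushes the shared branches toward capturing the same cross-modal content.
        align_loss = nn.functional.mse_loss(a_c, v_c)
        return logits, align_loss

# Usage sketch: combine the classification loss with the alignment term during training.
# model = EMOdalSketch()
# logits, align_loss = model(audio_batch, visual_batch)
# loss = nn.functional.cross_entropy(logits, labels) + 0.1 * align_loss

The split into shared and private projections is one common way to realize "common and modality-specific information"; the actual EMOdal model may use a different fusion or objective.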