An early fusion approach for multimodal emotion recognition using deep recurrent networks
Beniamin Bucur, Iulia Somfelean, Alexandru Ghiurutan, C. Lemnaru, M. Dînsoreanu
2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), 2018-09-01
DOI: 10.1109/ICCP.2018.8516437 (https://doi.org/10.1109/ICCP.2018.8516437)
Citations: 3
Abstract
In this paper we compare strategies for handling incomplete data and classification architectures for emotion recognition from multimodal data, using an early fusion approach. To allow the different modalities to complement each other at the feature level, the first task was to align the data to a common frame rate. The source data were highly incomplete, which we addressed with several imputation approaches. Because the data were missing in blocks, we found that the best-performing approach was to replace missing values with zeros. For the classification model, we experimented with LSTM and GRU networks, in both unidirectional and bidirectional variants, and with various hyper-parameter settings. We found that a bidirectional GRU model trained with a smaller batch size and more aggressive dropout produced the best classification performance.
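The preprocessing steps named in the abstract (alignment to a common frame rate, zero imputation of missing blocks, and feature-level fusion) can be illustrated with a minimal sketch. The abstract does not specify the alignment scheme, so taking the latest source frame at or before each target timestamp is an assumption here, and the function names, frame rates, and toy values are all hypothetical rather than taken from the paper.

```python
from typing import List, Optional

Frame = Optional[List[float]]  # None marks a frame that is missing in the source


def align_to_rate(frames: List[Frame], src_rate: int, dst_rate: int,
                  n_out: int) -> List[Frame]:
    """Resample a frame sequence to dst_rate.

    Assumed scheme: each target frame takes the latest source frame at or
    before the same timestamp. Target frames past the end of the source
    are marked missing.
    """
    aligned: List[Frame] = []
    for i in range(n_out):
        j = (i * src_rate) // dst_rate  # integer arithmetic, no float rounding
        aligned.append(frames[j] if j < len(frames) else None)
    return aligned


def zero_impute(frames: List[Frame], dim: int) -> List[List[float]]:
    """Replace every missing frame with a zero vector of length dim."""
    return [list(f) if f is not None else [0.0] * dim for f in frames]


def early_fuse(*modalities: List[List[float]]) -> List[List[float]]:
    """Concatenate per-frame feature vectors across modalities."""
    return [[x for vec in frames for x in vec] for frames in zip(*modalities)]


# Toy data (hypothetical): modality A at 4 frames/s with 1 feature,
# modality B at 2 frames/s with 2 features and one missing frame.
modality_a: List[Frame] = [[1.0], [2.0], [3.0], [4.0]]
modality_b: List[Frame] = [[10.0, 20.0], None]

fused = early_fuse(
    zero_impute(align_to_rate(modality_a, src_rate=4, dst_rate=2, n_out=2), dim=1),
    zero_impute(align_to_rate(modality_b, src_rate=2, dst_rate=2, n_out=2), dim=2),
)
print(fused)
```

Each row of `fused` is one time step of the combined feature sequence, the kind of input a recurrent classifier such as the bidirectional GRU mentioned above would consume. Zero vectors keep the sequence length intact when a whole block of one modality is absent.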