Use of non-verbal vocalizations for continuous emotion recognition from speech and head motion
Authors: Syeda Narjis Fatima, E. Erzin
Published in: 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA)
Publication date: 2019-06-01
DOI: 10.1109/ICIEA.2019.8834351 (https://doi.org/10.1109/ICIEA.2019.8834351)
Citation count: 1
Abstract
Dyadic interactions reflect the mutual engagement of their participants through verbal and non-verbal vocal cues. This study investigates the effect of these cues on continuous emotion recognition (CER) from speech and head motion data. We exploit non-verbal vocalizations extracted from speech as a complementary source of information and investigate their effect on the CER problem using regression frameworks based on Gaussian mixtures and convolutional neural networks. Our methods are evaluated on the CreativeIT database, which consists of speech and full-body motion capture recorded in dyadic interaction settings. Head motion, acoustic features of speech, and histograms of non-verbal vocalizations are used to estimate the activation, valence, and dominance attributes. Our experimental evaluations indicate a strong improvement in CER performance, especially for the activation attribute, when non-verbal vocalization cues are used.
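The abstract does not say how the histograms of non-verbal vocalizations are computed. As a rough illustration only, the sketch below assumes frame-level vocalization labels and builds one normalized histogram per sliding window. The category names, the function name `windowed_histograms`, and the `window` and `hop` parameters are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical label set; the abstract does not list the vocalization categories.
CATEGORIES = ("laughter", "breath", "filler")


def windowed_histograms(
    frame_labels: Sequence[Optional[str]],
    window: int,
    hop: int,
    categories: Sequence[str] = CATEGORIES,
) -> List[List[float]]:
    """Return one normalized histogram of vocalization labels per window.

    frame_labels holds one label per frame, or None where no non-verbal
    vocalization was detected. Each histogram entry is the fraction of
    frames in the window that carry the corresponding category.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    histograms = []
    # Slide over the sequence, keeping only windows that fit completely.
    for start in range(0, len(frame_labels) - window + 1, hop):
        counts = Counter(frame_labels[start:start + window])
        histograms.append([counts[c] / window for c in categories])
    return histograms


# Toy sequence of 8 frames (assumed data, for illustration only).
labels = ["laughter", "laughter", None, "breath", None, None, "filler", "filler"]
features = windowed_histograms(labels, window=4, hop=2)
```

In a setup like the one described, each per-window histogram could be concatenated with the acoustic and head motion features for the same window before regression. That fusion step is likewise an assumption, not a detail confirmed by the abstract.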