Kalani Wataraka Gamage, T. Dang, V. Sethu, J. Epps, E. Ambikairajah
{"title":"基于显著事件学习知觉反应的基于语音的持续情绪预测——基于声音情感爆发和跨文化情感的研究","authors":"Kalani Wataraka Gamage, T. Dang, V. Sethu, J. Epps, E. Ambikairajah","doi":"10.1145/3266302.3266314","DOIUrl":null,"url":null,"abstract":"This paper presents a novel framework for speech-based continuous emotion prediction. The proposed model characterises the perceived emotion estimation as time-invariant responses to salient events. Then arousal and valence variation over time is modelded as the ouput of a parallel array of time-invariant filters where each filter represents a salient event in this context, and the impulse response of the filter represents the learned perception emotion response. The proposed model is evaluted by considering vocal affect bursts/non-verbal vocal gestures as salient event candidates. The proposed model is validated based on the development dataset of AVEC 2018 challenge development dataset and achieves the highest accuracy of valence prediction among single modal methods based on speech or speech-transcript. We tested this model on cross-cultural settings provided by AVEC 2018 challenge test set, and the model performs reasonably well for an unseen culture as well and outperform speech-based baselines. Further we explore inclusion of interlocutor related cues to the proposed model and decision level fusion with existing features. Since the proposed model was evaluated solely based on laughter and slight laughter affect bursts which were nominated as salient by proposed saliency constrains of the model, the results presented highlight the significance of aforementioned gestures in human emotion expression and perception","PeriodicalId":123523,"journal":{"name":"Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop","volume":"396 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Speech-based Continuous Emotion Prediction by Learning Perception Responses related to Salient Events: A Study based on Vocal Affect Bursts and Cross-Cultural Affect in AVEC 2018\",\"authors\":\"Kalani Wataraka Gamage, T. Dang, V. Sethu, J. Epps, E. Ambikairajah\",\"doi\":\"10.1145/3266302.3266314\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a novel framework for speech-based continuous emotion prediction. The proposed model characterises the perceived emotion estimation as time-invariant responses to salient events. Then arousal and valence variation over time is modelded as the ouput of a parallel array of time-invariant filters where each filter represents a salient event in this context, and the impulse response of the filter represents the learned perception emotion response. The proposed model is evaluted by considering vocal affect bursts/non-verbal vocal gestures as salient event candidates. The proposed model is validated based on the development dataset of AVEC 2018 challenge development dataset and achieves the highest accuracy of valence prediction among single modal methods based on speech or speech-transcript. We tested this model on cross-cultural settings provided by AVEC 2018 challenge test set, and the model performs reasonably well for an unseen culture as well and outperform speech-based baselines. Further we explore inclusion of interlocutor related cues to the proposed model and decision level fusion with existing features. Since the proposed model was evaluated solely based on laughter and slight laughter affect bursts which were nominated as salient by proposed saliency constrains of the model, the results presented highlight the significance of aforementioned gestures in human emotion expression and perception\",\"PeriodicalId\":123523,\"journal\":{\"name\":\"Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop\",\"volume\":\"396 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3266302.3266314\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3266302.3266314","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Speech-based Continuous Emotion Prediction by Learning Perception Responses related to Salient Events: A Study based on Vocal Affect Bursts and Cross-Cultural Affect in AVEC 2018
This paper presents a novel framework for speech-based continuous emotion prediction. The proposed model characterises the perceived emotion estimation as time-invariant responses to salient events. Then arousal and valence variation over time is modelded as the ouput of a parallel array of time-invariant filters where each filter represents a salient event in this context, and the impulse response of the filter represents the learned perception emotion response. The proposed model is evaluted by considering vocal affect bursts/non-verbal vocal gestures as salient event candidates. The proposed model is validated based on the development dataset of AVEC 2018 challenge development dataset and achieves the highest accuracy of valence prediction among single modal methods based on speech or speech-transcript. We tested this model on cross-cultural settings provided by AVEC 2018 challenge test set, and the model performs reasonably well for an unseen culture as well and outperform speech-based baselines. Further we explore inclusion of interlocutor related cues to the proposed model and decision level fusion with existing features. Since the proposed model was evaluated solely based on laughter and slight laughter affect bursts which were nominated as salient by proposed saliency constrains of the model, the results presented highlight the significance of aforementioned gestures in human emotion expression and perception