{"title":"Cross-language acoustic emotion recognition: An overview and some tendencies","authors":"S. M. Feraru, Dagmar M. Schuller, Björn Schuller","doi":"10.1109/ACII.2015.7344561","DOIUrl":null,"url":null,"abstract":"Automatic emotion recognition from speech has matured close to the point where it reaches broader commercial interest. One of the last major limiting factors is the ability to deal with multilingual inputs as will be given in a real-life operating system in many if not most cases. As in real-life scenarios speech is often used mixed across languages more experience will be needed in performance effects of cross-language recognition. In this contribution we first provide an overview on languages covered in the research on emotion and speech finding that only roughly two thirds of native speakers' languages are so far touched upon. We thus next shed light on mis-matched vs matched condition emotion recognition across a variety of languages. By intention, we include less researched languages of more distant language families such as Burmese, Romanian or Turkish. Binary arousal and valence mapping is employed in order to be able to train and test across databases that have originally been labelled in diverse categories. In the result - as one may expect - arousal recognition works considerably better across languages than valence, and cross-language recognition falls considerably behind within-language recognition. However, within-language family recognition seems to provide an `emergency-solution' in case of missing language resources, and the observed notable differences depending on the combination of languages show a number of interesting effects.","PeriodicalId":6863,"journal":{"name":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","volume":"17 1","pages":"125-131"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Affective Computing and Intelligent Interaction (ACII)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACII.2015.7344561","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 58
Abstract
Automatic emotion recognition from speech has matured close to the point where it reaches broader commercial interest. One of the last major limiting factors is the ability to deal with multilingual inputs as will be given in a real-life operating system in many if not most cases. As in real-life scenarios speech is often used mixed across languages more experience will be needed in performance effects of cross-language recognition. In this contribution we first provide an overview on languages covered in the research on emotion and speech finding that only roughly two thirds of native speakers' languages are so far touched upon. We thus next shed light on mis-matched vs matched condition emotion recognition across a variety of languages. By intention, we include less researched languages of more distant language families such as Burmese, Romanian or Turkish. Binary arousal and valence mapping is employed in order to be able to train and test across databases that have originally been labelled in diverse categories. In the result - as one may expect - arousal recognition works considerably better across languages than valence, and cross-language recognition falls considerably behind within-language recognition. However, within-language family recognition seems to provide an `emergency-solution' in case of missing language resources, and the observed notable differences depending on the combination of languages show a number of interesting effects.