Cross-language acoustic emotion recognition: An overview and some tendencies

2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 125-131. Pub Date: 2015-09-21. DOI: 10.1109/ACII.2015.7344561

S. M. Feraru, Dagmar M. Schuller, Björn Schuller


Citations: 58

Abstract

Automatic emotion recognition from speech has matured close to the point where it attracts broader commercial interest. One of the last major limiting factors is the ability to deal with multilingual input, which a system operating in real life will face in many if not most cases. Because speech in real-life scenarios is often mixed across languages, more experience is needed with the performance effects of cross-language recognition. In this contribution, we first provide an overview of the languages covered in research on emotion and speech, finding that only roughly two thirds of native speakers' languages have been touched upon so far. We then shed light on mismatched vs. matched condition emotion recognition across a variety of languages. We intentionally include less-researched languages from more distant language families, such as Burmese, Romanian, or Turkish. Binary arousal and valence mapping is employed so that we can train and test across databases that were originally labelled with diverse categories. As one may expect, arousal recognition works considerably better across languages than valence recognition, and cross-language recognition falls considerably behind within-language recognition. However, recognition within a language family seems to provide an 'emergency solution' when language resources are missing, and the notable differences observed depending on the combination of languages show a number of interesting effects.
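The binary arousal and valence mapping mentioned in the abstract can be illustrated with a minimal sketch. The category names and their assignments below are hypothetical and chosen only for illustration; the abstract does not state which categories the databases use or how the authors map them. The function name `map_labels` and the table `EMOTION_TO_BINARY` are likewise assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping (an assumption, not taken from the paper):
# category -> (arousal, valence), where 1 = high/positive and 0 = low/negative.
EMOTION_TO_BINARY: Dict[str, Tuple[int, int]] = {
    "anger": (1, 0),
    "fear": (1, 0),
    "happiness": (1, 1),
    "surprise": (1, 1),
    "sadness": (0, 0),
    "boredom": (0, 0),
    "neutral": (0, 1),
}


def map_labels(labels: List[str]) -> Tuple[List[int], List[int]]:
    """Convert categorical emotion labels into binary arousal and valence targets."""
    arousal: List[int] = []
    valence: List[int] = []
    for label in labels:
        # Lower-case the label so that differently cased corpora share one table.
        a, v = EMOTION_TO_BINARY[label.lower()]
        arousal.append(a)
        valence.append(v)
    return arousal, valence


# Two imaginary corpora with different label sets end up in the same target space.
corpus_a = ["Anger", "neutral", "sadness"]
corpus_b = ["happiness", "boredom", "fear"]
arousal_a, valence_a = map_labels(corpus_a)
arousal_b, valence_b = map_labels(corpus_b)
```

Once every corpus is expressed in the same two binary dimensions, a classifier trained on one language's data can be tested on another's even when the original label inventories differ.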

