Practical considerations on the use of preference learning for ranking emotional speech

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Pub Date: 2016-03-20, DOI: 10.1109/ICASSP.2016.7472670

Reza Lotfian, C. Busso


Citations: 30

Abstract

A speech emotion retrieval system aims to detect a subset of data with specific expressive content. Preference learning represents an appealing framework to rank speech samples in terms of continuous attributes such as arousal and valence. The training of ranking classifiers usually requires pairwise samples where one is preferred over the other according to a specific criterion. For emotional databases, these relative labels are not available and are very difficult to collect. As an alternative, they can be derived from existing absolute emotional labels. For continuous attributes, we can create relative rankings by forming pairs of samples with high and low values of a specific attribute, separated by a predefined margin. This strategy raises questions about how to build such a training set efficiently, which is important for improving the performance of the emotion retrieval system. This paper analyzes practical considerations in training ranking classifiers, including the optimal number of pairs used during training and the margin used to define the relative labels. We compare the preference learning approach to binary classifiers and regression models. The experimental results on a spontaneous emotional database indicate that a rank-based classifier with fine-tuned parameters outperforms the other two approaches in both the arousal and valence dimensions.
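The pair-construction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_preference_pairs`, `pairwise_differences`), the uniform random subsampling used to cap the number of pairs, and the difference-vector encoding (a common way to feed preference pairs to a linear ranker) are all assumptions made for illustration. The sketch keeps only pairs whose absolute labels differ by more than a margin, which is the criterion the abstract describes.

```python
import random
from itertools import combinations


def build_preference_pairs(scores, margin, max_pairs=None, seed=0):
    """Derive relative labels from absolute scores (hypothetical helper).

    Returns (preferred_index, other_index) tuples for every pair of samples
    whose scores differ by more than `margin`. If `max_pairs` is given, a
    uniform random subset of that size is kept (an assumed sampling scheme).
    """
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        diff = scores[i] - scores[j]
        if diff > margin:
            pairs.append((i, j))   # sample i is ranked above sample j
        elif -diff > margin:
            pairs.append((j, i))   # sample j is ranked above sample i
        # pairs closer than the margin are treated as ambiguous and skipped
    if max_pairs is not None and len(pairs) > max_pairs:
        pairs = random.Random(seed).sample(pairs, max_pairs)
    return pairs


def pairwise_differences(features, pairs):
    """Encode each preference pair as difference vectors (assumed encoding).

    For a pair (p, o), emits x_p - x_o with label +1 and its mirror
    x_o - x_p with label -1, so a binary linear classifier can be trained
    on the result as a ranker.
    """
    X, y = [], []
    for p, o in pairs:
        d = [a - b for a, b in zip(features[p], features[o])]
        X.append(d)
        y.append(1)
        X.append([-v for v in d])
        y.append(-1)
    return X, y


# Toy example: four utterances with made-up arousal scores and 1-D features.
arousal = [1.0, 4.5, 2.0, 3.8]
feats = [[0.0], [2.0], [1.0], [3.0]]
pairs = build_preference_pairs(arousal, margin=1.5)
X, y = pairwise_differences(feats, pairs)
```

In this toy example, the pair (utterance 1, utterance 3) is dropped because the scores 4.5 and 3.8 differ by less than the margin of 1.5. The `margin` and `max_pairs` arguments correspond to the two training-set parameters the abstract says the paper analyzes.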


