Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests

12th ISCA Speech Synthesis Workshop (SSW2023). Pub Date: 2023-08-26. DOI: 10.21437/ssw.2023-30

K. Lakshminarayana, C. Dittmar, N. Pia, Emanuël Habets

Abstract

Modern text-to-speech (TTS) models are typically evaluated subjectively using the Absolute Category Rating (ACR) method, which rates each model under test by its mean opinion score. However, if the models are perceptually too similar, assigning absolute ratings to stimuli can be difficult and prone to subjective preference errors. Pairwise comparison tests offer relative comparisons and better capture some of the subtle differences between stimuli. However, pairwise comparisons take more time, because the number of tests grows quadratically with the number of models. Alternatively, a ranking-by-elimination (RBE) test can assess multiple models with benefits similar to those of pairwise comparisons for subtle differences across models, without the time penalty. We compared the ACR and RBE tests for TTS evaluation in a controlled experiment and found that the results were statistically similar, even in the presence of perceptually close TTS models.
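
To make the cost argument concrete, the following minimal sketch counts the unordered pairs in a full pairwise design and computes a mean opinion score from a list of ACR ratings. The function names are hypothetical, the 1-to-5 rating scale is an assumption (the abstract does not state the scale), and the ratings are made-up illustrative values, not data from the paper. The RBE procedure is not modeled, since the abstract does not describe it.

```python
from itertools import combinations
from statistics import mean


def num_pairwise_comparisons(n_models: int) -> int:
    """Number of unordered model pairs in a full pairwise design: n(n-1)/2."""
    return n_models * (n_models - 1) // 2


def mean_opinion_score(ratings):
    """Arithmetic mean of ACR ratings (assumed here to be on a 1-5 scale)."""
    return mean(ratings)


if __name__ == "__main__":
    # Cross-check the closed form against explicit enumeration of pairs.
    for n in (2, 4, 8, 16):
        pairs = list(combinations(range(n), 2))
        assert len(pairs) == num_pairwise_comparisons(n)
        print(f"{n} models: {len(pairs)} pairs per stimulus")

    # Illustrative (made-up) ratings for one hypothetical model.
    print("MOS:", mean_opinion_score([4, 5, 3, 4]))
```

Doubling the number of models roughly quadruples the number of pairs per stimulus, whereas an ACR test needs only one rating per model per stimulus.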
