Influence of Number of Stimuli for Subjective Speech Quality Assessment in Crowdsourcing

Rafael Zequeira Jiménez, Laura Fernández Gallardo, S. Möller
{"title":"Influence of Number of Stimuli for Subjective Speech Quality Assessment in Crowdsourcing","authors":"Rafael Zequeira Jiménez, Laura Fernández Gallardo, S. Möller","doi":"10.1109/QoMEX.2018.8463298","DOIUrl":null,"url":null,"abstract":"Nowadays, crowdsourcing provides an exceptional opportunity for conducting subjective user tests on the Internet with a demographically diverse audience. Previous work has pointed out that the offered tasks should be kept short in time, therefore, participants evaluate at once just a portion of the dataset. Aspects like users' workload and fatigue are important as they relate to a main question: how to optimize study design without compromising results quality by tiring the test participants? This work investigates the influence of the number of presented speech stimuli on the reliability of listeners' ratings in the context of subjective speech quality assessment. A crowdsourcing study have been conducted with 209 listeners that were asked to rate speech stimuli with respect to their overall quality. Participants were randomly assigned to one of three user groups, each of which was confronted with tasks consisting of a different number of stimuli: 10, 20, or 40. The results from the three groups are highly correlated to existing laboratory ratings, the group with the largest number of samples offering the highest correlation. However, participant retention decreased while the study completion time increased. Thus, it might be desirable to offer tasks with less speech stimuli sacrificing ratings' accuracy to some extent.","PeriodicalId":6618,"journal":{"name":"2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX)","volume":"14 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QoMEX.2018.8463298","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

Nowadays, crowdsourcing provides an exceptional opportunity for conducting subjective user tests on the Internet with a demographically diverse audience. Previous work has pointed out that the offered tasks should be kept short; consequently, participants evaluate only a portion of the dataset at a time. Aspects such as users' workload and fatigue are important, as they relate to a central question: how can a study design be optimized without compromising result quality by tiring the test participants? This work investigates the influence of the number of presented speech stimuli on the reliability of listeners' ratings in the context of subjective speech quality assessment. A crowdsourcing study has been conducted with 209 listeners who were asked to rate speech stimuli with respect to their overall quality. Participants were randomly assigned to one of three user groups, each of which was confronted with tasks consisting of a different number of stimuli: 10, 20, or 40. The results from all three groups are highly correlated with existing laboratory ratings, with the group receiving the largest number of stimuli showing the highest correlation. However, participant retention decreased while the study completion time increased. Thus, it might be desirable to offer tasks with fewer speech stimuli, sacrificing rating accuracy to some extent.
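
The abstract reports group-level agreement with laboratory ratings in terms of correlation. As a minimal sketch of that kind of analysis, the snippet below computes per-stimulus mean opinion scores (MOS) from crowdsourced ratings and their Pearson correlation with laboratory MOS. All stimulus IDs and rating values are hypothetical placeholders, not data from the paper.

```python
# Illustrative sketch (not the authors' code): correlate per-stimulus MOS
# from one crowdsourcing group with laboratory MOS for the same stimuli.
from statistics import mean
from math import sqrt

# Hypothetical crowdsourced ratings per stimulus (5-point ACR scale, 1-5).
crowd_ratings = {
    "s01": [4, 5, 4, 4],
    "s02": [2, 3, 2, 2],
    "s03": [3, 4, 3, 4],
}
# Hypothetical laboratory MOS for the same stimuli.
lab_mos = {"s01": 4.3, "s02": 2.1, "s03": 3.5}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

stimuli = sorted(crowd_ratings)
crowd_mos = [mean(crowd_ratings[s]) for s in stimuli]  # average rating per stimulus
lab = [lab_mos[s] for s in stimuli]
print(f"Pearson r (crowd vs. lab MOS): {pearson(crowd_mos, lab):.3f}")
```

In a study like the one described, this correlation would be computed separately for the 10-, 20-, and 40-stimulus groups to compare their reliability against the laboratory baseline.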