Influence of Number of Stimuli for Subjective Speech Quality Assessment in Crowdsourcing

Rafael Zequeira Jiménez, Laura Fernández Gallardo, S. Möller
{"title":"Influence of Number of Stimuli for Subjective Speech Quality Assessment in Crowdsourcing","authors":"Rafael Zequeira Jiménez, Laura Fernández Gallardo, S. Möller","doi":"10.1109/QoMEX.2018.8463298","DOIUrl":null,"url":null,"abstract":"Nowadays, crowdsourcing provides an exceptional opportunity for conducting subjective user tests on the Internet with a demographically diverse audience. Previous work has pointed out that the offered tasks should be kept short in time, therefore, participants evaluate at once just a portion of the dataset. Aspects like users' workload and fatigue are important as they relate to a main question: how to optimize study design without compromising results quality by tiring the test participants? This work investigates the influence of the number of presented speech stimuli on the reliability of listeners' ratings in the context of subjective speech quality assessment. A crowdsourcing study have been conducted with 209 listeners that were asked to rate speech stimuli with respect to their overall quality. Participants were randomly assigned to one of three user groups, each of which was confronted with tasks consisting of a different number of stimuli: 10, 20, or 40. The results from the three groups are highly correlated to existing laboratory ratings, the group with the largest number of samples offering the highest correlation. However, participant retention decreased while the study completion time increased. Thus, it might be desirable to offer tasks with less speech stimuli sacrificing ratings' accuracy to some extent.","PeriodicalId":6618,"journal":{"name":"2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX)","volume":"14 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QoMEX.2018.8463298","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

Nowadays, crowdsourcing provides an exceptional opportunity for conducting subjective user tests on the Internet with a demographically diverse audience. Previous work has pointed out that the offered tasks should be kept short; consequently, participants evaluate only a portion of the dataset at a time. Aspects such as users' workload and fatigue are important, as they relate to a central question: how can a study design be optimized without compromising result quality by tiring the test participants? This work investigates the influence of the number of presented speech stimuli on the reliability of listeners' ratings in the context of subjective speech quality assessment. A crowdsourcing study has been conducted with 209 listeners who were asked to rate speech stimuli with respect to their overall quality. Participants were randomly assigned to one of three user groups, each of which was confronted with tasks consisting of a different number of stimuli: 10, 20, or 40. The results from all three groups are highly correlated with existing laboratory ratings, with the group receiving the largest number of stimuli showing the highest correlation. However, participant retention decreased while the study completion time increased. Thus, it might be desirable to offer tasks with fewer speech stimuli, sacrificing rating accuracy to some extent.
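
The abstract reports group-level agreement with laboratory ratings in terms of correlation. As a minimal sketch of that kind of analysis, the snippet below computes per-stimulus mean opinion scores (MOS) from crowdsourced ratings and their Pearson correlation with laboratory MOS. All stimulus IDs and rating values are hypothetical placeholders, not data from the paper.

```python
# Illustrative sketch (not the authors' code): correlate per-stimulus MOS
# from one crowdsourcing group with laboratory MOS for the same stimuli.
from statistics import mean
from math import sqrt

# Hypothetical crowdsourced ratings per stimulus (5-point ACR scale, 1-5).
crowd_ratings = {
    "s01": [4, 5, 4, 4],
    "s02": [2, 3, 2, 2],
    "s03": [3, 4, 3, 4],
}
# Hypothetical laboratory MOS for the same stimuli.
lab_mos = {"s01": 4.3, "s02": 2.1, "s03": 3.5}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

stimuli = sorted(crowd_ratings)
crowd_mos = [mean(crowd_ratings[s]) for s in stimuli]  # average rating per stimulus
lab = [lab_mos[s] for s in stimuli]
print(f"Pearson r (crowd vs. lab MOS): {pearson(crowd_mos, lab):.3f}")
```

In a study like the one described, this correlation would be computed separately for the 10-, 20-, and 40-stimulus groups to compare their reliability against the laboratory baseline.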