Models of dataset size, question design, and cross-language speech perception for speech crowdsourcing applications

IF 1.3 | Region 2 (Literature) | JCR category: LANGUAGE & LINGUISTICS

Laboratory Phonology Pub Date : 2015-10-01 DOI:10.1515/lp-2015-0012

M. Hasegawa-Johnson, J. Cole, P. Jyothi, L. Varshney

Volume : 52 1 Pages : 381 - 431 Publication Type : Journal Article

Citations: 13

Abstract

Transcribers make mistakes. Workers recruited in a crowdsourcing marketplace, because of their varying levels of commitment and education, make more mistakes than workers in a controlled laboratory setting. Methods for compensating for transcriber mistakes are desirable because, with such methods available, crowdsourcing has the potential to significantly increase the scale of experiments in laboratory phonology. This paper provides a brief tutorial on statistical learning theory, introducing the relationship between dataset size and estimation error, then presents a theoretical description and preliminary results for two new methods that control labeler error in laboratory phonology experiments. First, we discuss the method of crowdsourcing over error-correcting codes. In the error-correcting-code method, each difficult labeling task is first factored, by the experimenter, into the product of several easy labeling tasks (typically binary). Factoring increases the total number of tasks; nevertheless, it results in faster completion and higher accuracy, because workers unable to perform the difficult task may be able to meaningfully contribute to the solution of each easy task. Second, we discuss the use of explicit mathematical models of the errors made by a worker in the crowd. In particular, we introduce the method of mismatched crowdsourcing, in which workers transcribe a language they do not understand, and an explicit mathematical model of second-language phoneme perception is used to learn and then compensate for their transcription errors. Though introduced as technologies that increase the scale of phonology experiments, both methods have implications beyond increased scale. The method of easy questions permits us to probe the perception, by untrained listeners, of complicated phonological models; examples are provided from the prosody of English and Hindi. The method of mismatched crowdsourcing permits us to probe, in more detail than ever before, the perception of phonetic categories by listeners with a different phonological system.
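
The error-correcting-code idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four labels, the 5-bit codebook, and the function names below are all hypothetical, chosen only to show how a hard multi-way labeling task might be factored into binary questions, aggregated by majority vote, and decoded to the nearest codeword.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical codebook (an assumption, not taken from the paper).
# Each label is assigned a 5-bit codeword; each bit position stands for
# one easy yes/no question put to the crowd. The minimum Hamming
# distance between codewords is 3, so one wrong bit can be corrected.
CODEBOOK: Dict[str, Tuple[int, ...]] = {
    "A": (0, 0, 0, 0, 0),
    "B": (1, 0, 1, 1, 0),
    "C": (0, 1, 0, 1, 1),
    "D": (1, 1, 1, 0, 1),
}


def majority_vote(answers: Sequence[int]) -> int:
    """Return 1 if strictly more than half of the binary answers are 1."""
    return 1 if 2 * sum(answers) > len(answers) else 0


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits: Sequence[int]) -> str:
    """Return the label whose codeword is closest to the observed bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


def label_from_votes(votes_per_question: List[List[int]]) -> str:
    """Aggregate worker answers per binary question, then decode."""
    bits = [majority_vote(v) for v in votes_per_question]
    return decode(bits)


if __name__ == "__main__":
    # Three workers answer each of the five binary questions.
    # The true label is "B"; the crowd gets the last question wrong.
    votes = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
    print(label_from_votes(votes))
```

In this sketch, redundancy comes from two places: several workers answer each binary question, and the codewords are spaced far enough apart that a single question answered wrongly by the majority still decodes to the intended label.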


