Crowd-sourcing for difficult transcription of speech

J. Williams, I. D. Melamed, Tirso Alonso, B. Hollister, J. Wilpon
Published in: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding (December 2011)
DOI: 10.1109/ASRU.2011.6163988
Citations: 34

Abstract

Crowd-sourcing is a promising method for fast and cheap transcription of large volumes of speech data. However, this method cannot achieve the accuracy of expert transcribers on speech that is difficult to transcribe. Faced with such speech data, we developed three new methods of crowd-sourcing, which allow explicit trade-offs among precision, recall, and cost. The methods are: incremental redundancy, treating ASR as a transcriber, and using a regression model to predict transcription reliability. Even though the accuracy of individual crowd-workers is only 55% on our data, our best method achieves 90% accuracy on 93% of the utterances, using only 1.3 crowd-worker transcriptions per utterance on average. When forced to transcribe all utterances, our best method matches the accuracy of previous crowd-sourcing methods using only one third as many transcriptions. We also study the effects of various task design factors on transcription latency and accuracy, some of which have not been reported before.
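
The abstract names incremental redundancy and treating ASR as a transcriber but does not spell out either procedure. The sketch below is one plausible reading, not the authors' implementation: transcriptions are requested one at a time, and collection stops as soon as enough of them agree. The function names, the exact-match agreement test after normalization, and the `agree` and `max_requests` parameters are all assumptions made for illustration. Returning `None` when no agreement is reached corresponds to leaving an utterance untranscribed, which is where a trade-off between precision and recall would arise.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def incremental_redundancy(
    request_transcription: Callable[[], str],
    asr_hypothesis: Optional[str] = None,
    agree: int = 2,
    max_requests: int = 3,
) -> Tuple[Optional[str], int]:
    """Request crowd transcriptions one at a time until `agree` of them match.

    Hypothetical sketch: the stopping rule and thresholds are assumptions,
    not values taken from the paper.
    Returns (accepted transcription or None, number of crowd requests made).
    """
    votes = Counter()
    # Assumption: the ASR output counts as one more vote, at no crowd cost.
    if asr_hypothesis is not None:
        votes[normalize(asr_hypothesis)] += 1
    for used in range(1, max_requests + 1):
        votes[normalize(request_transcription())] += 1
        text, count = votes.most_common(1)[0]
        if count >= agree:
            return text, used  # Stop early: no further requests are paid for.
    return None, max_requests  # No agreement: abstain on this utterance.


# Usage: two simulated workers who agree once case and spacing are normalized.
workers = iter(["call me back", "Call me  back"])
print(incremental_redundancy(lambda: next(workers)))
```

Under these assumptions, an utterance on which the first worker matches the ASR hypothesis costs a single crowd transcription, which is the kind of saving that would pull the average number of transcriptions per utterance toward one.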