Introducing Bed Word: a new automated speech recognition tool for sociolinguistic interview transcription
Authors: Marcus Ma, Lelia Glass, James Stanford
Journal: Linguistics Vanguard
Publication date: 2024-05-06
Publication type: Journal Article
DOI: 10.1515/lingvan-2023-0073 (https://doi.org/10.1515/lingvan-2023-0073)
Abstract
We present Bed Word, a tool leveraging industrial automatic speech recognition (ASR) to transcribe sociophonetic data. While we find lower accuracy for minoritized English varieties, the resulting vowel measurements are overall very close to those derived from human-corrected gold data, so fully automated transcription may be suitable for some research purposes. For purposes requiring greater accuracy, we present a pipeline for human post-editing of automatically generated drafts, which we show is far faster than transcribing from scratch. Thus, we offer two ways to leverage ASR in sociolinguistic research: full automation and human post-editing. Augmenting the DARLA tool developed by Reddy and Stanford (2015b. Toward completely automated vowel extraction: Introducing DARLA. Linguistics Vanguard 1(1). 15–28), we hope that this resource can help speed up transcription for sociophonetic research.
About the journal:
Linguistics Vanguard is a new channel for high quality articles and innovative approaches in all major fields of linguistics. This multimodal journal is published solely online and provides an accessible platform supporting both traditional and new kinds of publications. Linguistics Vanguard seeks to publish concise and up-to-date reports on the state of the art in linguistics as well as cutting-edge research papers. With its topical breadth of coverage and anticipated quick rate of production, it is one of the leading platforms for scientific exchange in linguistics. Its broad theoretical range, international scope, and diversity of article formats engage students and scholars alike. All topics within linguistics are welcome. The journal especially encourages submissions taking advantage of its new multimodal platform designed to integrate interactive content, including audio and video, images, maps, software code, raw data, and any other media that enhances the traditional written word. The novel platform and concise article format allow for rapid turnaround of submissions. Full peer review assures quality and enables authors to receive appropriate credit for their work. The journal publishes general submissions as well as special collections. Ideas for special collections may be submitted to the editors for consideration.