{"title":"Introducing Bed Word: a new automated speech recognition tool for sociolinguistic interview transcription","authors":"Marcus Ma, Lelia Glass, James Stanford","doi":"10.1515/lingvan-2023-0073","DOIUrl":null,"url":null,"abstract":"We present Bed Word, a tool leveraging industrial automatic speech recognition (ASR) to transcribe sociophonetic data. While we find lower accuracy for minoritized English varieties, the resulting vowel measurements are overall very close to those derived from human-corrected gold data, so fully automated transcription may be suitable for some research purposes. For purposes requiring greater accuracy, we present a pipeline for human post-editing of automatically generated drafts, which we show is far faster than transcribing from scratch. Thus, we offer two ways to leverage ASR in sociolinguistic research: full automation and human post-editing. Augmenting the DARLA tool developed by Reddy and Stanford (2015b. Toward completely automated vowel extraction: Introducing DARLA. <jats:italic>Linguistics Vanguard</jats:italic> 1(1). 15–28), we hope that this resource can help speed up transcription for sociophonetic research.","PeriodicalId":55960,"journal":{"name":"Linguistics Vanguard","volume":"81 1","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Linguistics Vanguard","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1515/lingvan-2023-0073","RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
Citations: 0
Abstract
We present Bed Word, a tool leveraging industrial automatic speech recognition (ASR) to transcribe sociophonetic data. While we find lower accuracy for minoritized English varieties, the resulting vowel measurements are overall very close to those derived from human-corrected gold data, so fully automated transcription may be suitable for some research purposes. For purposes requiring greater accuracy, we present a pipeline for human post-editing of automatically generated drafts, which we show is far faster than transcribing from scratch. Thus, we offer two ways to leverage ASR in sociolinguistic research: full automation and human post-editing. Augmenting the DARLA tool developed by Reddy and Stanford (2015b. Toward completely automated vowel extraction: Introducing DARLA. Linguistics Vanguard 1(1). 15–28), we hope that this resource can help speed up transcription for sociophonetic research.
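The fully automated route the abstract describes (an ASR-generated draft that is either used directly or handed to a human post-editor) can be illustrated with a minimal sketch. This is not Bed Word's own code: it assumes a generic open-source ASR backend (the openai-whisper package) and a hypothetical CSV output of time-stamped segments that a human corrector, or a downstream aligner such as DARLA, could work from.

```python
# Minimal sketch of automated draft transcription for later human post-editing.
# Uses openai-whisper as a stand-in ASR backend; Bed Word's actual pipeline,
# model choice, and output format may differ.
import csv
import whisper  # pip install openai-whisper

def draft_transcript(audio_path: str, out_csv: str, model_size: str = "base") -> None:
    """Transcribe an interview recording and write a segment-level draft
    (start time, end time, text) for a human to correct afterwards."""
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["start_s", "end_s", "text"])
        for seg in result["segments"]:
            writer.writerow([round(seg["start"], 2),
                             round(seg["end"], 2),
                             seg["text"].strip()])

if __name__ == "__main__":
    # File names here are hypothetical, for illustration only.
    draft_transcript("interview_01.wav", "interview_01_draft.csv")
```

The time-stamped draft produced this way is the kind of artifact the paper's two workflows start from: used as-is for fully automated vowel measurement, or opened alongside the audio for human post-editing when higher accuracy is needed.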
About the journal:
Linguistics Vanguard is a channel for high-quality articles and innovative approaches in all major fields of linguistics. This multimodal journal is published solely online and provides an accessible platform supporting both traditional and new kinds of publications. Linguistics Vanguard seeks to publish concise, up-to-date reports on the state of the art in linguistics as well as cutting-edge research papers. With its topical breadth of coverage and quick rate of production, it is one of the leading platforms for scientific exchange in linguistics. Its broad theoretical range, international scope, and diversity of article formats engage students and scholars alike. All topics within linguistics are welcome. The journal especially encourages submissions taking advantage of its multimodal platform, designed to integrate interactive content, including audio and video, images, maps, software code, raw data, and any other media that enhances the traditional written word. The novel platform and concise article format allow for rapid turnaround of submissions. Full peer review assures quality and enables authors to receive appropriate credit for their work. The journal publishes general submissions as well as special collections. Ideas for special collections may be submitted to the editors for consideration.