Automated Speech Audiometry: Can It Work Using Open-Source Pre-Trained Kaldi-NL Automatic Speech Recognition?

IF 3 2区医学 Q1 AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY

Trends in Hearing Pub Date : 2024-01-01 DOI:10.1177/23312165241229057

Gloria Araiza-Illan, Luke Meyer, Khiet P Truong, Deniz Başkent

{"title":"Automated Speech Audiometry: Can It Work Using Open-Source Pre-Trained Kaldi-NL Automatic Speech Recognition?","authors":"Gloria Araiza-Illan, Luke Meyer, Khiet P Truong, Deniz Başkent","doi":"10.1177/23312165241229057","DOIUrl":null,"url":null,"abstract":"<p><p>A practical speech audiometry tool is the digits-in-noise (DIN) test for hearing screening of populations of varying ages and hearing status. The test is usually conducted by a human supervisor (e.g., clinician), who scores the responses spoken by the listener, or online, where software scores the responses entered by the listener. The test has 24-digit triplets presented in an adaptive staircase procedure, resulting in a speech reception threshold (SRT). We propose an alternative automated DIN test setup that can evaluate spoken responses whilst conducted without a human supervisor, using the open-source automatic speech recognition toolkit, Kaldi-NL. Thirty self-reported normal-hearing Dutch adults (19-64 years) completed one DIN + Kaldi-NL test. Their spoken responses were recorded and used for evaluating the transcript of decoded responses by Kaldi-NL. Study 1 evaluated the Kaldi-NL performance through its word error rate (WER), percentage of summed decoding errors regarding only digits found in the transcript compared to the total number of digits present in the spoken responses. Average WER across participants was 5.0% (range 0-48%, SD = 8.8%), with average decoding errors in three triplets per participant. Study 2 analyzed the effect that triplets with decoding errors from Kaldi-NL had on the DIN test output (SRT), using bootstrapping simulations. Previous research indicated 0.70 dB as the typical within-subject SRT variability for normal-hearing adults. Study 2 showed that up to four triplets with decoding errors produce SRT variations within this range, suggesting that our proposed setup could be feasible for clinical applications.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"28 ","pages":"23312165241229057"},"PeriodicalIF":3.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10943752/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Trends in Hearing","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/23312165241229057","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

A practical speech audiometry tool is the digits-in-noise (DIN) test for hearing screening of populations of varying ages and hearing status. The test is usually conducted by a human supervisor (e.g., clinician), who scores the responses spoken by the listener, or online, where software scores the responses entered by the listener. The test has 24-digit triplets presented in an adaptive staircase procedure, resulting in a speech reception threshold (SRT). We propose an alternative automated DIN test setup that can evaluate spoken responses whilst conducted without a human supervisor, using the open-source automatic speech recognition toolkit, Kaldi-NL. Thirty self-reported normal-hearing Dutch adults (19-64 years) completed one DIN + Kaldi-NL test. Their spoken responses were recorded and used for evaluating the transcript of decoded responses by Kaldi-NL. Study 1 evaluated the Kaldi-NL performance through its word error rate (WER), percentage of summed decoding errors regarding only digits found in the transcript compared to the total number of digits present in the spoken responses. Average WER across participants was 5.0% (range 0-48%, SD = 8.8%), with average decoding errors in three triplets per participant. Study 2 analyzed the effect that triplets with decoding errors from Kaldi-NL had on the DIN test output (SRT), using bootstrapping simulations. Previous research indicated 0.70 dB as the typical within-subject SRT variability for normal-hearing adults. Study 2 showed that up to four triplets with decoding errors produce SRT variations within this range, suggesting that our proposed setup could be feasible for clinical applications.

Abstract Image

查看原文本刊更多论文

自动语音测听：使用开源预训练的 Kaldi-NL 自动语音识别技术是否可行？

噪声中数字（DIN）测试是一种实用的言语测听工具，用于对不同年龄和听力状况的人群进行听力筛查。该测试通常由人工监督员（如临床医生）或在线软件进行，人工监督员会对听者的回答进行评分，在线软件则会对听者输入的回答进行评分。该测试采用自适应阶梯程序呈现 24 位三连音，从而得出语音接收阈值 (SRT)。我们提出了另一种自动 DIN 测试设置，可以在没有人工监督的情况下，使用开源自动语音识别工具包 Kaldi-NL 评估口语回答。30 名自称听力正常的荷兰成年人（19-64 岁）完成了一次 DIN + Kaldi-NL 测试。他们的口语回答被录制下来，用于评估 Kaldi-NL 解码后的回答记录。研究 1 通过 Kaldi-NL 的单词错误率（WER）来评估 Kaldi-NL 的性能，WER 是指与口语回答中出现的数字总数相比，只涉及笔录中出现的数字的解码错误总和所占的百分比。参与者的平均 WER 为 5.0%（范围为 0-48%，SD = 8.8%），每位参与者平均在三个三连音中出现解码错误。研究 2 采用引导模拟法分析了 Kaldi-NL 解码错误的三连音对 DIN 测试输出（SRT）的影响。先前的研究表明，正常听力成人的典型受试者内 SRT 变异为 0.70 dB。研究 2 表明，多达四个三连音解码错误产生的 SRT 变异在此范围内，这表明我们建议的设置在临床应用中是可行的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Trends in Hearing AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGYOTORH-OTORHINOLARYNGOLOGY

CiteScore

4.50

自引率

11.10%

发文量

审稿时长

12 weeks

期刊介绍： Trends in Hearing is an open access journal completely dedicated to publishing original research and reviews focusing on human hearing, hearing loss, hearing aids, auditory implants, and aural rehabilitation. Under its former name, Trends in Amplification, the journal established itself as a forum for concise explorations of all areas of translational hearing research by leaders in the field. Trends in Hearing has now expanded its focus to include original research articles, with the goal of becoming the premier venue for research related to human hearing and hearing loss.