Development of a Phrase-Based Speech-Recognition Test Using Synthetic Speech.

IF 2.6 2区医学 Q1 AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY

Trends in Hearing Pub Date : 2024-01-01 DOI:10.1177/23312165241261490

Saskia Ibelings, Thomas Brand, Esther Ruigendijk, Inga Holube

{"title":"Development of a Phrase-Based Speech-Recognition Test Using Synthetic Speech.","authors":"Saskia Ibelings, Thomas Brand, Esther Ruigendijk, Inga Holube","doi":"10.1177/23312165241261490","DOIUrl":null,"url":null,"abstract":"<p><p>Speech-recognition tests are widely used in both clinical and research audiology. The purpose of this study was the development of a novel speech-recognition test that combines concepts of different speech-recognition tests to reduce training effects and allows for a large set of speech material. The new test consists of four different words per trial in a meaningful construct with a fixed structure, the so-called phrases. Various free databases were used to select the words and to determine their frequency. Highly frequent nouns were grouped into thematic categories and combined with related adjectives and infinitives. After discarding inappropriate and unnatural combinations, and eliminating duplications of (sub-)phrases, a total number of 772 phrases remained. Subsequently, the phrases were synthesized using a text-to-speech system. The synthesis significantly reduces the effort compared to recordings with a real speaker. After excluding outliers, measured speech-recognition scores for the phrases with 31 normal-hearing participants at fixed signal-to-noise ratios (SNR) revealed speech-recognition thresholds (SRT) for each phrase varying up to 4 dB. The median SRT was -9.1 dB SNR and thus comparable to existing sentence tests. The psychometric function's slope of 15 percentage points per dB is also comparable and enables efficient use in audiology. Summarizing, the principle of creating speech material in a modular system has many potential applications.</p>","PeriodicalId":48678,"journal":{"name":"Trends in Hearing","volume":"28 ","pages":"23312165241261490"},"PeriodicalIF":2.6000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11273571/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Trends in Hearing","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/23312165241261490","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Speech-recognition tests are widely used in both clinical and research audiology. The purpose of this study was the development of a novel speech-recognition test that combines concepts of different speech-recognition tests to reduce training effects and allows for a large set of speech material. The new test consists of four different words per trial in a meaningful construct with a fixed structure, the so-called phrases. Various free databases were used to select the words and to determine their frequency. Highly frequent nouns were grouped into thematic categories and combined with related adjectives and infinitives. After discarding inappropriate and unnatural combinations, and eliminating duplications of (sub-)phrases, a total number of 772 phrases remained. Subsequently, the phrases were synthesized using a text-to-speech system. The synthesis significantly reduces the effort compared to recordings with a real speaker. After excluding outliers, measured speech-recognition scores for the phrases with 31 normal-hearing participants at fixed signal-to-noise ratios (SNR) revealed speech-recognition thresholds (SRT) for each phrase varying up to 4 dB. The median SRT was -9.1 dB SNR and thus comparable to existing sentence tests. The psychometric function's slope of 15 percentage points per dB is also comparable and enables efficient use in audiology. Summarizing, the principle of creating speech material in a modular system has many potential applications.

查看原文本刊更多论文

利用合成语音开发基于短语的语音识别测试。

语音识别测试广泛应用于临床和研究听力学领域。本研究的目的是开发一种新的语音识别测试，它结合了不同语音识别测试的概念，以减少训练效果，并允许使用大量的语音材料。新测试由每次试验的四个不同单词组成，这些单词具有固定的结构，即所谓的短语。我们使用各种免费数据库来选择单词并确定其频率。高频名词被归入主题类别，并与相关的形容词和不定式结合在一起。在剔除了不恰当和不自然的组合以及重复的（子）短语后，共剩下 772 个短语。随后，使用文本到语音系统对这些短语进行了合成。与真实说话者的录音相比，合成大大减少了工作量。排除异常值后，在固定信噪比（SNR）条件下对 31 名听力正常的参与者进行的短语语音识别评分显示，每个短语的语音识别阈值（SRT）最高相差 4 分贝。SRT 的中位数为 -9.1 dB SNR，因此可与现有的句子测试相媲美。心理测量函数的斜率为每分贝 15 个百分点，也具有可比性，可在听力学中有效使用。总之，在模块化系统中创建语音材料的原理具有许多潜在的应用价值。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Trends in Hearing AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGYOTORH-OTORHINOLARYNGOLOGY

CiteScore

4.50

自引率

11.10%

发文量

审稿时长

12 weeks

期刊介绍： Trends in Hearing is an open access journal completely dedicated to publishing original research and reviews focusing on human hearing, hearing loss, hearing aids, auditory implants, and aural rehabilitation. Under its former name, Trends in Amplification, the journal established itself as a forum for concise explorations of all areas of translational hearing research by leaders in the field. Trends in Hearing has now expanded its focus to include original research articles, with the goal of becoming the premier venue for research related to human hearing and hearing loss.