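A Quantitative Protocol for Calibrating Short Speech Signals (Monosyllabic Words) Based on the 50-ms Segment of the Voiced Phoneme(s) with the Maximum Root-Mean-Square Amplitude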

IF 1.2, Zone 4 (Medicine), Q3, AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY

Journal of the American Academy of Audiology Pub Date : 2025-03-01 Epub Date: 2025-03-14 DOI:10.3766/jaaa.21126

Richard H Wilson, Nancy J Scherer

Pages: 68-94. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12445277/pdf/

Citations: 0

Abstract


A Quantitative Protocol for Calibrating Short Speech Signals (Monosyllabic Words) Based on the 50-ms Segment of the Voiced Phoneme(s) with the Maximum Root-Mean-Square Amplitude.

Background: Since the development of word-recognition materials to test the transmission properties of auditory devices and human auditory systems, a carrier sentence or phrase (e.g., Say the word) has been used to preface the test word. For practical reasons, only the amplitude of the carrier phrase was somewhat controlled. The current American National Standards Institute standard for audiometers continues to specify the level of the test word should be the same communication level as the carrier phrase.

Purpose: The development of an amplitude calibration protocol for use with short-duration speech signals that are characterized by substantial amplitude modulations is described.

Research Design: Protocol 1 evaluated the average maximum root-mean-square (rms) amplitudes of 12.5-, 25-, 50-, and 100-ms voiced phoneme segments of each test word in 0.0227-ms increments to determine the segment duration to use. Protocol 2 used the 50-ms segment with the maximum rms amplitude among the 200 words in each list to normalize independently the amplitudes of the carrier phrases and test words to a target rms amplitude for each speaker.

Study Sample: Digital copies of the 200 monosyllabic words in three versions of Northwestern University Auditory Test No. 6 (NU-6) and one version of the W-22 each spoken by a different speaker were evaluated using the numeric digital values transcribed from the audio files. Two iterations of the protocol were compiled.

Data Collection and Analysis: In-house routines were used to analyze the waveform data, the results of which were evaluated with central tendency statistical analyses.

Results: The finalized protocol is based on the rms amplitude of a 50-ms segment of the sustained, voiced phoneme of each test word. The protocol directly links the rms amplitudes of the calibration tone and of the 50-ms word segments as opposed to the currently used linking of the calibration tone rms amplitude to a peak meter deflection of the carrier phrase from which the amplitude of the test word is inferred.

Conclusions: The effectiveness of the calibration protocol was demonstrated successfully on the four sets of word-recognition materials. The rms amplitude adjustments made independently to the individual carrier phrase and test-word utterances produced overall rms amplitudes for each of the four speakers that were homogenized slightly for the carrier phrases but substantially for many of the test words.

Clinical Relevance Statement: The calibration protocol described provides an objective procedure that can be implemented and, most importantly, replicated with numeric accuracy to equate test-word (and carrier phrase) amplitudes among short speech signals like monosyllabic words and among speaker versions of those materials.
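
The sliding-window measurement and the normalization step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' in-house routine, which the abstract does not describe. The 44.1-kHz sampling rate is an assumption inferred from the 0.0227-ms increment (one sample at that rate), and the function names `max_rms_segment` and `gain_to_target` are hypothetical. The sketch searches whatever samples it is given, so the caller is assumed to pass only the voiced portion of the word; locating the voiced phoneme is not covered here.

```python
import math

import numpy as np

# Assumption: 44.1-kHz audio, so one sample lasts 1000 / 44100 ≈ 0.0227 ms.
SAMPLE_RATE_HZ = 44_100


def max_rms_segment(samples, segment_ms=50.0, sample_rate_hz=SAMPLE_RATE_HZ):
    """Slide a fixed-length window one sample at a time and return
    (start_index, rms) for the window with the largest rms amplitude."""
    x = np.asarray(samples, dtype=np.float64)
    n = int(round(segment_ms * sample_rate_hz / 1000.0))  # window length in samples
    if n < 1 or x.size < n:
        raise ValueError("signal is shorter than the requested segment")
    # A running sum of squares gives every window's mean square in O(N).
    csum = np.concatenate(([0.0], np.cumsum(x * x)))
    mean_sq = np.maximum((csum[n:] - csum[:-n]) / n, 0.0)
    start = int(np.argmax(mean_sq))  # first window that reaches the maximum
    return start, float(np.sqrt(mean_sq[start]))


def gain_to_target(samples, target_rms, segment_ms=50.0,
                   sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (linear_gain, gain_in_dB) that scales the maximum-rms segment
    of `samples` to `target_rms`."""
    _, rms = max_rms_segment(samples, segment_ms, sample_rate_hz)
    if rms == 0.0:
        raise ValueError("cannot normalize a silent signal")
    gain = target_rms / rms
    return gain, 20.0 * math.log10(gain)


if __name__ == "__main__":
    # Synthetic stand-in for a voiced segment: a 200-ms, 150-Hz tone whose
    # amplitude rises and falls. Real use would load the word's samples.
    t = np.arange(int(0.2 * SAMPLE_RATE_HZ)) / SAMPLE_RATE_HZ
    word = np.hanning(t.size) * np.sin(2 * np.pi * 150.0 * t)
    for duration_ms in (12.5, 25.0, 50.0, 100.0):  # durations compared in Protocol 1
        start, rms = max_rms_segment(word, duration_ms)
        print(f"{duration_ms:5.1f} ms: start sample {start}, rms {rms:.4f}")
    gain, gain_db = gain_to_target(word, target_rms=0.1)
    print(f"gain to reach rms 0.1: x{gain:.3f} ({gain_db:+.2f} dB)")
```

Applying the returned gain to every sample of an utterance scales its maximum-rms 50-ms segment to the target, which corresponds to adjusting each carrier phrase and test word independently. The target value of 0.1 is arbitrary and chosen only for the demonstration.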


Source journal

Journal of the American Academy of Audiology (Medicine, Otorhinolaryngology)

CiteScore

3.10

Self-citation rate

0.00%

Articles published

Review time

6-12 weeks

About the journal: The Journal of the American Academy of Audiology (JAAA) is the Academy's scholarly peer-reviewed publication, issued 10 times per year and available to Academy members as a benefit of membership. The JAAA publishes articles and clinical reports in all areas of audiology, including audiological assessment, amplification, aural habilitation and rehabilitation, auditory electrophysiology, vestibular assessment, and hearing science.