Perceptual effects of reducing algorithmic latency on deep-learning based noise reduction

IF 2.1 | Zone 2, Physics and Astrophysics | Q2, ACOUSTICS

Journal of the Acoustical Society of America Pub Date : 2025-07-01 DOI:10.1121/10.0037197

Eric W Healy, Sarah E Yoho, Kian Fallah, Ashutosh Pandey, DeLiang Wang

Volume 158, Issue 1, Pages 380-390

Citations: 0

Abstract


Perceptual effects of reducing algorithmic latency on deep-learning based noise reduction

Low latency is an essential requirement for noise reduction in real-world devices such as hearing aids and cochlear implants. Reducing the algorithmic latency of a deep neural network charged with noise reduction allows additional time for other processing. However, a larger analysis window may be advantageous to the performance of the network. This trade-off is currently examined with regard to human speech-intelligibility performance. The algorithmic latency of the attentive recurrent network (ARN) was modified by reducing the size of the analysis time frame. The ARN model was talker, noise, and recording-channel independent, and fully causal. Listeners with hearing loss and with normal hearing heard sentences in babble at various signal-to-noise ratios. Large increases in intelligibility were observed as a result of noise reduction, especially for the listeners with hearing loss and at less favorable signal-to-noise ratios. Slightly larger objective measures of network performance were observed at larger latencies. But more critically, human performance was essentially unchanged as algorithmic latency was reduced from 20 to 10 or 5 ms. These results are discussed in the context of overall design and implementation of deep-learning based noise reduction, and information on latency requirements for human listeners is summarized.
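As a rough illustration of the trade-off described in the abstract, the sketch below converts each of the three latencies into an analysis-frame length in samples. It assumes that algorithmic latency equals the duration of one analysis frame and that audio is sampled at 16 kHz. Neither the sampling rate nor the helper name `frame_length_samples` comes from the abstract; both are hypothetical.

```python
def frame_length_samples(latency_ms: float, sample_rate_hz: int = 16_000) -> int:
    """Return the number of samples in a frame lasting latency_ms.

    Assumption (not stated in the abstract): algorithmic latency equals
    the analysis-frame duration, and the sampling rate is 16 kHz.
    """
    return round(latency_ms * sample_rate_hz / 1000)


# The three algorithmic latencies named in the abstract, in milliseconds.
LATENCIES_MS = (20, 10, 5)

# Map each latency to its frame length in samples under the assumptions above.
frame_lengths = {ms: frame_length_samples(ms) for ms in LATENCIES_MS}

for ms, n_samples in frame_lengths.items():
    print(f"{ms} ms -> {n_samples} samples per frame")
```

Under these assumptions, halving the latency halves the number of samples the network sees per frame, which is one way to picture why a larger window might help objective measures of network performance.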


Source journal

Journal of the Acoustical Society of America (Physics, Acoustics)

CiteScore: 4.60
Self-citation rate: 16.70%
Articles published: 1433
Review time: 4.7 months

About the journal: Since 1929 The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music and noise; psychology and physiology of hearing; engineering acoustics, transduction; bioacoustics, animal bioacoustics.