An ideal compressed mask for increasing speech intelligibility without sacrificing environmental sound recognition

Impact factor 2.1; CAS zone 2 (Physics and Astrophysics); JCR Q2 (Acoustics)
Eric M Johnson, Eric W Healy
Journal of the Acoustical Society of America, vol. 156, no. 6, pp. 3958-3969, published 2024-12-01.
DOI: 10.1121/10.0034599 (https://doi.org/10.1121/10.0034599)
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11646135/pdf/
Citations: 0

Abstract

Hearing impairment is often characterized by poor speech-in-noise recognition. State-of-the-art laboratory-based noise-reduction technology can eliminate background sounds from a corrupted speech signal and improve intelligibility, but it can also hinder environmental sound recognition (ESR), which is essential for personal independence and safety. This paper presents a time-frequency mask, the ideal compressed mask (ICM), that aims to provide listeners with improved speech intelligibility without substantially reducing ESR. This is accomplished by limiting the maximum attenuation that the mask performs. Speech intelligibility and ESR for hearing-impaired and normal-hearing listeners were measured using stimuli that had been processed by ICMs with various levels of maximum attenuation. This processing resulted in significantly improved intelligibility while retaining high ESR performance for both types of listeners. It was also found that the same level of maximum attenuation provided the optimal balance of intelligibility and ESR for both listener types. It is argued that future deep-learning-based noise reduction algorithms may provide better outcomes by balancing the levels of the target speech and the background environmental sounds, rather than eliminating all signals except for the target speech. The ICM provides one such simple solution for frequency-domain models.
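The abstract says only that the ICM works by limiting the maximum attenuation the mask performs. It does not give the formula. The sketch below shows one plausible reading and is not the authors' definition. It assumes the starting point is a standard ideal ratio mask (IRM), sqrt(S / (S + N)), computed from the premixed speech and noise powers. It then linearly rescales the mask from [0, 1] to [floor, 1], where floor is the amplitude gain equal to a hypothetical `max_atten_db` parameter. The helper names `ideal_ratio_mask`, `compress_mask`, and `apply_mask` are illustrative, and plain nested lists stand in for STFT magnitude arrays.

```python
import math
from typing import List

# Frames x frequency bins; a stand-in for an STFT magnitude or power array.
Matrix = List[List[float]]


def ideal_ratio_mask(speech_power: Matrix, noise_power: Matrix, eps: float = 1e-12) -> Matrix:
    """Ideal ratio mask, sqrt(S / (S + N)), per time-frequency unit; values lie in [0, 1]."""
    return [
        [math.sqrt(s / (s + n + eps)) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def compress_mask(mask: Matrix, max_atten_db: float) -> Matrix:
    """Hypothetical ICM-style step: linearly map mask values from [0, 1] to [floor, 1].

    floor is the amplitude gain corresponding to -max_atten_db dB, so a unit that
    the IRM would silence is instead attenuated by at most max_atten_db.
    """
    if max_atten_db < 0:
        raise ValueError("max_atten_db must be non-negative")
    floor = 10.0 ** (-max_atten_db / 20.0)
    return [[floor + (1.0 - floor) * m for m in row] for row in mask]


def apply_mask(mixture_mag: Matrix, mask: Matrix) -> Matrix:
    """Element-wise product of the noisy-mixture magnitude and the mask."""
    return [[x * g for x, g in zip(x_row, g_row)] for x_row, g_row in zip(mixture_mag, mask)]


if __name__ == "__main__":
    # Toy 2x2 example: each row is a frame, each column a frequency bin.
    speech = [[1.0, 0.01], [0.5, 0.0]]
    noise = [[0.01, 1.0], [0.5, 1.0]]
    irm = ideal_ratio_mask(speech, noise)
    icm = compress_mask(irm, max_atten_db=20.0)
    min_gain = min(min(row) for row in icm)
    print("largest attenuation (dB):", -20.0 * math.log10(min_gain))
```

Clipping the IRM at the same floor, `max(m, floor)`, is another way to cap attenuation. Clipping leaves mask values above the floor unchanged, whereas the linear rescaling raises every value slightly. The abstract does not say which form the ICM uses, so treat `max_atten_db` and both variants as assumptions.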

Source journal: Journal of the Acoustical Society of America
CiteScore: 4.60
Self-citation rate: 16.70%
Articles published: 1433
Review time: 4.7 months
Journal description: Since 1929, The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes linear and nonlinear acoustics; aeroacoustics, underwater sound, and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music, and noise; psychology and physiology of hearing; engineering acoustics and transduction; and bioacoustics, including animal bioacoustics.