Affine modulation-based audiogram fusion network for joint noise reduction and hearing loss compensation

IF 15.5 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ye Ni, Ruiyu Liang, Xiaoshuai Hao, Jiaming Cheng, Qingyun Wang, Chengwei Huang, Cairong Zou, Wei Zhou, Weiping Ding, Björn W. Schuller
DOI: 10.1016/j.inffus.2025.103726
Journal: Information Fusion, Volume 127, Article 103726
Published: 2025-09-13 (Journal Article)
Citations: 0

Abstract

Hearing aids (HAs) are widely used to provide personalized speech enhancement (PSE) services, improving the quality of life for individuals with hearing loss. However, HA performance significantly declines in noisy environments as it treats noise reduction (NR) and hearing loss compensation (HLC) as separate tasks. This separation leads to a lack of systematic optimization, overlooking the interactions between these two critical tasks, and increases the system complexity. To address these challenges, we propose a novel audiogram fusion network, named AFN-HearNet, which simultaneously tackles the NR and HLC tasks by fusing cross-domain audiogram and spectrum features. We propose an audiogram-specific encoder that transforms the sparse audiogram profile into a deep representation, addressing the alignment problem of cross-domain features prior to fusion. To incorporate the interactions between NR and HLC tasks, we propose the affine modulation-based audiogram fusion frequency-temporal Conformer that adaptively fuses these two features into a unified deep representation for speech reconstruction. Furthermore, we introduce a voice activity detection auxiliary training task to embed speech and non-speech patterns into the unified deep representation implicitly. We conduct comprehensive experiments across multiple datasets to validate the effectiveness of each proposed module. The results indicate that the AFN-HearNet significantly outperforms state-of-the-art in-context fusion joint models regarding key metrics such as HASQI and PESQ, achieving a considerable trade-off between performance and efficiency. The source code and data will be released at https://github.com/deepnetni/AFN-HearNet.
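The affine modulation-based fusion described above can be illustrated with a minimal FiLM-style conditioning sketch: an embedding of the sparse audiogram predicts a per-channel scale and shift that are applied to the spectrum features. This is an assumption-laden toy (random weights, hypothetical `audiogram_encoder` and `affine_modulate` helpers, six-frequency audiogram), not the paper's actual architecture:

```python
import numpy as np

def audiogram_encoder(audiogram, dim=8, seed=0):
    """Toy stand-in for the paper's audiogram-specific encoder: project a
    sparse audiogram (hearing thresholds in dB HL) into a dense embedding.
    Weights are random here; in the paper they are learned."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, len(audiogram)))
    return np.tanh(W @ (np.asarray(audiogram, dtype=float) / 100.0))

def affine_modulate(spec_feat, emb, seed=1):
    """FiLM-style affine modulation: the audiogram embedding predicts a
    per-channel scale (gamma) and shift (beta) applied to spectrum features
    of shape (channels, frames)."""
    rng = np.random.default_rng(seed)
    C = spec_feat.shape[0]
    Wg = rng.standard_normal((C, emb.size))
    Wb = rng.standard_normal((C, emb.size))
    gamma = 1.0 + Wg @ emb   # scale, initialized near identity
    beta = Wb @ emb          # shift
    return gamma[:, None] * spec_feat + beta[:, None]

# Hypothetical audiogram: thresholds (dB HL) at six standard frequencies.
audiogram = [20, 25, 40, 55, 70, 80]
spec = np.random.default_rng(2).standard_normal((16, 10))  # (channels, frames)
out = affine_modulate(spec, audiogram_encoder(audiogram))
```

The modulated features keep the spectrum's shape, so the fusion can be dropped inside any per-channel block (e.g. a Conformer layer) without changing the surrounding tensor plumbing.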
Source journal: Information Fusion (Engineering & Technology – Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Review time: 7.9 months
Journal overview: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating application to real-world problems, are welcome.