HC-APNet: Harmonic Compensation Auditory Perception Network for low-complexity speech enhancement

Impact factor: 2.4 · CAS Zone 3 (Computer Science) · JCR Q2 (Acoustics)
Nan Li, Meng Ge, Longbiao Wang, Yang-Hao Zhou, Jianwu Dang
Speech Communication, vol. 167, Article 103161. Published 2025-02-01.
DOI: 10.1016/j.specom.2024.103161
https://www.sciencedirect.com/science/article/pii/S0167639324001328
Citations: 0

Abstract

Speech enhancement is critical for improving speech quality and intelligibility in a variety of noisy environments. While neural network-based methods have shown promising results in speech enhancement, they often suffer from performance degradation in scenarios with limited computational resources. This paper presents HC-APNet (Harmonic Compensation Auditory Perception Network), a novel lightweight approach tailored to exploit the perceptual capabilities of the human auditory system for efficient and effective speech enhancement, with a focus on harmonic compensation. Inspired by human auditory reception mechanisms, we first segment audio into subbands using an auditory filterbank. The use of subbands reduces the number of parameters and the computational load, while the auditory filterbank helps preserve enhancement quality. In addition, inspired by human perception of auditory context, we develop an auditory perception network to capture gain information for different subbands. Furthermore, because subband processing applies gain only to the spectral envelope, which may introduce harmonic distortion, we design a learnable multi-subband comb filter, inspired by human pitch perception, to mitigate that distortion. Finally, HC-APNet achieves competitive speech quality scores on the VCTK + DEMAND and DNS Challenge datasets with significantly fewer parameters and lower computational cost than existing methods.
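The two signal-processing ideas named in the abstract, per-subband gains and comb filtering, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the triangular bands spaced on the Glasberg-Moore ERB-rate scale, the fixed (non-learnable) feed-forward comb filter, and all function names (`erb_filterbank`, `apply_band_gains`, `comb_filter`) are assumptions chosen only for illustration.

```python
import numpy as np

def erb_rate(f_hz):
    """Glasberg-Moore ERB-rate scale for a frequency in Hz."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_filterbank(n_bins, n_bands, sample_rate):
    """Triangular bands spaced uniformly on the ERB-rate scale (assumed layout).

    Returns an (n_bands, n_bins) matrix whose columns sum to 1, so it can
    interpolate per-band gains back to per-bin gains. Requires n_bands >= 2.
    """
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    pos = erb_rate(freqs)                        # bin positions on the ERB scale
    centers = np.linspace(pos[0], pos[-1], n_bands)
    width = centers[1] - centers[0]
    # Each band is a triangle whose half-width equals the band spacing.
    fb = np.maximum(0.0, 1.0 - np.abs(pos[None, :] - centers[:, None]) / width)
    return fb / fb.sum(axis=0, keepdims=True)

def apply_band_gains(spectrum, band_gains, fb):
    """Expand a few band gains to every frequency bin and scale the spectrum."""
    bin_gains = band_gains @ fb                  # shape (n_bins,)
    return spectrum * bin_gains

def comb_filter(x, period, alpha=0.5):
    """Fixed feed-forward comb: y[n] = (x[n] + alpha * x[n - period]) / (1 + alpha).

    Components that repeat every `period` samples pass unchanged; components
    between those harmonics are attenuated.
    """
    delayed = np.concatenate([np.zeros(period), x[:-period]])
    return (x + alpha * delayed) / (1.0 + alpha)

# Usage: 32 band gains control a 257-bin spectrum (hypothetical sizes).
fb = erb_filterbank(n_bins=257, n_bands=32, sample_rate=16000)
enhanced = apply_band_gains(np.ones(257), np.full(32, 0.5), fb)
```

In this sketch a network would only need to predict 32 gains per frame instead of 257, which is where the parameter and compute savings of subband processing come from; the comb filter then reinforces energy at multiples of the pitch frequency that a coarse envelope gain cannot resolve.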
Source journal
Speech Communication (Engineering & Technology - Computer Science: Interdisciplinary Applications)
CiteScore: 6.80
Self-citation rate: 6.20%
Articles published: 94
Review time: 19.2 weeks
Journal description: Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results. The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.