HC-APNet: Harmonic Compensation Auditory Perception Network for low-complexity speech enhancement
Nan Li, Meng Ge, Longbiao Wang, Yang-Hao Zhou, Jianwu Dang
Speech Communication, Volume 167, Article 103161, February 2025. DOI: 10.1016/j.specom.2024.103161
Abstract
Speech enhancement is critical for improving speech quality and intelligibility in a variety of noisy environments. While neural network-based methods have shown promising results in speech enhancement, they often suffer from performance degradation in scenarios with limited computational resources. This paper presents HC-APNet (Harmonic Compensation Auditory Perception Network), a novel lightweight approach that exploits the perceptual capabilities of the human auditory system for efficient and effective speech enhancement, with a focus on harmonic compensation. Inspired by human auditory reception mechanisms, we first segment audio into subbands using an auditory filterbank. Subband processing reduces the number of parameters and the computational load, while the auditory filterbank preserves the quality of the enhanced speech. In addition, inspired by human perception of auditory context, we develop an auditory perception network that estimates a gain for each subband. Furthermore, because subband processing applies gain only to the spectral envelope and may therefore introduce harmonic distortion, we design a learnable multi-subband comb filter, inspired by human pitch perception, to mitigate this distortion. Finally, the proposed HC-APNet achieves competitive performance on speech quality evaluation metrics with significantly fewer computational and parameter resources than existing methods on the VCTK + DEMAND and DNS Challenge datasets.
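The abstract describes the architecture only at a high level. As a rough illustration of the two signal-processing ideas it mentions, the Python/NumPy sketch below (not taken from the paper) shows how per-subband gains projected through an auditory-style filterbank could scale a frame's magnitude spectrum, and how a simple feed-forward comb filter tuned to an assumed pitch period reinforces harmonic components; the filterbank shape, gain values, mixing weight alpha, and the 200 Hz pitch are all hypothetical placeholders, not details of HC-APNet.

import numpy as np

def apply_subband_gains(spec, filterbank, gains):
    # spec:       (n_freq,) magnitude spectrum of one STFT frame
    # filterbank: (n_bands, n_freq) auditory-style band filters (e.g. triangular)
    # gains:      (n_bands,) per-band gains, here assumed to come from a network
    freq_gain = filterbank.T @ gains           # project band gains onto frequency bins
    norm = filterbank.sum(axis=0) + 1e-8       # normalize overlapping filters
    return spec * (freq_gain / norm)

def comb_filter(x, period, alpha=0.7):
    # Feed-forward comb filter y[n] = (1 - alpha) * x[n] + alpha * x[n - period],
    # which reinforces components whose period matches the assumed pitch period.
    y = (1.0 - alpha) * x.astype(float)
    y[period:] += alpha * x[:-period]
    return y

# Toy usage: a 200 Hz tone in noise, comb-filtered at its (assumed known) period.
sr = 16000
t = np.arange(sr) / sr
noisy = np.sin(2 * np.pi * 200 * t) + 0.3 * np.random.randn(sr)
enhanced = comb_filter(noisy, period=sr // 200)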
Journal description:
Speech Communication is an interdisciplinary journal whose primary objective is to fulfil the need for the rapid dissemination and thorough discussion of basic and applied research results.
The journal's primary objectives are:
• to present a forum for the advancement of human and human-machine speech communication science;
• to stimulate cross-fertilization between different fields of this domain;
• to contribute towards the rapid and wide diffusion of scientifically sound contributions in this domain.