A dual-region speech enhancement method based on voiceprint segmentation

Impact factor: 6.0; Zone 1 (Computer Science); JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Neural Networks, published 2024-08-31
DOI: 10.1016/j.neunet.2024.106683
URL: https://www.sciencedirect.com/science/article/pii/S0893608024006075
Citations: 0

Abstract

Single-channel speech enhancement relies primarily on deep learning models that recover clean speech from noise-contaminated speech by learning a mapping from noisy to clean speech. However, speech energy is sparsely distributed across the time–frequency spectrogram, so the noisy-to-clean mapping differs substantially between regions where speech energy is concentrated and regions where it is not. Using a single deep model to solve these two distinct regression tasks at once makes the mapping more complex and consequently limits the model's performance. To test this hypothesis, we propose a dual-region speech enhancement model based on voiceprint region segmentation. Specifically, we first train a voiceprint segmentation model that classifies noisy speech into two regions. We then build a dedicated speech enhancement model for each region, and the two region models concurrently learn the mapping from noise-corrupted to clean speech within their respective regions. Finally, the region outputs are merged to obtain the complete restored speech. Experimental results on public datasets show that our method achieves competitive speech enhancement performance and outperforms state-of-the-art methods. Ablation studies confirm that the proposed approach improves model performance.
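The segment, enhance per region, and merge pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. The energy-threshold segmenter `segment_regions`, its `floor_db` parameter, the rule that each branch sees the whole noisy spectrogram, and the toy enhancers in the usage lines are all hypothetical stand-ins for the trained voiceprint segmentation and enhancement networks.

```python
import math
from typing import Callable, List

# Magnitude spectrogram indexed as spec[freq_bin][frame].
Spectrogram = List[List[float]]
Mask = List[List[bool]]
Enhancer = Callable[[Spectrogram], Spectrogram]


def segment_regions(mag: Spectrogram, floor_db: float = -30.0) -> Mask:
    """Hypothetical stand-in for the trained voiceprint segmentation model.

    A bin is marked speech-energy-concentrated (True) when its level is
    within `floor_db` dB of the loudest bin. Zero-energy bins are marked
    non-concentrated (False).
    """
    peak = max(max(row) for row in mag)
    if peak <= 0.0:
        return [[False] * len(row) for row in mag]
    return [
        [m > 0.0 and 20.0 * math.log10(m / peak) >= floor_db for m in row]
        for row in mag
    ]


def dual_region_enhance(
    noisy: Spectrogram,
    mask: Mask,
    enhance_concentrated: Enhancer,
    enhance_sparse: Enhancer,
) -> Spectrogram:
    """Run one enhancer per region and merge their outputs using the mask."""
    # Assumption: each branch receives the full noisy spectrogram as context,
    # but only the bins belonging to its own region are kept in the output.
    out_c = enhance_concentrated(noisy)
    out_s = enhance_sparse(noisy)
    return [
        [c if keep else s for c, s, keep in zip(row_c, row_s, row_m)]
        for row_c, row_s, row_m in zip(out_c, out_s, mask)
    ]


# Usage with toy "models": mild gain in concentrated bins, strong
# suppression elsewhere (both purely illustrative).
noisy = [[1.0, 0.5, 0.001], [0.02, 0.8, 0.0]]
mask = segment_regions(noisy)
enhanced = dual_region_enhance(
    noisy,
    mask,
    enhance_concentrated=lambda s: [[0.9 * m for m in row] for row in s],
    enhance_sparse=lambda s: [[0.1 * m for m in row] for row in s],
)
```

Each output bin comes from exactly one branch, so each model only has to fit the noisy-to-clean mapping for its own region. This is the decomposition the abstract argues makes the learning problem easier.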

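Source journal: Neural Networks
Category: Engineering & Technology; Computer Science: Artificial Intelligence
CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days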
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussion between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.