Multi-information-aware speech enhancement through self-supervised learning

IF 2.9 · CAS Tier 3 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC
Xiaotong Tu, Jiaxin Xie, Yijin Mao, Yue Huang, Xinghao Ding, Shaogan Ye
DOI: 10.1016/j.dsp.2025.105464
Journal: Digital Signal Processing, Volume 168, Article 105464
Published: 2025-07-15
Cited by: 0

Abstract

Speech enhancement is a crucial technology aimed at improving the quality and intelligibility of speech signals in noisy environments. Recent advancements in deep neural networks have leveraged abundant clean speech datasets for supervised learning with remarkable results. However, supervised models suffer from poor robustness and generalization due to the scarcity of clean speech data and the complexity of the noise distribution in the real world. In this paper, a self-supervised speech enhancement model, called Multi-Information-Aware Speech Enhancement (MIA-SE), is proposed to address these challenges. A novel self-supervised training strategy is introduced in which denoising is performed on a single input twice, with the first denoiser output being employed as an Implicit Deep Denoiser Prior (IDDP) to supervise the subsequent denoising process. Furthermore, an encoder–decoder denoiser architecture based on a complex ratio masking strategy is incorporated to extract phase and magnitude features simultaneously. To capture sequence context information for improved embedding, transformer modules with multi-head attention mechanisms are integrated within the denoiser. The training process is guided by a newly formulated loss function to ensure successful and effective learning. Experimental results on synthetic and real-world noise databases demonstrate the effectiveness of MIA-SE, particularly in scenarios where paired training data is unavailable.
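The two core ideas in the abstract, complex ratio masking and the two-pass IDDP supervision, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the oracle mask, and the stand-in denoiser are hypothetical, and the real MIA-SE model replaces the stand-in with a transformer-based encoder–decoder network.

```python
import numpy as np

# --- Complex ratio masking (CRM) ---
def apply_crm(noisy_stft, mask):
    """Multiplying complex spectra scales magnitude AND rotates phase,
    so a complex ratio mask corrects both, unlike a magnitude-only mask."""
    return noisy_stft * mask

# --- Two-pass self-supervised step (IDDP) ---
def iddp_step(denoiser, noisy, loss_fn):
    """Denoise the same input twice; the first output serves as an
    implicit deep denoiser prior that supervises the second pass."""
    prior = denoiser(noisy)            # first pass -> IDDP target
    output = denoiser(noisy)           # second pass
    return output, loss_fn(output, prior)

# Toy demo on a single time-frequency bin with an oracle mask.
noisy = np.array([1.0 + 1.0j])         # noisy spectrum bin Y
clean = np.array([0.5 + 0.0j])         # clean target bin S
ideal_mask = clean / noisy             # oracle CRM: S / Y
assert np.allclose(apply_crm(noisy, ideal_mask), clean)

denoiser = lambda y: apply_crm(y, ideal_mask)          # stand-in for the network
mse = lambda a, b: float(np.mean(np.abs(a - b) ** 2))  # stand-in loss
out, loss = iddp_step(denoiser, noisy, mse)
```

With a deterministic stand-in denoiser the two passes coincide and the loss is zero; in training, stochasticity (e.g. dropout or input perturbation) makes the passes differ, which is what gives the prior its supervisory signal.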
Source journal: Digital Signal Processing (Engineering & Technology — Electrical & Electronic Engineering)
CiteScore: 5.30
Self-citation rate: 17.20%
Articles per year: 435
Review time: 66 days
Journal description: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology, such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as: big data; machine learning; internet of things; information security; systems biology and computational biology; financial time series analysis; autonomous vehicles; quantum computing; neuromorphic engineering; human-computer interaction and intelligent user interfaces; environmental signal processing; geophysical signal processing, including seismic signal processing; chemoinformatics and bioinformatics; audio, visual and performance arts; disaster management and prevention; and renewable energy.