New research on monaural speech segregation based on quality assessment

IF 3.1 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xiaoping Xie, Can Li, Dan Tian, Rufeng Shen, Fei Ding
{"title":"New research on monaural speech segregation based on quality assessment","authors":"Xiaoping Xie,&nbsp;Can Li,&nbsp;Dan Tian,&nbsp;Rufeng Shen,&nbsp;Fei Ding","doi":"10.1016/j.csl.2023.101601","DOIUrl":null,"url":null,"abstract":"<div><p>Speech enhancement (SE) is a pivotal technology in enhancing the quality and intelligibility of speech signals. Nevertheless, when processing speech signals under conditions of high signal-to-noise ratio (SNR), conventional SE techniques may inadvertently lead to a diminution in the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). This article introduces the innovative incorporation of the Non-Intrusive Speech Quality Assessment (NISQA) algorithm into SE systems. Through the comparison of pre and post-enhancement speech quality scores, it discerns whether the speech signal under consideration warrants enhancement processing, thereby mitigating potential deterioration in PESQ and STOI. Furthermore, this study delves into the ramifications of five prevalent speech features, namely, Mel Frequency Cepstral Coefficients<span> (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), Relative Spectral Trans-formed Perceptual Linear Prediction coefficients (RASTA-PLP), Amplitude Modulation<span> Spectrogram<span> (AMS), and Multi-Resolution Cochleagram (MRCG), on PESQ and STOI under varying noise conditions. Experimental outcomes underscore that MRCG consistently emerges as the optimal and most stable feature for STOI, while the feature yielding the highest PESQ score exhibits intricate correlations with the background noise type, SNR level, and noise compatibility with the speech signal. Consequently, we propose an SE methodology founded on quality assessment and feature selection, facilitating the adaptive selection of optimal features tailored to distinct background noise scenarios, thereby always maintain the highest caliber enhancement effect with regard to PESQ metrics.</span></span></span></p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1000,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230823001201","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Speech enhancement (SE) is a pivotal technology in enhancing the quality and intelligibility of speech signals. Nevertheless, when processing speech signals under conditions of high signal-to-noise ratio (SNR), conventional SE techniques may inadvertently lead to a diminution in the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). This article introduces the innovative incorporation of the Non-Intrusive Speech Quality Assessment (NISQA) algorithm into SE systems. Through the comparison of pre and post-enhancement speech quality scores, it discerns whether the speech signal under consideration warrants enhancement processing, thereby mitigating potential deterioration in PESQ and STOI. Furthermore, this study delves into the ramifications of five prevalent speech features, namely, Mel Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), Relative Spectral Trans-formed Perceptual Linear Prediction coefficients (RASTA-PLP), Amplitude Modulation Spectrogram (AMS), and Multi-Resolution Cochleagram (MRCG), on PESQ and STOI under varying noise conditions. Experimental outcomes underscore that MRCG consistently emerges as the optimal and most stable feature for STOI, while the feature yielding the highest PESQ score exhibits intricate correlations with the background noise type, SNR level, and noise compatibility with the speech signal. Consequently, we propose an SE methodology founded on quality assessment and feature selection, facilitating the adaptive selection of optimal features tailored to distinct background noise scenarios, thereby always maintain the highest caliber enhancement effect with regard to PESQ metrics.

基于质量评价的单音语音分离新研究
语音增强技术是提高语音信号质量和可理解性的关键技术。然而,当处理高信噪比(SNR)条件下的语音信号时,传统的SE技术可能会无意中导致语音质量(PESQ)和短时客观可理解性(STOI)的感知评价的降低。本文介绍了将非侵入式语音质量评估(NISQA)算法创新性地整合到语音识别系统中。通过比较增强前和增强后的语音质量分数,可以判断所考虑的语音信号是否需要进行增强处理,从而减轻PESQ和STOI的潜在恶化。此外,本研究还探讨了五种常见的语音特征,即Mel频率倒谱系数(MFCC)、gamma酮频率倒谱系数(GFCC)、相对频谱变换感知线性预测系数(RASTA-PLP)、调幅谱图(AMS)和多分辨率耳蜗图(MRCG)在不同噪声条件下对PESQ和STOI的影响。实验结果强调,MRCG始终是STOI的最佳和最稳定的特征,而产生最高PESQ分数的特征与背景噪声类型、信噪比水平以及与语音信号的噪声兼容性表现出复杂的相关性。因此,我们提出了一种基于质量评估和特征选择的SE方法,促进了针对不同背景噪声场景的最佳特征的自适应选择,从而始终保持最高水平的PESQ指标增强效果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Computer Speech and Language
Computer Speech and Language 工程技术-计算机:人工智能
CiteScore
11.30
自引率
4.70%
发文量
80
审稿时长
22.9 weeks
期刊介绍: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信