Exploring conventional enhancement and separation methods for multi-speech enhancement in indoor environments

IF 1.3 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation and Systems, volume 3, issue 4, pages 307-322. Pub Date: 2021-05-30. DOI: 10.1049/ccs2.12023

Yangjie Wei, Ke Zhang, Dan Wu, Zhongqi Hu


Citations: 3

Abstract


Exploring conventional enhancement and separation methods for multi-speech enhancement in indoor environments


Speech enhancement is an important preprocessing step in a wide variety of practical fields related to speech signals, and many signal-processing methods have already been proposed for it. However, the lack of a comprehensive, quantitative evaluation of enhancement performance for multi-speech makes it difficult to choose an appropriate enhancement method for a multi-speech application. This work studies the implementation of several enhancement methods for multi-speech enhancement in indoor environments with T60 = 0 s and T60 = 0.3 s. Two types of enhancement approaches are proposed and compared. The first type is the basic enhancement methods, including delay-and-sum beamforming (DSB), minimum variance distortionless response (MVDR), linearly constrained minimum variance (LCMV), and independent component analysis (ICA). The second type is the robust enhancement methods, including improved MVDR and LCMV realized by eigendecomposition and diagonal loading. In addition, online enhancement performance based on the iteration of single-frame speech signals is investigated, as is the comprehensive performance of the various enhancement methods. The experimental results show that, among the basic enhancement methods, the enhancement effects of LCMV and ICA are relatively more stable; among the improved enhancement algorithms, methods that employ diagonal loading iterations perform better. In online enhancement, DSB with frequency masking (FM) yields the best signal-to-interference ratio (SIR) and can suppress interference. The comprehensive performance test shows that LCMV and ICA yield the best effects when there is no reverberation, while DSB with FM yields the best SIR value when reverberation is present.
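As a rough illustration of two of the methods named in the abstract, the sketch below computes narrowband DSB weights and MVDR weights with optional diagonal loading using NumPy. It is not the authors' implementation: the uniform linear array geometry, the far-field steering vector, the single-frequency covariance model, and the way the loading term is scaled are all assumptions made for this example, and the function names are hypothetical.

```python
import numpy as np


def steering_vector(n_mics, spacing, angle_rad, freq, c=343.0):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    delays = np.arange(n_mics) * spacing * np.cos(angle_rad) / c
    return np.exp(-2j * np.pi * freq * delays)


def dsb_weights(d):
    """Delay-and-sum: phase-align the channels and average them."""
    return d / len(d)


def mvdr_weights(R, d, loading=0.0):
    """MVDR weights w = R^-1 d / (d^H R^-1 d), with optional diagonal loading.

    The loading is scaled by the mean diagonal power of R, which is one
    common heuristic and an assumption of this sketch.
    """
    n = len(d)
    R_loaded = R + loading * (np.trace(R).real / n) * np.eye(n)
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)


def output_power(w, R):
    """Beamformer output power w^H R w for covariance matrix R."""
    return float(np.real(np.vdot(w, R @ w)))


# Toy scenario: target at broadside, one interferer at 30 degrees, 1 kHz bin.
d_target = steering_vector(4, 0.05, np.pi / 2, 1000.0)
d_interf = steering_vector(4, 0.05, np.pi / 6, 1000.0)

# Interference-plus-noise covariance: unit-power interferer plus weak white noise.
R_in = np.outer(d_interf, d_interf.conj()) + 0.01 * np.eye(4)

w_dsb = dsb_weights(d_target)
w_mvdr = mvdr_weights(R_in, d_target)
w_mvdr_dl = mvdr_weights(R_in, d_target, loading=0.1)

print("DSB  interference+noise power:", output_power(w_dsb, R_in))
print("MVDR interference+noise power:", output_power(w_mvdr, R_in))
print("MVDR (diagonally loaded)     :", output_power(w_mvdr_dl, R_in))
```

All three weight vectors satisfy the distortionless constraint w^H d = 1 toward the target. Because DSB is one feasible solution of the MVDR problem, the unloaded MVDR output power on `R_in` can never exceed that of DSB. Diagonal loading trades some of that suppression for robustness to errors in the covariance estimate or steering vector.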


Source journal

Cognitive Computation and Systems Computer Science-Computer Science Applications

CiteScore

2.50

Self-citation rate

0.00%

Articles published

Review time

10 weeks