On evaluation of dereverberation algorithms for expectation-maximization based binaural source separation in varying echoic conditions
Muhammad Sheryar Fulaly, Sania Gul, Muhammad Salman Khan, Ata ur-Rehman, Syed Waqar Shah
2022 International Conference on Frontiers of Information Technology (FIT), published 2022-12-01
DOI: 10.1109/FIT57066.2022.00045
Abstract
The performance of source separation (SS) algorithms that rely on spatial location cues degrades in echoic conditions, because the cues that otherwise serve as discriminative features for such systems become corrupted. One way to improve the performance of these systems is to dereverberate the speech mixtures before separation. In this paper, we evaluate several dereverberation algorithms as preprocessors for the reverberant speech mixture before it is passed to model-based expectation-maximization source separation and localization (MESSL), an SS system based on location cues, under varying echoic conditions. We then identify the dereverberation algorithm that gives the largest improvement in the quality and intelligibility of the speech signals output by MESSL. The objective metrics favor the weighted prediction error (WPE) algorithm, which improves short-time objective intelligibility (STOI) by 3% and signal-to-distortion ratio (SDR) by 3.4 dB, while the subjective metrics favor the precedence effect (PE) algorithm, which improves the average intelligibility score by 6% and the average quality score by 1%. All improvements are relative to the stand-alone MESSL system.
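To make the reported "improvement in SDR" concrete, the following is a minimal sketch of how an SDR gain in dB might be computed between a baseline output and a dereverberated output. It assumes a simplified energy-ratio definition of SDR (reference energy over error energy), which is not necessarily the toolkit or definition used in the paper; the function names `sdr_db` and `sdr_improvement_db` and the synthetic signals are hypothetical and used only for illustration.

```python
import numpy as np


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy divided by error energy.

    Assumption: this is a plain energy ratio, not a full BSS Eval
    decomposition into interference and artifact terms.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = estimate - reference
    eps = 1e-12  # guards against division by zero for a perfect estimate
    ratio = (np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps)
    return 10.0 * np.log10(ratio)


def sdr_improvement_db(reference, baseline_estimate, processed_estimate):
    """SDR gain of a processed output over a baseline output, in dB."""
    return sdr_db(reference, processed_estimate) - sdr_db(reference, baseline_estimate)


# Synthetic stand-ins for real signals (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)       # stand-in for the clean target speech
distortion = rng.standard_normal(16000)  # stand-in for residual distortion

baseline = clean + 0.5 * distortion      # e.g. separation without preprocessing
processed = clean + 0.25 * distortion    # e.g. separation after dereverberation

gain = sdr_improvement_db(clean, baseline, processed)
print(f"SDR improvement: {gain:.2f} dB")
```

In this toy setup the distortion amplitude is halved, so the gain is 20·log10(2), roughly 6 dB. With real data, `clean` would be the anechoic target, and `baseline` and `processed` would be the separated outputs without and with dereverberation, respectively.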