{"title":"Blind speech separation exploiting temporal and spectral correlations using 2D-HMMs","authors":"Dang Hai Tran Vu, Reinhold Häb-Umbach","doi":"10.5281/ZENODO.43429","DOIUrl":null,"url":null,"abstract":"We present a novel method to exploit correlations of adjacent time-frequency (TF)-slots for a sparseness-based blind speech separation (BSS) system. Usually, these correlations are exploited by some heuristic smoothing techniques in the post-processing of the estimated soft TF masks. We propose a different approach: Based on our previous work with one-dimensional (1D)-hidden Markov models (HMMs) along the time axis we extend the modeling to two-dimensional (2D)-HMMs to exploit both temporal and spectral correlations in the speech signal. Based on the principles of turbo decoding we solved the complex inference of 2D-HMMs by a modified forward-backward algorithm which operates alternatingly along the time and the frequency axis. Extrinsic information is exchanged between these steps such that increasingly better soft time-frequency masks are obtained, leading to improved speech separation performance in highly reverberant recording conditions.","PeriodicalId":400766,"journal":{"name":"21st European Signal Processing Conference (EUSIPCO 2013)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"21st European Signal Processing Conference (EUSIPCO 2013)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5281/ZENODO.43429","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
We present a novel method to exploit correlations of adjacent time-frequency (TF)-slots for a sparseness-based blind speech separation (BSS) system. Usually, these correlations are exploited by some heuristic smoothing techniques in the post-processing of the estimated soft TF masks. We propose a different approach: Based on our previous work with one-dimensional (1D)-hidden Markov models (HMMs) along the time axis we extend the modeling to two-dimensional (2D)-HMMs to exploit both temporal and spectral correlations in the speech signal. Based on the principles of turbo decoding we solved the complex inference of 2D-HMMs by a modified forward-backward algorithm which operates alternatingly along the time and the frequency axis. Extrinsic information is exchanged between these steps such that increasingly better soft time-frequency masks are obtained, leading to improved speech separation performance in highly reverberant recording conditions.