Catalog-based single-channel speech-music separation with the Itakura-Saito divergence

2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO). Pub Date: 2012-10-18. DOI: 10.5072/ZENODO.21056

Cemil Demir, A. Cemgil, M. Saraçlar


Citations: 1

Abstract

In this study, we introduce a catalog-based single-channel speech-music separation method that uses the Itakura-Saito (IS) divergence. We previously developed a catalog-based separation method based on the Kullback-Leibler (KL) divergence. From a probabilistic point of view, the IS divergence corresponds to a complex Gaussian observation model. We compare the two divergence measures, and hence the corresponding observation models, on the speech-music separation task using both the catalog-based method and traditional Non-Negative Matrix Factorization (NMF). Separation performance is evaluated using the Speech-to-Music Ratio (SMR), the Speech-to-Artifact Ratio (SAR), and speech recognition performance measured by the Word Error Rate (WER). We show that using the IS divergence in either the catalog-based or the NMF-based method yields better separation performance than using the KL divergence. Moreover, catalog-based approaches with either divergence measure outperform traditional NMF-based approaches in the speech recognition experiments.
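For readers unfamiliar with the two divergence measures compared in the abstract, the following is a minimal sketch of their standard textbook definitions, not the authors' implementation. The function names and the toy spectra are hypothetical, and the inputs are assumed to be strictly positive (for example, power-spectrum bins). In the usual formulation, minimizing the IS divergence between an observed power spectrum and a model variance is equivalent to maximum-likelihood estimation under a zero-mean complex Gaussian model, which is the correspondence the abstract refers to.

```python
import math

def is_divergence(observed, model):
    """Itakura-Saito divergence: sum of x/y - log(x/y) - 1 over all bins.

    Assumes every entry of both sequences is strictly positive.
    """
    total = 0.0
    for x, y in zip(observed, model):
        ratio = x / y
        total += ratio - math.log(ratio) - 1.0
    return total

def kl_divergence(observed, model):
    """Generalized Kullback-Leibler divergence: sum of x*log(x/y) - x + y.

    Assumes every entry of both sequences is strictly positive.
    """
    total = 0.0
    for x, y in zip(observed, model):
        total += x * math.log(x / y) - x + y
    return total

# Hypothetical toy spectra, chosen only for illustration.
power_spectrum = [4.0, 1.0, 0.25]
model_spectrum = [2.0, 1.0, 0.5]

# Rescale both spectra by the same gain to compare how each measure reacts.
gain = 100.0
loud_spectrum = [gain * x for x in power_spectrum]
loud_model = [gain * y for y in model_spectrum]

print("IS :", is_divergence(power_spectrum, model_spectrum),
      "->", is_divergence(loud_spectrum, loud_model))
print("KL :", kl_divergence(power_spectrum, model_spectrum),
      "->", kl_divergence(loud_spectrum, loud_model))
```

The IS divergence depends only on the ratio between observation and model, so it is unchanged when both are scaled by the same gain, whereas the generalized KL divergence grows linearly with that gain. This scale invariance is one commonly cited reason for preferring IS on audio spectra with a large dynamic range; whether it explains the results reported here is not stated in the abstract.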

