Catalog-based single-channel speech-music separation with the Itakura-Saito divergence

Cemil Demir, A. Cemgil, M. Saraçlar
2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), published 2012-10-18
DOI: 10.5072/ZENODO.21056 (https://doi.org/10.5072/ZENODO.21056)
Citation count: 1

Abstract

In this study, we introduce a catalog-based single-channel speech-music separation method that uses the Itakura-Saito (IS) divergence. We previously developed the catalog-based separation method with the Kullback-Leibler (KL) divergence. From a probabilistic point of view, the IS divergence corresponds to a complex Gaussian observation model. We compare the two divergence measures, and hence the corresponding observation models, on the speech-music separation task using both catalog-based and traditional Non-Negative Matrix Factorization (NMF) methods. Separation performance is evaluated with the Speech-to-Music Ratio (SMR), the Speech-to-Artifact Ratio (SAR), and speech recognition performance measured by the Word Error Rate (WER). We show that using the IS divergence in both catalog-based and NMF-based speech-music separation methods yields better separation performance than the KL divergence. Moreover, catalog-based approaches with both divergence measures outperform traditional NMF-based approaches in speech recognition experiments.
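The two divergence measures compared in the abstract can be made concrete with a small sketch. The code below is not the authors' catalog-based method. It assumes the textbook definitions of the generalized KL and IS divergences and the commonly used heuristic multiplicative updates for plain NMF under the IS divergence. The function names (`kl_divergence`, `is_divergence`, `nmf_is`), the `EPS` floor, and the random initialization are all choices made for this illustration.

```python
import numpy as np

EPS = 1e-12  # numerical floor (an assumption of this sketch) to avoid log(0) and division by zero


def kl_divergence(x, y):
    """Generalized KL divergence: sum(x * log(x / y) - x + y)."""
    x = np.maximum(np.asarray(x, dtype=float), EPS)
    y = np.maximum(np.asarray(y, dtype=float), EPS)
    return float(np.sum(x * np.log(x / y) - x + y))


def is_divergence(x, y):
    """Itakura-Saito divergence: sum(x / y - log(x / y) - 1)."""
    x = np.maximum(np.asarray(x, dtype=float), EPS)
    y = np.maximum(np.asarray(y, dtype=float), EPS)
    r = x / y
    return float(np.sum(r - np.log(r) - 1.0))


def nmf_is(V, rank, n_iter=200, seed=0):
    """Plain NMF, V ~ W @ H, with heuristic multiplicative updates for the IS divergence.

    V is a non-negative matrix, e.g. a power spectrogram (frequency x time).
    With n_iter=0 the random initialization is returned unchanged.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + EPS
    H = rng.random((rank, n_frames)) + EPS
    for _ in range(n_iter):
        V_hat = W @ H + EPS
        # Update activations H with the current model V_hat
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + EPS)
        V_hat = W @ H + EPS
        # Update basis spectra W with the refreshed model
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + EPS)
    return W, H
```

One property that is often cited as a reason to prefer the IS divergence for audio power spectra is scale invariance: `is_divergence(c * x, c * y)` equals `is_divergence(x, y)` for any `c > 0`, whereas the generalized KL divergence scales linearly with `c`. Low-energy and high-energy time-frequency bins therefore contribute on an equal footing under IS.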