Catalog-based single-channel speech-music separation with the Itakura-Saito divergence
Authors: Cemil Demir, A. Cemgil, M. Saraçlar
Published in: 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO)
Publication date: 2012-10-18
DOI: https://doi.org/10.5072/ZENODO.21056
In this study, we introduce a catalog-based single-channel speech-music separation method that uses the Itakura-Saito (IS) divergence measure. Previously, we developed the catalog-based separation method with the Kullback-Leibler (KL) divergence. From a probabilistic point of view, the IS divergence corresponds to a complex Gaussian observation model. We compare the divergence measures, or equivalently the observation models, on the speech-music separation task using both catalog-based and traditional Non-Negative Matrix Factorization (NMF) methods. Separation performance is compared using the Speech-to-Music Ratio (SMR), the Speech-to-Artifact Ratio (SAR), and speech recognition performance measured by the Word Error Rate (WER). We show that using the IS divergence in both catalog-based and NMF-based speech-music separation methods yields better separation performance than the KL divergence. Moreover, catalog-based approaches with both divergence measures outperform traditional NMF-based approaches in speech recognition experiments.
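
The two divergence measures compared above can be illustrated with a minimal sketch. It assumes the standard element-wise definitions, with the generalized form of the KL divergence that is common in NMF; the abstract does not state the exact formulas or the spectral representation used in the paper. The function names and the spectrum values are hypothetical and chosen only for illustration. The sketch highlights one well-known difference between the measures: the IS divergence is unchanged when the observed and modeled spectra are scaled by the same factor, whereas the generalized KL divergence grows linearly with that factor.

```python
import math


def is_divergence(x, y):
    """Itakura-Saito divergence between two strictly positive spectra."""
    return sum(a / b - math.log(a / b) - 1.0 for a, b in zip(x, y))


def kl_divergence(x, y):
    """Generalized Kullback-Leibler divergence between two strictly positive spectra."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))


# Hypothetical spectrum frames (illustrative values, not taken from the paper).
observed = [4.0, 1.0, 0.25]
model = [2.0, 1.0, 0.5]

# The same frames at a much higher energy level.
scale = 100.0
observed_loud = [scale * v for v in observed]
model_loud = [scale * v for v in model]

d_is = is_divergence(observed, model)
d_is_loud = is_divergence(observed_loud, model_loud)
d_kl = kl_divergence(observed, model)
d_kl_loud = kl_divergence(observed_loud, model_loud)

print("IS:", d_is, "IS after scaling:", d_is_loud)
print("KL:", d_kl, "KL after scaling:", d_kl_loud)
```

Because of this scale invariance, low-energy and high-energy spectral components contribute comparably to the IS cost, while the KL cost is dominated by high-energy components. Whether this property accounts for the performance difference reported above is not something the abstract states.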