Stereophonic spectrogram segmentation using Markov random fields

2012 IEEE International Workshop on Machine Learning for Signal Processing. Pub Date: 2012-11-12. DOI: 10.1109/MLSP.2012.6349754

Minje Kim, P. Smaragdis, Glenn G. Ko, Rob A. Rutenbar


Citations: 16

Abstract

Source separation approaches that use spectrograms captured from multiple microphones have much in common with computer vision algorithms that use multiple images for segmentation. Just as Markov random fields (MRFs) are used to solve image segmentation problems, we propose modeling source separation with MRFs and solving the resulting problems with common MRF inference methods. To this end, as a preprocessing step, we convert stereophonic spectrograms into an integrated form based on their inter-channel level differences (ILD), a procedure analogous to computing a disparity map from stereo images for matching problems. Given the ILD matrix as an observed image, we estimate latent labels that indicate which sound source is responsible for each time/frequency bin of the spectrogram. The proposed method achieves reasonable separation performance in a variety of mixing environments, including online separation and moving sources. We expect this new formulation of source separation to help exploit the advantages of probabilistic graphical models and recent advances in low-power, high-performance hardware suited to such tasks.
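The pipeline described in the abstract (build an ILD "image" from the two channels, then infer one source label per time/frequency bin with an MRF) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `ild_matrix` and `icm_segment`, the Gaussian-style unary cost around per-source ILD values (`centers`, assumed known in advance), the Potts smoothness penalty `beta`, and the use of iterated conditional modes (ICM) as a simple stand-in for whichever MRF inference method the authors used.

```python
import numpy as np

def ild_matrix(left_mag, right_mag, eps=1e-12):
    """Inter-channel level difference in dB for each time/frequency bin."""
    return 20.0 * np.log10((left_mag + eps) / (right_mag + eps))

def icm_segment(ild, centers, sigma=3.0, beta=1.0, n_iters=5):
    """Assign a source label to each bin with a Potts-model MRF and ICM.

    centers: assumed per-source ILD values in dB (hypothetical, known a priori).
    """
    centers = np.asarray(centers, dtype=float)
    source_ids = np.arange(len(centers))
    # Unary cost: scaled squared distance between the observed ILD
    # and the ILD expected for each candidate source.
    unary = (ild[..., None] - centers) ** 2 / (2.0 * sigma ** 2)
    labels = unary.argmin(axis=-1)  # start from the per-bin best guess
    n_freq, n_time = ild.shape
    for _ in range(n_iters):
        for f in range(n_freq):
            for t in range(n_time):
                cost = unary[f, t].copy()
                for df, dt in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nf, nt = f + df, t + dt
                    if 0 <= nf < n_freq and 0 <= nt < n_time:
                        # Potts penalty: pay beta for each label that
                        # disagrees with this neighbor's current label.
                        cost += beta * (source_ids != labels[nf, nt])
                labels[f, t] = cost.argmin()
    return labels

# Toy stereo magnitudes: source 0 (panned left) occupies the upper half of
# the spectrogram, source 1 (panned right) the lower half.
left = np.ones((8, 8))
right = np.ones((8, 8))
right[:4] = 0.5           # upper half: about +6 dB ILD
left[4:] = 0.5            # lower half: about -6 dB ILD
right[2, 3] = 10 ** 0.05  # one ambiguous bin in the upper half: about -1 dB

ild = ild_matrix(left, right)
labels = icm_segment(ild, centers=[6.0, -6.0])
```

In this toy case the ambiguous bin at `[2, 3]` is closer to the second source on ILD alone, but its four neighbors all belong to the first source, so the smoothness term pulls it back. Setting `beta=0.0` disables the neighborhood term and reduces the method to independent per-bin thresholding, which is one way to see what the MRF prior contributes.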

