Stereophonic spectrogram segmentation using Markov random fields

Minje Kim, P. Smaragdis, Glenn G. Ko, Rob A. Rutenbar
Published in: 2012 IEEE International Workshop on Machine Learning for Signal Processing, 2012-11-12
DOI: 10.1109/MLSP.2012.6349754 (https://doi.org/10.1109/MLSP.2012.6349754)
Citations: 16

Abstract

There is a good amount of similarity between source separation approaches that use spectrograms captured from multiple microphones and computer vision algorithms that use multiple images for segmentation problems. Just as one would use Markov random fields (MRFs) to solve image segmentation problems, we propose a method of modeling source separation using MRFs, and then solving such problems via common MRF inference methods. To this end, as a preprocessing step, we convert stereophonic spectrograms into an integrated form based on their inter-channel level differences (ILD), a procedure analogous to computing a disparity map from stereo images for matching problems. Given the ILD matrix as an observed image, we estimate latent labels that stand for the responsibility of a specific sound source for each time/frequency bin of the spectrogram. The proposed method shows reasonable separation performance in a variety of mixing environments, including online separation and moving sources. We expect this new way of formulating source separation problems to help exploit the advantages of probabilistic graphical models and the recent advances in low-power, high-performance hardware suited for such tasks.
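The pipeline described in the abstract (ILD matrix as the observed image, one latent source label per time/frequency bin) can be illustrated with a rough sketch. The abstract does not state the energy function or the inference algorithm, so everything specific below is an assumption made for illustration: a Gaussian data term around an assumed per-source ILD, a Potts smoothness term over the four neighbouring bins, and iterated conditional modes (ICM) as one common MRF inference method. The function names and the parameters `means`, `sigma`, and `beta` are hypothetical and are not taken from the paper.

```python
import numpy as np

def ild_matrix(left_mag, right_mag, eps=1e-12):
    """Inter-channel level difference in dB for each time/frequency bin."""
    return 20.0 * np.log10((left_mag + eps) / (right_mag + eps))

def icm_segment(ild, means, sigma=3.0, beta=1.0, n_iter=5):
    """Label each bin with a source index using a Potts MRF and ICM.

    means: assumed ILD (dB) of each source (hypothetical input).
    sigma, beta: hypothetical data-term width and smoothness weight.
    """
    means = np.asarray(means, dtype=float)
    source_ids = np.arange(len(means))
    # Data cost: squared distance of the observed ILD to each source's ILD.
    unary = (ild[..., None] - means) ** 2 / (2.0 * sigma ** 2)  # shape (F, T, K)
    # Start from the labelling that ignores neighbours.
    labels = unary.argmin(axis=-1)
    n_freq, n_time = ild.shape
    for _ in range(n_iter):
        for f in range(n_freq):
            for t in range(n_time):
                cost = unary[f, t].copy()
                # Smoothness cost: penalise disagreeing with each 4-neighbour.
                for df, dt in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nf, nt = f + df, t + dt
                    if 0 <= nf < n_freq and 0 <= nt < n_time:
                        cost += beta * (source_ids != labels[nf, nt])
                labels[f, t] = cost.argmin()
    return labels
```

Under these assumptions, a binary mask for source `k` is `labels == k`, which could be applied to the mixture spectrogram before inverting it back to a waveform. ICM only finds a local optimum; it is used here because it is short to write, not because the paper is known to use it.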