Multichannel Extensions of Non-Negative Matrix Factorization With Complex-Valued Data

IEEE Transactions on Audio Speech and Language Processing. Pub Date: 2013-05-01. DOI: 10.1109/TASL.2013.2239990

H. Sawada, H. Kameoka, S. Araki, N. Ueda


Citation count: 259

Abstract

This paper presents new formulations and algorithms for multichannel extensions of non-negative matrix factorization (NMF). The formulations employ Hermitian positive semidefinite matrices to represent a multichannel version of non-negative elements. Multichannel Euclidean distance and multichannel Itakura-Saito (IS) divergence are defined based on appropriate statistical models utilizing multivariate complex Gaussian distributions. To minimize this distance/divergence, efficient optimization algorithms in the form of multiplicative updates are derived by using properly designed auxiliary functions. Two methods are proposed for clustering NMF bases according to the estimated spatial property. Convolutive blind source separation (BSS) is performed by the multichannel extensions of NMF with the clustering mechanism. Experimental results show that 1) the derived multiplicative update rules exhibited good convergence behavior, and 2) BSS tasks for several music sources with two microphones and three instrumental parts were evaluated successfully.
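As a point of reference for the multiplicative updates mentioned in the abstract, the following is a minimal sketch of standard single-channel NMF under the Itakura-Saito divergence, using the square-root form of the updates that is obtained from an auxiliary-function (majorization) argument. It is not the paper's multichannel algorithm: the Hermitian positive semidefinite spatial matrices and the clustering of bases are omitted, and the names `is_divergence`, `is_nmf`, `n_bases`, and `eps` are illustrative choices rather than anything taken from the paper.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries of V."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, n_bases, n_iter=100, seed=0, eps=1e-12):
    """Single-channel IS-NMF: approximate V (non-negative) by W @ H.

    Assumption: V is a strictly positive power spectrogram of shape
    (n_freq, n_frames). eps only guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_bases)) + eps   # spectral bases
    H = rng.random((n_bases, n_frames)) + eps  # temporal activations
    history = []
    for _ in range(n_iter):
        # Update the bases with the activations held fixed.
        V_hat = W @ H + eps
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        # Update the activations with the new bases held fixed.
        V_hat = W @ H + eps
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        history.append(is_divergence(V, W @ H + eps))
    return W, H, history
```

Because each update multiplies by a non-negative ratio, `W` and `H` stay non-negative without any projection step. The multichannel setting described in the abstract replaces each scalar entry of `V` with a Hermitian positive semidefinite matrix, which this scalar sketch does not attempt to reproduce.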


Source journal: IEEE Transactions on Audio Speech and Language Processing (Engineering and Technology; Engineering: Electrical and Electronic)

Self-citation rate: 0.00%

Review time: 24.0 months

Journal description: The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.