UoI-NMF Cluster: A Robust Nonnegative Matrix Factorization Algorithm for Improved Parts-Based Decomposition and Reconstruction of Noisy Data

2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA). Pub Date: 2017-12-01. DOI: 10.1109/ICMLA.2017.0-152

Shashanka Ubaru, Kesheng Wu, K. Bouchard

Pages: 241-248

Citations: 9

Abstract

With the ever-growing collection of large volumes of scientific data, the development of interpretable machine learning tools to analyze such data is becoming more important. However, robust, interpretable machine learning tools are lacking, which threatens the extraction of scientific insight and discovery. Nonnegative Matrix Factorization (NMF) algorithms decompose an m × n nonnegative data matrix A into a k × n basis matrix H and an m × k weight matrix W, such that A ≈ WH, where k is the desired rank. In this paper, we present a novel two-stage algorithm for NMF, UoI-NMF_cluster, which is based on three innovations: (i) completely separate bases learning from weight estimation, (ii) learn bases by clustering NMF results across bootstrap resamples of the data, and (iii) use the recently introduced Union of Intersections (UoI) framework to estimate ultra-sparse weights that maximize data reconstruction accuracy. We deploy our algorithm on various synthetic and scientific data to illustrate its performance, with a focus on neuroscience data. Compared to other NMF algorithms, UoI-NMF_cluster yields: a) more accurate parts-based decompositions of noisy data, b) a sparse and accurate weight matrix, and c) high-accuracy reconstructions of the de-noised data. Together, these improvements enhance the performance and interpretability of NMF applied to noisy data, and suggest that similar approaches may benefit other matrix decomposition algorithms.
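The two-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes rows of A are the samples that get resampled, uses Lee-Seung multiplicative updates as the base NMF solver, substitutes a plain k-means step for the clustering of bootstrap bases, and replaces the UoI weight estimation with a simple nonnegative fit against fixed bases. All function names (`nmf`, `fit_weights`, `two_stage_sketch`) and the synthetic data are hypothetical.

```python
import numpy as np

def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Basic NMF, A ~ W @ H, via multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_weights(A, H, n_iter=500, eps=1e-9):
    """Nonnegative weights for FIXED bases H (stand-in for the UoI step)."""
    rng = np.random.default_rng(0)
    W = rng.random((A.shape[0], H.shape[0])) + eps
    AHt, HHt = A @ H.T, H @ H.T
    for _ in range(n_iter):
        W *= AHt / (W @ HHt + eps)
    return W

def two_stage_sketch(A, k, n_boot=10, seed=0):
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    pooled = []
    # Stage 1a: run NMF on bootstrap resamples of the rows (assumed samples).
    for b in range(n_boot):
        rows = rng.integers(0, m, size=m)
        _, H = nmf(A[rows], k, seed=seed + b + 1)
        H /= np.linalg.norm(H, axis=1, keepdims=True) + 1e-12
        pooled.append(H)
    pooled = np.vstack(pooled)  # shape (n_boot * k, n)
    # Stage 1b: cluster the pooled bases (k-means here, an assumption),
    # initialized from the bases of the first bootstrap.
    centers = pooled[:k].copy()
    for _ in range(50):
        dist = ((pooled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pooled[labels == j].mean(axis=0)
    # Stage 2: estimate weights separately, with the bases held fixed.
    H = centers
    W = fit_weights(A, H)
    return W, H

# Synthetic parts-based data: 3 disjoint "parts", sparse weights, small noise.
rng = np.random.default_rng(42)
m, n, k = 60, 30, 3
H0 = np.zeros((k, n))
for j in range(k):
    H0[j, 10 * j:10 * (j + 1)] = 1.0
W0 = rng.random((m, k)) * (rng.random((m, k)) > 0.5)
A = np.clip(W0 @ H0 + 0.01 * rng.standard_normal((m, n)), 0, None)

W, H = two_stage_sketch(A, k)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The point of the sketch is the separation of concerns: the bases come only from the clustered bootstrap results, and the weights are fitted afterwards against those fixed bases. A sparsity-promoting estimator would replace `fit_weights` in a closer approximation of the described method.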

