UoI-NMF Cluster: A Robust Nonnegative Matrix Factorization Algorithm for Improved Parts-Based Decomposition and Reconstruction of Noisy Data
Authors: Shashanka Ubaru, Kesheng Wu, K. Bouchard
Venue: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 241-248
Published: 2017-12-01
DOI: 10.1109/ICMLA.2017.0-152 (https://doi.org/10.1109/ICMLA.2017.0-152)
Citations: 9
Abstract
With the ever-growing collection of large volumes of scientific data, the development of interpretable machine learning tools to analyze such data is becoming increasingly important. However, robust, interpretable machine learning tools are lacking, which threatens the extraction of scientific insight and discovery. Nonnegative Matrix Factorization (NMF) algorithms decompose an m × n nonnegative data matrix A into a k × n basis matrix H and an m × k weight matrix W such that A ≈ WH, where k is the desired rank. In this paper, we present a novel two-stage algorithm for NMF, UoI-NMF_cluster, which is based on three innovations: (i) completely separating bases learning from weight estimation, (ii) learning bases by clustering NMF results across bootstrap resamples of the data, and (iii) using the recently introduced Union of Intersections (UoI) framework to estimate ultra-sparse weights that maximize data reconstruction accuracy. We deploy our algorithm on a range of synthetic and scientific data sets to illustrate its performance, with a focus on neuroscience data. Compared to other NMF algorithms, UoI-NMF_cluster yields (a) more accurate parts-based decompositions of noisy data, (b) a sparse and accurate weight matrix, and (c) high-accuracy reconstructions of the de-noised data. Together, these improvements enhance the performance and interpretability of NMF applied to noisy data, and suggest that similar approaches may benefit other matrix decomposition algorithms.
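The two-stage structure described in the abstract can be illustrated with a minimal sketch in Python with NumPy. This is not the authors' implementation, and it rests on several assumptions. Plain Lee-Seung multiplicative updates stand in for the base NMF solver. A small hand-written k-means stands in for whatever clustering method the paper uses. A nonnegative least-squares-style update with the bases held fixed stands in for the UoI weight estimation, which is not implemented here. All function names (`nmf`, `cluster_bases`, `fit_weights`, `two_stage_nmf`), the choice to resample rows of A, and the synthetic data are hypothetical.

```python
import numpy as np

def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative updates for A ~ W @ H under a Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps   # m x k weights
    H = rng.random((k, n)) + eps   # k x n bases
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cluster_bases(bases, k, n_iter=50):
    """Tiny k-means over unit-normalised basis vectors (assumed stand-in)."""
    X = bases / (np.linalg.norm(bases, axis=1, keepdims=True) + 1e-12)
    centers = X[:k].copy()         # initialise from the first bootstrap's bases
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers                 # k x n, nonnegative by construction

def fit_weights(A, H, n_iter=500, seed=0, eps=1e-9):
    """Estimate nonnegative W with H fixed (stand-in for the UoI step)."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], H.shape[0])) + eps
    AHt, HHt = A @ H.T, H @ H.T
    for _ in range(n_iter):
        W *= AHt / (W @ HHt + eps)
    return W

def two_stage_nmf(A, k, n_boot=10, seed=0):
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    all_bases = []
    for b in range(n_boot):
        rows = rng.integers(0, m, size=m)        # bootstrap resample of rows
        _, H_b = nmf(A[rows], k, seed=seed + b)
        all_bases.append(H_b)
    H = cluster_bases(np.vstack(all_bases), k)   # stage 1: bases only
    W = fit_weights(A, H)                        # stage 2: weights only
    return W, H

# Synthetic noisy rank-3 data (hypothetical example, not from the paper).
rng = np.random.default_rng(42)
A = rng.random((60, 3)) @ rng.random((3, 20)) + 0.01 * rng.random((60, 20))
W, H = two_stage_nmf(A, k=3)
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)
print(W.shape, H.shape, rel_err)
```

The sketch mirrors innovation (i): `two_stage_nmf` discards the weights from every bootstrap NMF run and re-estimates W only after the bases are fixed. It does not reproduce the sparsity that the UoI step is meant to provide.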