Gene expression dissection by non-negative well-grounded source separation
Yitan Zhu, Tsung-Han Chan, E. Hoffman, Y. Wang
2008 IEEE Workshop on Machine Learning for Signal Processing, 2008-11-21
DOI: 10.1109/MLSP.2008.4685489 (https://doi.org/10.1109/MLSP.2008.4685489)
Citations: 4
Abstract
A linear mixture model of non-negative sources is used to dissect gene expression data into components that represent putative underlying active biological processes. Each biological process (component) is characterized by its specific genes, which are exclusively highly expressed in it and expected to be functionally enriched, while the majority of genes maintain basic cellular structure and functions that support these specific genes and are thus expressed at roughly common levels across all components. Such components form non-negative, well-grounded, but dependent and non-sparse sources in the model. The unique identifiability of the model is proved. A blind source separation method utilizing convex analysis and sector-based clustering is developed, together with a model order selection scheme based on stability analysis, to identify the components and their activity curves. When applied to muscle regeneration data, our method revealed four underlying active biological processes associated with four successive phases of muscle regeneration.
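
The geometric idea behind a linear mixture of non-negative, well-grounded sources can be illustrated with a small synthetic sketch. This is not the authors' algorithm: it omits sector-based clustering and model order selection and only checks a property that convex-analysis approaches to this kind of model typically exploit. The sizes `K`, `M`, `N`, the names `S`, `A`, `X`, the helper `unit_sum_columns`, and the assumption that the activity curves in `A` are non-negative are all choices made for this illustration and do not come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, N = 3, 5, 200   # hypothetical sizes: sources, samples (e.g. time points), genes

# Sources S (K x N): most genes are expressed at similar levels in every
# component, so the sources are dependent and non-sparse.
S = rng.uniform(0.5, 1.5, size=(K, N))

# Assumed form of "well-grounded" for this sketch: each source has at least
# one marker gene that is expressed in that source only.
markers = np.arange(K)
S[:, markers] = 2.0 * np.eye(K)

# Assumed non-negative activity curves (mixing matrix), M x K.
A = rng.uniform(0.1, 1.0, size=(M, K))

# Linear mixture model: observed expression, M samples x N genes.
X = A @ S


def unit_sum_columns(Z):
    """Scale every column so that its entries sum to one."""
    return Z / Z.sum(axis=0, keepdims=True)


Xn = unit_sum_columns(X)
An = unit_sum_columns(A)

# After normalization, marker genes coincide with the normalized columns of A.
markers_hit_corners = np.allclose(Xn[:, markers], An)

# Every other gene is a convex combination of those corner points:
# solving An @ W = Xn gives non-negative weights that sum to one per gene.
W, *_ = np.linalg.lstsq(An, Xn, rcond=None)
weights_are_convex = bool(W.min() > -1e-9 and np.allclose(W.sum(axis=0), 1.0))

print(markers_hit_corners, weights_are_convex)
```

In this toy setting the normalized marker genes sit at the corners of the convex hull of all normalized gene profiles, which is why locating those corners would recover the activity curves up to scaling and ordering. Real data contain noise and an unknown number of components, which the sketch does not address.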