{"title":"CoVar:一种可通用的机器学习方法,用于识别驱动变异基因表达的协调调控因子。","authors":"Satyaki Roy, S. Sheikh, T. Furey","doi":"10.1371/journal.pcbi.1012016","DOIUrl":null,"url":null,"abstract":"Network inference is used to model transcriptional, signaling, and metabolic interactions among genes, proteins, and metabolites that identify biological pathways influencing disease pathogenesis. Advances in machine learning (ML)-based inference models exhibit the predictive capabilities of capturing latent patterns in genomic data. Such models are emerging as an alternative to the statistical models identifying causative factors driving complex diseases. We present CoVar, an ML-based framework that builds upon the properties of existing inference models, to find the central genes driving perturbed gene expression across biological states. Unlike differentially expressed genes (DEGs) that capture changes in individual gene expression across conditions, CoVar focuses on identifying variational genes that undergo changes in their expression network interaction profiles, providing insights into changes in the regulatory dynamics, such as in disease pathogenesis. Subsequently, it finds core genes from among the nearest neighbors of these variational genes, which are central to the variational activity and influence the coordinated regulatory processes underlying the observed changes in gene expression. Through the analysis of simulated as well as yeast expression data perturbed by the deletion of the mitochondrial genome, we show that CoVar captures the intrinsic variationality and modularity in the expression data, identifying key driver genes not found through existing differential analysis methodologies.","PeriodicalId":49688,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CoVar: A generalizable machine learning approach to identify the coordinated regulators driving variational gene expression.\",\"authors\":\"Satyaki Roy, S. Sheikh, T. Furey\",\"doi\":\"10.1371/journal.pcbi.1012016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Network inference is used to model transcriptional, signaling, and metabolic interactions among genes, proteins, and metabolites that identify biological pathways influencing disease pathogenesis. Advances in machine learning (ML)-based inference models exhibit the predictive capabilities of capturing latent patterns in genomic data. Such models are emerging as an alternative to the statistical models identifying causative factors driving complex diseases. We present CoVar, an ML-based framework that builds upon the properties of existing inference models, to find the central genes driving perturbed gene expression across biological states. Unlike differentially expressed genes (DEGs) that capture changes in individual gene expression across conditions, CoVar focuses on identifying variational genes that undergo changes in their expression network interaction profiles, providing insights into changes in the regulatory dynamics, such as in disease pathogenesis. Subsequently, it finds core genes from among the nearest neighbors of these variational genes, which are central to the variational activity and influence the coordinated regulatory processes underlying the observed changes in gene expression. Through the analysis of simulated as well as yeast expression data perturbed by the deletion of the mitochondrial genome, we show that CoVar captures the intrinsic variationality and modularity in the expression data, identifying key driver genes not found through existing differential analysis methodologies.\",\"PeriodicalId\":49688,\"journal\":{\"name\":\"PLoS Computational Biology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-04-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS Computational Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pcbi.1012016\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1371/journal.pcbi.1012016","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
网络推断用于模拟基因、蛋白质和代谢物之间的转录、信号和代谢相互作用,从而确定影响疾病发病机制的生物通路。基于机器学习(ML)的推理模型取得的进展显示了捕捉基因组数据中潜在模式的预测能力。这些模型正在成为确定复杂疾病致病因素的统计模型的替代方案。我们提出的 CoVar 是一种基于 ML 的框架,它以现有推理模型的特性为基础,用于寻找驱动跨生物状态基因表达扰动的中心基因。与捕捉不同条件下单个基因表达变化的差异表达基因(DEGs)不同,CoVar 侧重于识别表达网络交互图谱发生变化的变异基因,从而深入了解疾病发病机制等调控动态的变化。随后,它从这些变异基因的近邻中找到核心基因,这些基因是变异活动的核心,影响着观察到的基因表达变化背后的协调调控过程。通过分析模拟数据和线粒体基因组缺失扰动的酵母表达数据,我们发现 CoVar 能够捕捉表达数据的内在变异性和模块性,识别出现有差异分析方法无法发现的关键驱动基因。
CoVar: A generalizable machine learning approach to identify the coordinated regulators driving variational gene expression.
Network inference is used to model transcriptional, signaling, and metabolic interactions among genes, proteins, and metabolites that identify biological pathways influencing disease pathogenesis. Advances in machine learning (ML)-based inference models exhibit the predictive capabilities of capturing latent patterns in genomic data. Such models are emerging as an alternative to the statistical models identifying causative factors driving complex diseases. We present CoVar, an ML-based framework that builds upon the properties of existing inference models, to find the central genes driving perturbed gene expression across biological states. Unlike differentially expressed genes (DEGs) that capture changes in individual gene expression across conditions, CoVar focuses on identifying variational genes that undergo changes in their expression network interaction profiles, providing insights into changes in the regulatory dynamics, such as in disease pathogenesis. Subsequently, it finds core genes from among the nearest neighbors of these variational genes, which are central to the variational activity and influence the coordinated regulatory processes underlying the observed changes in gene expression. Through the analysis of simulated as well as yeast expression data perturbed by the deletion of the mitochondrial genome, we show that CoVar captures the intrinsic variationality and modularity in the expression data, identifying key driver genes not found through existing differential analysis methodologies.
期刊介绍:
PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery.
Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines.
Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights.
Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology.
Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.