iHerd: an integrative hierarchical graph representation learning framework to quantify network changes and prioritize risk genes in disease.

IF 4.3 · Zone 2, Biology

PLoS Computational Biology Pub Date : 2023-09-11 eCollection Date: 2023-09-01 DOI:10.1371/journal.pcbi.1011444

Ziheng Duan, Yi Dai, Ahyeon Hwang, Cheyu Lee, Kaichi Xie, Chutong Xiao, Min Xu, Matthew J Girgenti, Jing Zhang

PLoS Computational Biology 19(9): e1011444. Journal Article. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10513318/pdf/

Citations: 0

Abstract

Genes form complex networks within cells to carry out critical cellular functions, and alterations to these networks can introduce downstream transcriptome perturbations and phenotypic variations. Developing efficient and interpretable methods to quantify network changes and pinpoint driver genes across conditions is therefore crucial. We propose a hierarchical graph representation learning method called iHerd. Given a set of networks, iHerd first hierarchically generates a series of coarsened sub-graphs in a data-driven manner, representing network modules at different resolutions (e.g., the level of signaling pathways). It then sequentially learns low-dimensional node representations at all hierarchical levels via efficient graph embedding. Lastly, in its graph alignment module, iHerd projects the separate gene embeddings onto the same latent space to calculate a rewiring index for driver gene prioritization. To demonstrate its effectiveness, we applied iHerd to a tumor-to-normal gene regulatory network (GRN) rewiring analysis and a cell-type-specific gene co-expression network (GCN) analysis using single-cell multiome data of the brain. We showed that iHerd can effectively pinpoint both novel and well-known risk genes in different diseases. Unlike existing models, iHerd's graph coarsening for hierarchical learning allows us to classify network driver genes into early and late divergent genes (EDGs and LDGs), highlighting genes with extensive network changes across and within signaling pathway levels. This approach to driver gene classification can provide deeper molecular insights. The code is freely available at https://github.com/aicb-ZhangLabs/iHerd. All other relevant data are within the manuscript and supporting information files.
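The three stages described in the abstract (coarsening, embedding, alignment) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is available at the GitHub link above. It rests on several assumptions: module labels are supplied by hand rather than learned in a data-driven manner, a plain spectral embedding stands in for iHerd's graph embedding, orthogonal Procrustes stands in for its graph alignment module, and the rewiring index is taken to be the per-node distance after alignment. All function names are hypothetical.

```python
import numpy as np


def coarsen(adj, labels):
    """Merge nodes that share a label into one super-node (labels assumed given)."""
    labels = np.asarray(labels)
    membership = np.zeros((adj.shape[0], labels.max() + 1))
    membership[np.arange(adj.shape[0]), labels] = 1.0
    # Entry (i, j) sums the edge weights between module i and module j.
    return membership.T @ adj @ membership


def spectral_embedding(adj, dim):
    """Stand-in embedding: eigenvectors of the `dim` largest-magnitude eigenvalues."""
    vals, vecs = np.linalg.eigh(adj)  # adj must be symmetric
    top = np.argsort(np.abs(vals))[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))


def align(source, target):
    """Orthogonal Procrustes: find R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def rewiring_index(adj_a, adj_b, dim=2):
    """Per-node distance between the two embeddings after alignment."""
    emb_a = spectral_embedding(adj_a, dim)
    emb_b = align(spectral_embedding(adj_b, dim), emb_a)
    return np.linalg.norm(emb_a - emb_b, axis=1)


if __name__ == "__main__":
    # Toy networks over the same five genes: a star around gene 0 vs. a chain.
    star = np.zeros((5, 5))
    star[0, 1:] = star[1:, 0] = 1.0
    chain = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)

    print("gene-level rewiring:", rewiring_index(star, chain))

    modules = [0, 0, 1, 1, 1]  # hypothetical module assignment
    print("module-level rewiring:",
          rewiring_index(coarsen(star, modules), coarsen(chain, modules)))
```

Comparing the scores at the gene level with those at the coarsened level gives a rough feel for how a hierarchy could separate genes whose changes show up at different resolutions. The actual EDG/LDG criteria are defined in the paper and are not reproduced here.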




Source journal

PLoS Computational Biology (Biology: Biochemical Research Methods)

CiteScore: 7.10

Self-citation rate: 4.70%

Articles published: 820

Journal description: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales, from molecules and cells to patient populations and ecosystems, through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise, to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.