MCC-SP: a powerful integration method for identification of causal pathways from genetic variants to complex disease.

IF 2.9 Q2 Biochemistry, Genetics and Molecular Biology
Yuchen Zhu, Jiadong Ji, Weiqiang Lin, Mingzhuo Li, Lu Liu, Huanhuan Zhu, Fuzhong Xue, Xiujun Li, Xiang Zhou, Zhongshang Yuan
BMC Genetics, p. 90. Published 2020-08-26. DOI: 10.1186/s12863-020-00899-3. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7477886/pdf/

Abstract

Background: Genome-wide association studies (GWAS) have successfully identified genetic susceptibility variants for complex diseases. However, the mechanisms underlying these associations remain largely unknown. Most disease-associated genetic variants reside in noncoding regions, leading to the hypothesis that regulation of gene expression may be the primary biological mechanism. Current methods for characterizing how gene expression mediates the effect of a genetic variant on disease often analyze one gene at a time and ignore the network structure. The impact of a genetic variant can propagate to other genes along the links in the network and then to the disease. There may be multiple pathways from the genetic variant to the disease, each with a chain structure in which the first node is one specific single nucleotide polymorphism (SNP) and the last node is the disease outcome. One key but inadequately addressed question is how to measure the between-node connection strength and rank the effects of such chain-type pathways; an answer would provide statistical evidence for prioritizing pathways for potential drug development in a cost-effective manner.

Results: We first introduce the maximal correlation coefficient (MCC) to represent the between-node connection, and then integrate MCC with the K shortest paths algorithm to rank and identify potential pathways from genetic variant to disease. A pathway importance score (PIS) is further provided to quantify the importance of each pathway. We term this method "MCC-SP". Various simulations illustrate that MCC is a better measure of between-node connection strength than other quantities, including Pearson correlation, Spearman correlation, distance correlation, mutual information, and the maximal information coefficient. Finally, we applied MCC-SP to a real dataset from the Religious Orders Study and the Memory and Aging Project, and detected 2 typical pathways from APOE genotype to Alzheimer's disease (AD) through gene expression enriched in the Alzheimer's disease pathway.
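
The two ingredients named above can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is in the GitHub repository cited in the Conclusions) and rests on several assumptions that the abstract does not state: both variables are treated as discrete, so the maximal correlation can be estimated by alternating conditional expectations; an edge strength s is turned into a distance -log(s); and a path is scored by the product of its edge strengths, which is an illustrative stand-in and not the paper's PIS. The node names and strengths in the toy network are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict


def _scale(func, prob):
    """Standardize func (dict category -> value) under prob; return (func, sd)."""
    mean = sum(prob[k] * v for k, v in func.items())
    sd = math.sqrt(sum(prob[k] * (v - mean) ** 2 for k, v in func.items()))
    if sd < 1e-12:
        return None, 0.0
    return {k: (v - mean) / sd for k, v in func.items()}, sd


def maximal_correlation(xs, ys, n_iter=200, tol=1e-12):
    """Estimate sup over f, g of corr(f(X), g(Y)) for discrete samples.

    Assumption: X and Y are discrete. Alternating conditional expectations
    act as a power iteration; each category indicator of X is tried as a
    start so that no start is orthogonal to the optimal transformation.
    """
    n = len(xs)
    p_joint = {k: c / n for k, c in Counter(zip(xs, ys)).items()}
    p_x = {k: c / n for k, c in Counter(xs).items()}
    p_y = {k: c / n for k, c in Counter(ys).items()}

    def expect_given_y(f):
        g = dict.fromkeys(p_y, 0.0)
        for (x, y), p in p_joint.items():
            g[y] += p * f[x] / p_y[y]
        return g

    def expect_given_x(g):
        f = dict.fromkeys(p_x, 0.0)
        for (x, y), p in p_joint.items():
            f[x] += p * g[y] / p_x[x]
        return f

    best = 0.0
    for start in p_x:
        f, _ = _scale({k: float(k == start) for k in p_x}, p_x)
        rho = 0.0
        for _ in range(n_iter):
            if f is None:
                break
            g, rho_new = _scale(expect_given_y(f), p_y)
            if g is None:  # f carries no information about Y
                rho = 0.0
                break
            f, _ = _scale(expect_given_x(g), p_x)
            converged = abs(rho_new - rho) < tol
            rho = rho_new
            if converged:
                break
        best = max(best, rho)
    return best


def k_shortest_paths(strength, source, target, k):
    """Return up to k loop-free paths with the largest product of strengths.

    strength maps a directed edge (u, v) to a value in (0, 1]. Best-first
    search over partial paths with distance -log(strength); fine for small
    networks, while Yen's algorithm is the usual choice for large ones.
    """
    adj = defaultdict(list)
    for (u, v), s in strength.items():
        if s > 0:
            adj[u].append((v, -math.log(s)))
    heap = [(0.0, [source])]
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        if path[-1] == target:
            found.append((path, math.exp(-cost)))
            continue
        for nxt, w in adj[path[-1]]:
            if nxt not in path:
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return found


# Toy data: y depends on x through a non-monotone function.
xs = [-1, 0, 1] * 20
ys = [x * x for x in xs]
toy_mcc = maximal_correlation(xs, ys)

# Hypothetical network with made-up edge strengths.
edges = {
    ("SNP", "G1"): 0.8, ("SNP", "G2"): 0.5,
    ("G1", "G3"): 0.9, ("G2", "G3"): 0.6,
    ("G1", "Disease"): 0.3, ("G2", "Disease"): 0.4, ("G3", "Disease"): 0.7,
}
top_paths = k_shortest_paths(edges, "SNP", "Disease", k=2)

if __name__ == "__main__":
    print("maximal correlation on toy data:", toy_mcc)
    for path, score in top_paths:
        print(" -> ".join(path), round(score, 3))
```

In the toy data, y is a deterministic but non-monotone function of x, so the Pearson correlation is zero by symmetry while the maximal correlation estimate is close to 1. This is the kind of nonlinear dependence a linear measure would miss. In a real analysis the edge strengths would be estimated from genotype and expression data rather than typed in by hand.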

Conclusions: MCC-SP is a powerful and robust method for identifying the pathway(s) from a genetic variant to a disease. The source code of MCC-SP is freely available on GitHub (https://github.com/zhuyuchen95/ADnet).

Source journal: BMC Genetics (Biology: Genetics)
CiteScore: 4.30
Self-citation rate: 0.00%
Articles published: 77
Review time: 4-8 weeks
About the journal: BMC Genetics is an open access, peer-reviewed journal that considers articles on all aspects of inheritance and variation in individuals and among populations.