Unsupervised co-optimization of a graph neural network and a knowledge graph embedding model to prioritize causal genes for Alzheimer's Disease

Archives of clinical and biomedical research Pub Date : 2022-10-06 DOI:10.1101/2022.10.03.22280657

Li-Yu Daisy Liu, V. Prabhakar

Abstract

Data from clinical trials for a given disease often capture reliable, high-quality empirical features, but they are limited to a few studies or experiments. In contrast, knowledge extracted from the biomedical literature covers a wide range of clinical information relevant to the disease, although it may be less reliable than experimental data. We therefore propose a novel training method that co-optimizes two AI algorithms, one on experimental data and the other on knowledge-based information from the literature, so that the learning of each supplements the other, and we apply this method to prioritize (rank) causal genes for Alzheimer's Disease (AD). One algorithm generates unsupervised embeddings for gene nodes in a protein-protein interaction network associated with experimental data. The other generates embeddings for the nodes (entities) in a knowledge graph constructed from biomedical literature. The two algorithms are co-optimized to leverage information from each other's domain. Consequently, a downstream inference task that ranks causal genes for AD takes into account both the experimental and the literature evidence available to implicate any given gene in the gene set. Rank-based evaluation metrics computed to validate the gene rankings produced by our algorithm showed that the top-ranked positions were highly enriched with genes from a ground-truth set that were experimentally verified to be causal for the progression of AD.

Keywords: Alzheimer's Disease, Causal gene prioritization, Co-optimization, Protein-Protein interaction network, Knowledge Graph
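
The abstract does not say which rank-based metrics were computed. As a minimal sketch only, the Python below assumes two common choices, precision at k and mean reciprocal rank, to show how enrichment of ground-truth genes near the top of a ranking could be quantified. The function names and the placeholder gene identifiers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set


def precision_at_k(ranked_genes: Sequence[str], truth: Set[str], k: int) -> float:
    """Fraction of the top-k ranked genes that belong to the ground-truth set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for gene in ranked_genes[:k] if gene in truth)
    return hits / k


def mean_reciprocal_rank(ranked_genes: Sequence[str], truth: Set[str]) -> float:
    """Mean of 1/rank over the ground-truth genes that appear in the ranking."""
    # Ranks are 1-based: the first gene in the list has rank 1.
    rank_of = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    ranks = [rank_of[gene] for gene in truth if gene in rank_of]
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Placeholder identifiers (assumption): not real results from the paper.
    ranking = ["gene_a", "gene_b", "gene_c", "gene_d", "gene_e"]
    ground_truth = {"gene_a", "gene_b", "gene_d"}
    print(precision_at_k(ranking, ground_truth, k=3))
    print(mean_reciprocal_rank(ranking, ground_truth))
```

Higher values on either metric mean that ground-truth genes are concentrated near the top of the ranking, which is the kind of enrichment the abstract reports.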
