Empowering Graph Neural Network-Based Computational Drug Repositioning with Large Language Model-Inferred Knowledge Representation.

IF 3.9 | Biology (CAS Tier 2) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Yaowen Gu, Zidu Xu, Carl Yang
{"title":"基于图神经网络的计算药物重新定位与大语言模型参考知识表示。","authors":"Yaowen Gu, Zidu Xu, Carl Yang","doi":"10.1007/s12539-024-00654-7","DOIUrl":null,"url":null,"abstract":"<p><p>Computational drug repositioning, through predicting drug-disease associations (DDA), offers significant potential for discovering new drug indications. Current methods incorporate graph neural networks (GNN) on drug-disease heterogeneous networks to predict DDAs, achieving notable performances compared to traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, hampered by incomplete and noisy network data, and overlook the wealth of biomedical knowledge available. Correspondingly, large language models (LLMs) excel in graph search and relational reasoning, which can possibly enhance the integration of comprehensive biomedical knowledge into drug and disease profiles. In this study, we first investigate the contribution of LLM-inferred knowledge representation in drug repositioning and DDA prediction. A zero-shot prompting template was designed for LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation from language models to transform the discrete text to continual numerical representation. Then, we proposed LLM-DDA with three different model architectures (LLM-DDA<sub>Node Feat</sub>, LLM-DDA<sub>Dual GNN</sub>, LLM-DDA<sub>GNN-AE</sub>) to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that, LLM-DDA<sub>GNN-AE</sub> achieved the optimal performance compared to 11 baselines with the overall relative improvement in AUPR of 23.22%, F1-Score of 17.20%, and precision of 25.35%. Meanwhile, selected case studies of involving Prednisone and Allergic Rhinitis highlighted the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning with its generality and applicability in other biomedical relation prediction tasks.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Empowering Graph Neural Network-Based Computational Drug Repositioning with Large Language Model-Inferred Knowledge Representation.\",\"authors\":\"Yaowen Gu, Zidu Xu, Carl Yang\",\"doi\":\"10.1007/s12539-024-00654-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Computational drug repositioning, through predicting drug-disease associations (DDA), offers significant potential for discovering new drug indications. Current methods incorporate graph neural networks (GNN) on drug-disease heterogeneous networks to predict DDAs, achieving notable performances compared to traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, hampered by incomplete and noisy network data, and overlook the wealth of biomedical knowledge available. Correspondingly, large language models (LLMs) excel in graph search and relational reasoning, which can possibly enhance the integration of comprehensive biomedical knowledge into drug and disease profiles. 
In this study, we first investigate the contribution of LLM-inferred knowledge representation in drug repositioning and DDA prediction. A zero-shot prompting template was designed for LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation from language models to transform the discrete text to continual numerical representation. Then, we proposed LLM-DDA with three different model architectures (LLM-DDA<sub>Node Feat</sub>, LLM-DDA<sub>Dual GNN</sub>, LLM-DDA<sub>GNN-AE</sub>) to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that, LLM-DDA<sub>GNN-AE</sub> achieved the optimal performance compared to 11 baselines with the overall relative improvement in AUPR of 23.22%, F1-Score of 17.20%, and precision of 25.35%. Meanwhile, selected case studies of involving Prednisone and Allergic Rhinitis highlighted the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning with its generality and applicability in other biomedical relation prediction tasks.</p>\",\"PeriodicalId\":13670,\"journal\":{\"name\":\"Interdisciplinary Sciences: Computational Life Sciences\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2024-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Interdisciplinary Sciences: Computational Life Sciences\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1007/s12539-024-00654-7\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interdisciplinary Sciences: Computational Life Sciences","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1007/s12539-024-00654-7","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Computational drug repositioning, through predicting drug-disease associations (DDAs), offers significant potential for discovering new drug indications. Current methods apply graph neural networks (GNNs) to drug-disease heterogeneous networks to predict DDAs, achieving notable performance compared with traditional machine learning and matrix factorization approaches. However, these methods depend heavily on network topology, are hampered by incomplete and noisy network data, and overlook the wealth of biomedical knowledge available. Correspondingly, large language models (LLMs) excel at graph search and relational reasoning, which can potentially enhance the integration of comprehensive biomedical knowledge into drug and disease profiles. In this study, we first investigate the contribution of LLM-inferred knowledge representation to drug repositioning and DDA prediction. A zero-shot prompting template was designed for the LLM to extract high-quality knowledge descriptions for drug and disease entities, followed by embedding generation with language models to transform the discrete text into continuous numerical representations. We then proposed LLM-DDA with three different model architectures (LLM-DDA-Node Feat, LLM-DDA-Dual GNN, LLM-DDA-GNN-AE) to investigate the best fusion mode for LLM-based embeddings. Extensive experiments on four DDA benchmarks show that LLM-DDA-GNN-AE achieved the best performance compared with 11 baselines, with overall relative improvements of 23.22% in AUPR, 17.20% in F1-score, and 25.35% in precision. Meanwhile, selected case studies involving Prednisone and Allergic Rhinitis highlighted the model's capability to identify reliable DDAs and knowledge descriptions, supported by existing literature. This study showcases the utility of LLMs in drug repositioning as well as their generality and applicability to other biomedical relation prediction tasks.
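To make the described pipeline concrete, the sketch below illustrates the two preparation steps named in the abstract: a zero-shot prompt that asks an LLM for a knowledge description of each drug or disease entity, and encoding of those descriptions into continuous embeddings. The prompt wording, the `query_llm` placeholder, and the sentence-transformers encoder are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of the LLM knowledge-extraction and embedding step.
# `query_llm` stands in for whatever LLM endpoint the authors used; the
# template text and encoder model below are assumptions for illustration.
from sentence_transformers import SentenceTransformer

ZERO_SHOT_TEMPLATE = (
    "You are a biomedical expert. Provide a concise, factual description of the "
    "{entity_type} '{entity_name}', covering its mechanism or etiology, related "
    "pathways, and known therapeutic associations."
)

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM API (e.g., a chat-completion endpoint)."""
    raise NotImplementedError("Wire this to the LLM service of your choice.")

def build_knowledge_embeddings(entities, entity_type, encoder_name="all-MiniLM-L6-v2"):
    """Turn LLM-generated descriptions into dense vectors usable as node features."""
    descriptions = [
        query_llm(ZERO_SHOT_TEMPLATE.format(entity_type=entity_type, entity_name=e))
        for e in entities
    ]
    encoder = SentenceTransformer(encoder_name)
    # Shape: (num_entities, embedding_dim) -- a continuous representation of the text.
    return encoder.encode(descriptions, convert_to_numpy=True)
```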

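The abstract names three fusion architectures without detailing them; the minimal sketch below shows one plausible reading of the "Node Feat" variant, in which the LLM-derived embeddings serve directly as the initial node features of a GNN over the drug-disease heterogeneous network, and DDA scores are decoded from the resulting node embeddings. The GCN backbone, layer sizes, and dot-product decoder are assumptions; the paper's architectures (Node Feat, Dual GNN, GNN-AE) may differ.

```python
# Minimal sketch, assuming LLM embeddings are used as initial GNN node features.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class NodeFeatDDA(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)

    def encode(self, x, edge_index):
        # x: LLM-based embeddings for all drug and disease nodes, stacked row-wise.
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

    def decode(self, z, drug_idx, disease_idx):
        # Score candidate drug-disease pairs via a dot product of node embeddings.
        return (z[drug_idx] * z[disease_idx]).sum(dim=-1)

# Usage sketch: build x from the embeddings returned above, take edge_index from the
# known DDA network, then train with binary cross-entropy against observed
# associations and sampled negative pairs.
```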
Source journal
Interdisciplinary Sciences: Computational Life Sciences
CiteScore: 8.60
Self-citation rate: 4.20%
Annual publications: 55
Journal description: Interdisciplinary Sciences: Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of science, with a particular focus on computational life sciences, an area enjoying rapid development at the forefront of scientific research and technology. The journal publishes original papers of significant general interest covering recent research and developments. Articles are published rapidly by taking full advantage of internet technology for online submission and peer review of manuscripts, and by publishing OnlineFirst through SpringerLink even before the issue is built or sent to the printer. The editorial board consists of many leading scientists with international reputations, among others Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), and Weitao Yang (Duke University, USA). Prof. Dongqing Wei at Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions in bioinformatics and computational physics and is best known for his ground-breaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, an international journal with a sound reputation shall be created.