利用多模态知识实现统一的人工智能药物发现。

Health data science Pub Date : 2024-02-23 eCollection Date: 2024-01-01 DOI:10.34133/hds.0113

Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie

{"title":"利用多模态知识实现统一的人工智能药物发现。","authors":"Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie","doi":"10.34133/hds.0113","DOIUrl":null,"url":null,"abstract":"Background: In real-world drug discovery, human experts typically grasp molecular knowledge of drugs and proteins from multimodal sources including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge independently, which compromises the holistic understanding of biomolecules. Besides, they fail to address the missing modality problem, where multimodal information is missing for novel drugs and proteins. Methods: In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for vast AI drug discovery tasks. The framework first incorporates independent representation learning models to extract the underlying characteristics from each modality. Then, it applies a feature fusion technique to calculate the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features based on top relevant molecules. Results: Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Through qualitative analysis, we reveal KEDD's promising potential in assisting real-world applications. Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD bears promise in accelerating drug discovery.","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0113"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10886071/pdf/","citationCount":"0","resultStr":"{\"title\":\"Toward Unified AI Drug Discovery with Multimodal Knowledge.\",\"authors\":\"Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie\",\"doi\":\"10.34133/hds.0113\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: In real-world drug discovery, human experts typically grasp molecular knowledge of drugs and proteins from multimodal sources including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge independently, which compromises the holistic understanding of biomolecules. Besides, they fail to address the missing modality problem, where multimodal information is missing for novel drugs and proteins. Methods: In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for vast AI drug discovery tasks. The framework first incorporates independent representation learning models to extract the underlying characteristics from each modality. Then, it applies a feature fusion technique to calculate the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features based on top relevant molecules. Results: Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Through qualitative analysis, we reveal KEDD's promising potential in assisting real-world applications. Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD bears promise in accelerating drug discovery.\",\"PeriodicalId\":73207,\"journal\":{\"name\":\"Health data science\",\"volume\":\"4 \",\"pages\":\"0113\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10886071/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Health data science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.34133/hds.0113\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34133/hds.0113","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

背景：在现实世界的药物发现中，人类专家通常从多模态来源掌握药物和蛋白质的分子知识，包括分子结构、知识库中的结构化知识和生物医学文献中的非结构化知识。现有的人工智能药物发现多模态方法独立整合了结构化知识或非结构化知识，影响了对生物分子的整体理解。此外，它们也无法解决缺失模态问题，即新型药物和蛋白质的多模态信息缺失。方法在这项工作中，我们提出了 KEDD--一个统一的端到端深度学习框架，它能将结构化和非结构化知识联合起来，用于庞大的人工智能药物发现任务。该框架首先结合独立的表征学习模型，从每种模式中提取基本特征。然后，它应用特征融合技术来计算预测结果。为了缓解缺失模态问题，我们利用稀疏注意力和模态掩蔽技术，根据顶级相关分子重建缺失特征。结果受益于结构化和非结构化知识，我们的框架加深了对生物分子的理解。在药物-靶点相互作用预测、药物性质预测、药物-药物相互作用预测和蛋白质-蛋白质相互作用预测方面，KEDD的表现分别比最先进的模型平均高出5.2%、2.6%、1.2%和4.1%。通过定性分析，我们揭示了 KEDD 在协助实际应用方面的巨大潜力。结论：通过结合多模态知识中的生物分子专业知识，KEDD有望加速药物发现。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Toward Unified AI Drug Discovery with Multimodal Knowledge.

Background: In real-world drug discovery, human experts typically grasp molecular knowledge of drugs and proteins from multimodal sources including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge independently, which compromises the holistic understanding of biomolecules. Besides, they fail to address the missing modality problem, where multimodal information is missing for novel drugs and proteins. Methods: In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for vast AI drug discovery tasks. The framework first incorporates independent representation learning models to extract the underlying characteristics from each modality. Then, it applies a feature fusion technique to calculate the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features based on top relevant molecules. Results: Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Through qualitative analysis, we reveal KEDD's promising potential in assisting real-world applications. Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD bears promise in accelerating drug discovery.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Health data science

CiteScore

3.70

自引率

0.00%

发文量