Rag2Mol: structure-based drug design based on retrieval augmented generation.

IF 6.8; Zone 2 (Biology); JCR Q1 (BIOCHEMICAL RESEARCH METHODS)

Briefings in Bioinformatics, Vol. 26, Issue 3; Pub Date: 2025-05-01; DOI: 10.1093/bib/bbaf265

Peidong Zhang, Xingang Peng, Rong Han, Ting Chen, Jianzhu Ma

PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12159289/pdf/

Citations: 0

Abstract

Artificial intelligence (AI) has brought tremendous progress to drug discovery, yet identifying hit and lead compounds with optimal physicochemical and pharmacological properties remains a significant challenge. Structure-based drug design (SBDD) has emerged as a promising paradigm, but inherent data biases and the neglect of synthetic accessibility leave SBDD models disconnected from practical drug discovery. In this work, we explore two methodologies, Rag2Mol-G and Rag2Mol-R, both of which use retrieval-augmented generation to design small molecules that fit a 3D pocket. The two methods either search a database for purchasable small molecules similar to the generated ones, or create new molecules that fit the 3D pocket from molecules in the database. Experimental results demonstrate that Rag2Mol methods consistently produce drug candidates with superior binding affinities and drug-likeness. We find that Rag2Mol-R provides broader coverage of the chemical landscape and more precise targeting than advanced virtual screening models. Notably, both workflows identified promising inhibitors for the challenging target protein tyrosine phosphatase PTPN2, which was once considered undruggable and still lacks inhibitors that have completed full clinical trials. Our highly extensible framework can integrate diverse SBDD methods, marking a significant advancement in AI-driven SBDD.
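The retrieval step the abstract describes (finding purchasable database molecules similar to a generated one) can be illustrated as a nearest-neighbour search over molecular fingerprints. This is a minimal sketch, not the authors' implementation: it assumes fingerprints are precomputed elsewhere (for example, ECFP/Morgan-style) and stored as sets of "on" bit indices, and it scores candidates with Tanimoto similarity, a common choice for this kind of search. The names `retrieve_purchasable`, `library`, and the compound IDs are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of "on" bit indices produced by some
# fingerprinting tool run beforehand (not shown here).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: size of intersection over size of union."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints share nothing meaningful
    return len(a & b) / union


def retrieve_purchasable(
    generated: Fingerprint,
    library: Dict[str, Fingerprint],
    k: int = 3,
    min_sim: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (compound ID, similarity) pairs closest to `generated`.

    `library` is a hypothetical mapping from purchasable-compound IDs to
    fingerprints; candidates scoring below `min_sim` are discarded.
    """
    scored = [(cid, tanimoto(generated, fp)) for cid, fp in library.items()]
    scored = [(cid, s) for cid, s in scored if s >= min_sim]
    # Highest similarity first; ties broken by ID so results are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```

For example, with a generated fingerprint `frozenset({1, 2, 3})` and a library holding `frozenset({1, 2, 3, 4})` and `frozenset({1, 2, 9})`, the two entries score 0.75 and 0.5 respectively. A linear scan keeps the sketch readable; a real purchasable-compound library would typically need an indexed or approximate search.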


Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular biology, and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Ranging from introductory concepts to specific protocols and analyses, the papers cover bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.