Rapid traversal of vast chemical space using machine learning-guided docking screens.

Nature Computational Science (JCR Q1, Computer Science, Interdisciplinary Applications; impact factor 12)
Published 2025-04-01 (Epub 2025-03-13), pages 301-312 · DOI: 10.1038/s43588-025-00777-x
Andreas Luttens, Israel Cabeza de Vaca, Leonard Sparring, José Brea, Antón Leandro Martínez, Nour Aldin Kahlous, Dmytro S Radchenko, Yurii S Moroz, María Isabel Loza, Ulf Norinder, Jens Carlsson
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12021657/pdf/

Abstract

The accelerating growth of make-on-demand chemical libraries provides unprecedented opportunities to identify starting points for drug discovery with virtual screening. However, these multi-billion-scale libraries are challenging to screen, even for the fastest structure-based docking methods. Here we explore a strategy that combines machine learning and molecular docking to enable rapid virtual screening of databases containing billions of compounds. In our workflow, a classification algorithm is trained to identify top-scoring compounds based on molecular docking of 1 million compounds to the target protein. The conformal prediction framework is then used to make selections from the multi-billion-scale library, reducing the number of compounds to be scored by docking. The CatBoost classifier showed an optimal balance between speed and accuracy and was used to adapt the workflow for screens of ultralarge libraries. Application to a library of 3.5 billion compounds demonstrated that our protocol can reduce the computational cost of structure-based virtual screening by more than 1,000-fold. Experimental testing of predictions identified ligands of G protein-coupled receptors and demonstrated that our approach enables discovery of compounds with multi-target activity tailored for therapeutic effect.
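The selection step can be made concrete with a minimal pure-Python sketch of inductive conformal prediction as it might be applied in such a workflow. The abstract does not give these details, so the sketch rests on several assumptions. It uses a Mondrian (class-conditional) conformal predictor with a nonconformity score of 1 - P(label). It defines the "active" class as a top fraction of docking scores, with lower scores treated as better. It also assumes the docked compounds are split into a training set for the classifier and a held-out calibration set. All function names are hypothetical. Classifier probabilities are taken as given (in practice they might come from a CatBoost model's `predict_proba`), so only the labeling and conformal selection logic is shown.

```python
"""Minimal sketch of Mondrian (class-conditional) inductive conformal selection.

Hypothetical helper names; not the authors' code. Classifier probabilities are
inputs (e.g. from a CatBoost model), so only the conformal logic is shown.
"""
from typing import List, Sequence, Set


def label_top_scoring(docking_scores: Sequence[float], top_fraction: float = 0.01) -> List[int]:
    """Label the best-scoring fraction of docked compounds as active (1), the rest 0.

    Assumption: lower (more negative) docking scores are better. Compounds tied
    with the cutoff score are also labeled active.
    """
    n_top = max(1, int(len(docking_scores) * top_fraction))
    cutoff = sorted(docking_scores)[n_top - 1]
    return [1 if s <= cutoff else 0 for s in docking_scores]


def nonconformity(prob_active: float, label: int) -> float:
    """1 - P(label): large when the classifier finds `label` unlikely."""
    p_label = prob_active if label == 1 else 1.0 - prob_active
    return 1.0 - p_label


def mondrian_p_value(prob_active: float, label: int,
                     calib_probs: Sequence[float], calib_labels: Sequence[int]) -> float:
    """Conformal p-value of `label`, compared only against calibration compounds of that class."""
    test_alpha = nonconformity(prob_active, label)
    class_alphas = [nonconformity(p, y) for p, y in zip(calib_probs, calib_labels) if y == label]
    n_at_least = sum(1 for a in class_alphas if a >= test_alpha)
    return (n_at_least + 1) / (len(class_alphas) + 1)


def prediction_set(prob_active: float, calib_probs: Sequence[float],
                   calib_labels: Sequence[int], epsilon: float) -> Set[int]:
    """Labels whose p-value exceeds the significance level epsilon."""
    return {label for label in (0, 1)
            if mondrian_p_value(prob_active, label, calib_probs, calib_labels) > epsilon}


def select_for_docking(library_probs: Sequence[float], calib_probs: Sequence[float],
                       calib_labels: Sequence[int], epsilon: float) -> List[int]:
    """Indices of library compounds whose prediction set is exactly {1} ("virtual active").

    Assumption: compounds predicted {0}, {0, 1} ("both") or {} ("null") are not docked.
    """
    return [i for i, p in enumerate(library_probs)
            if prediction_set(p, calib_probs, calib_labels, epsilon) == {1}]


if __name__ == "__main__":
    # Toy calibration set: docking scores and classifier P(active) for held-out compounds.
    calib_scores = [-12.0, -11.5, -11.0, -7.0, -6.5, -6.0]
    calib_labels = label_top_scoring(calib_scores, top_fraction=0.5)  # [1, 1, 1, 0, 0, 0]
    calib_probs = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]
    # Classifier P(active) for four undocked library compounds.
    library_probs = [0.95, 0.1, 0.75, 0.5]
    print(select_for_docking(library_probs, calib_probs, calib_labels, epsilon=0.3))  # prints [0, 2]
```

Class-conditional p-values matter here because the active class is rare. Under the usual exchangeability assumption, they keep the error rate at or below ε within each class separately, so the abundant inactives cannot dominate. Raising ε permits more errors but makes prediction sets smaller, so fewer compounds land in the ambiguous "both" set.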
