Finding the most potent compounds using active learning on molecular pairs.

IF 2.2 | CAS Zone 4 (Chemistry) | JCR Q2, CHEMISTRY, ORGANIC
Beilstein Journal of Organic Chemistry | Pub Date: 2024-08-27 | eCollection Date: 2024-01-01 | DOI: 10.3762/bjoc.20.185
Zachary Fralish, Daniel Reker
Beilstein Journal of Organic Chemistry, vol. 20, pp. 2152-2162. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368049/pdf/
Citations: 0

Abstract

Active learning allows algorithms to steer iterative experimentation to accelerate and de-risk molecular optimizations, but actively trained models might still perform poorly during early project stages, when training data are limited and model exploitation might lead to the identification of analogs with limited scaffold diversity. Here, we present ActiveDelta, an adaptive approach that leverages paired molecular representations to predict improvements over the current best training compound in order to prioritize further data acquisition. We apply the ActiveDelta concept to both graph-based deep (Chemprop) and tree-based (XGBoost) models during exploitative active learning on 99 Ki benchmarking datasets. We show that both ActiveDelta implementations excel at identifying more potent inhibitors compared to the standard exploitative active learning implementations of Chemprop, XGBoost, and Random Forest. The ActiveDelta approach also identifies more chemically diverse inhibitors in terms of their Murcko scaffolds. Finally, deep models such as Chemprop trained on data selected through ActiveDelta approaches can more accurately identify inhibitors in test data created through simulated time splits. Overall, this study highlights the large potential for molecular pairing approaches to further improve popular active learning strategies in low-data regimes by enabling faster and more accurate identification of more diverse molecular hits against critical drug targets.
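
The ActiveDelta loop described in the abstract (train on pairs of compounds, predict the improvement over the current best compound, acquire the candidate with the largest predicted gain) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper trains Chemprop or XGBoost on paired molecules, whereas here a linear ridge regression on concatenated descriptor vectors is swapped in as the delta model. Compounds are assumed to be plain numeric descriptor vectors (a placeholder for fingerprints or molecular graphs), the pool's potency values act as an oracle that is only read once a compound is selected, and all function names (`pair_features`, `build_pair_training_set`, `fit_ridge`, `active_delta`) are hypothetical.

```python
import itertools
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def pair_features(a: Sequence[float], b: Sequence[float]) -> Vector:
    # Hypothetical paired representation: concatenate the two compounds'
    # descriptor vectors and append a constant bias term.
    return tuple(a) + tuple(b) + (1.0,)


def build_pair_training_set(X, y):
    # Every ordered pair (i, j), i != j, of labeled compounds; the regression
    # target is the potency change when moving from compound i to compound j.
    rows, targets = [], []
    for i, j in itertools.permutations(range(len(X)), 2):
        rows.append(pair_features(X[i], X[j]))
        targets.append(y[j] - y[i])
    return rows, targets


def solve(M, v):
    # Gauss-Jordan elimination with partial pivoting for a small dense system.
    n = len(v)
    A = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]


def fit_ridge(rows, targets, lam=1e-6):
    # Stand-in delta model (assumption): linear ridge regression solved via
    # the normal equations (X^T X + lam * I) w = X^T t.
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve(gram, rhs)


def predict(w, features):
    return sum(wi * fi for wi, fi in zip(w, features))


def active_delta(X_pool, y_pool, initial, n_rounds):
    # Exploitative active learning driven by the predicted improvement over
    # the current best labeled compound.
    if len(initial) < 2:
        raise ValueError("need at least two labeled compounds to form pairs")
    labeled = list(initial)
    for _ in range(n_rounds):
        unlabeled = [i for i in range(len(X_pool)) if i not in labeled]
        if not unlabeled:
            break
        rows, targets = build_pair_training_set(
            [X_pool[i] for i in labeled], [y_pool[i] for i in labeled])
        w = fit_ridge(rows, targets)
        best = max(labeled, key=lambda i: y_pool[i])
        # Pair the current best with each candidate and pick the largest predicted gain.
        pick = max(unlabeled,
                   key=lambda i: predict(w, pair_features(X_pool[best], X_pool[i])))
        labeled.append(pick)  # "measure" the selected compound
    return labeled


if __name__ == "__main__":
    X = [(float(x),) for x in range(10)]
    y = [0.5 * x for x in range(10)]  # toy data: potency rises with the descriptor
    print(active_delta(X, y, initial=[0, 1], n_rounds=3))  # → [0, 1, 9, 8, 7]
```

Training on all ordered pairs lets the model learn potency differences directly, and acquisition then asks only one question per candidate: how much better is it than the best compound measured so far? The number of pairs grows quadratically with the number of labeled compounds, which stays manageable in the low-data regime the abstract targets.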

Source journal
CiteScore: 4.90
Self-citation rate: 3.70%
Articles published: 167
Review time: 1.4 months
About the journal: The Beilstein Journal of Organic Chemistry is an international, peer-reviewed, Open Access journal. It provides a unique platform for rapid publication without any charges (free for author and reader) – Platinum Open Access. The content is freely accessible 365 days a year to any user worldwide. Articles are available online immediately upon publication and are publicly archived in all major repositories. In addition, it provides a platform for publishing thematic issues (theme-based collections of articles) on topical issues in organic chemistry. The journal publishes high-quality research and reviews in all areas of organic chemistry, including organic synthesis, organic reactions, natural product chemistry, structural investigations, supramolecular chemistry and chemical biology.