Finding the most potent compounds using active learning on molecular pairs.

IF 2.2 · Zone 4 (Chemistry) · JCR Q2 (CHEMISTRY, ORGANIC)

Beilstein Journal of Organic Chemistry Pub Date : 2024-08-27 eCollection Date: 2024-01-01 DOI:10.3762/bjoc.20.185

Zachary Fralish, Daniel Reker


Citations: 0

Abstract



Active learning allows algorithms to steer iterative experimentation to accelerate and de-risk molecular optimizations, but actively trained models might still exhibit poor performance during early project stages where the training data is limited and model exploitation might lead to analog identification with limited scaffold diversity. Here, we present ActiveDelta, an adaptive approach that leverages paired molecular representations to predict improvements from the current best training compound to prioritize further data acquisition. We apply the ActiveDelta concept to both graph-based deep (Chemprop) and tree-based (XGBoost) models during exploitative active learning for 99 K_i benchmarking datasets. We show that both ActiveDelta implementations excel at identifying more potent inhibitors compared to the standard exploitative active learning implementations of Chemprop, XGBoost, and Random Forest. The ActiveDelta approach is also able to identify more chemically diverse inhibitors in terms of their Murcko scaffolds. Finally, deep models such as Chemprop trained on data selected through ActiveDelta approaches can more accurately identify inhibitors in test data created through simulated time-splits. Overall, this study highlights the large potential for molecular pairing approaches to further improve popular active learning strategies in low data regimes by enabling faster and more accurate identification of more diverse molecular hits against critical drug targets.
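The selection idea described in the abstract (train on molecular pairs, then rank candidates by their predicted improvement over the current best training compound) can be illustrated with a minimal sketch. This is not the authors' implementation: the study uses Chemprop and XGBoost, whereas the toy nearest-neighbour regressor, the plain numeric feature vectors, the convention that higher potency values are better, and all function and parameter names below (`pair_features`, `knn_predict`, `active_delta_step`, `run_active_delta`, `n_start`, `n_rounds`) are assumptions made purely for illustration.

```python
import random


def pair_features(a, b):
    """Concatenate two feature vectors into one paired representation."""
    return a + b


def knn_predict(train_x, train_y, query, k=3):
    """Toy stand-in regressor: mean label of the k nearest training pairs."""
    dists = sorted(
        (sum((p - q) ** 2 for p, q in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def active_delta_step(train, pool):
    """Return the pool index with the largest predicted improvement.

    train: list of (features, potency) for measured compounds
    pool:  list of features for unmeasured compounds
    Assumes higher potency values are better.
    """
    # Cross-merge the training set into ordered pairs; each pair is
    # labelled with the potency difference (second minus first).
    pair_x = [pair_features(fa, fb) for fa, _ in train for fb, _ in train]
    pair_y = [yb - ya for _, ya in train for _, yb in train]
    # Pair the current best training compound with every candidate and
    # predict how much each candidate improves on it.
    best_f, _ = max(train, key=lambda t: t[1])
    scores = [knn_predict(pair_x, pair_y, pair_features(best_f, f)) for f in pool]
    return max(range(len(pool)), key=scores.__getitem__)


def run_active_delta(features, potencies, n_start=2, n_rounds=5, seed=0):
    """Exploitative loop: repeatedly acquire the top-ranked candidate."""
    rng = random.Random(seed)
    idx = list(range(len(features)))
    rng.shuffle(idx)
    train_idx, pool_idx = idx[:n_start], idx[n_start:]
    for _ in range(n_rounds):
        if not pool_idx:
            break
        train = [(features[i], potencies[i]) for i in train_idx]
        pool = [features[i] for i in pool_idx]
        pick = active_delta_step(train, pool)
        # "Measuring" the pick here just means moving it into training.
        train_idx.append(pool_idx.pop(pick))
    return train_idx
```

Calling `run_active_delta(features, potencies)` returns the indices of the compounds acquired so far, in acquisition order. One consequence of pairing is that n measured compounds yield n × n training pairs, which is one plausible reason a paired approach can help when training data is scarce.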


Source journal

Beilstein Journal of Organic Chemistry (Chemistry: Organic Chemistry)

CiteScore: 4.90

Self-citation rate: 3.70%

Articles published: 167

Review time: 1.4 months

About the journal: The Beilstein Journal of Organic Chemistry is an international, peer-reviewed, Open Access journal. It provides a unique platform for rapid publication without any charges (free for author and reader) – Platinum Open Access. The content is freely accessible 365 days a year to any user worldwide. Articles are available online immediately upon publication and are publicly archived in all major repositories. In addition, it provides a platform for publishing thematic issues (theme-based collections of articles) on topical issues in organic chemistry. The journal publishes high-quality research and reviews in all areas of organic chemistry, including organic synthesis, organic reactions, natural product chemistry, structural investigations, supramolecular chemistry and chemical biology.