Finding the most potent compounds using active learning on molecular pairs.
Authors: Zachary Fralish, Daniel Reker
Journal: Beilstein Journal of Organic Chemistry, volume 20, pages 2152-2162
Publication date: 2024-08-27
DOI: https://doi.org/10.3762/bjoc.20.185
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368049/pdf/
Citations: 0
Abstract
Active learning allows algorithms to steer iterative experimentation to accelerate and de-risk molecular optimizations. However, actively trained models might still perform poorly during early project stages, when training data is limited and model exploitation might lead to the identification of analogs with limited scaffold diversity. Here, we present ActiveDelta, an adaptive approach that leverages paired molecular representations to predict improvements over the current best training compound and so prioritize further data acquisition. We apply the ActiveDelta concept to both graph-based deep (Chemprop) and tree-based (XGBoost) models during exploitative active learning on 99 Ki benchmarking datasets. We show that both ActiveDelta implementations identify more potent inhibitors than the standard exploitative active learning implementations of Chemprop, XGBoost, and Random Forest. The ActiveDelta approach also identifies more chemically diverse inhibitors in terms of their Murcko scaffolds. Finally, deep models such as Chemprop trained on data selected through ActiveDelta approaches can more accurately identify inhibitors in test data created through simulated time-splits. Overall, this study highlights the large potential of molecular pairing approaches to further improve popular active learning strategies in low-data regimes by enabling faster and more accurate identification of more diverse molecular hits against critical drug targets.
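The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: molecules are plain numeric feature vectors rather than molecular graphs or fingerprints, a pair is represented by concatenating the two vectors, training pairs are formed from all ordered combinations of training compounds, and a toy k-nearest-neighbour regressor stands in for Chemprop or XGBoost. The function names (`make_pairs`, `knn_predict`, `select_next`, `run_active_delta`) are hypothetical.

```python
import math


def make_pairs(X, y):
    """Build all ordered training pairs (a, b).

    Assumption: a pair is the concatenated features of a and b,
    labelled with the potency difference y[b] - y[a].
    """
    feats, deltas = [], []
    for xa, ya in zip(X, y):
        for xb, yb in zip(X, y):
            feats.append(list(xa) + list(xb))
            deltas.append(yb - ya)
    return feats, deltas


def knn_predict(train_feats, train_targets, query, k=3):
    """Toy stand-in learner: mean target of the k nearest training pairs."""
    ranked = sorted(
        (math.dist(f, query), t) for f, t in zip(train_feats, train_targets)
    )
    nearest = ranked[:k]
    return sum(t for _, t in nearest) / len(nearest)


def select_next(train_X, train_y, pool_X, k=3):
    """Exploitative selection: return the index of the pool compound with
    the largest predicted improvement over the current best training compound."""
    pair_feats, pair_deltas = make_pairs(train_X, train_y)
    best = max(range(len(train_y)), key=lambda i: train_y[i])
    scores = [
        knn_predict(pair_feats, pair_deltas, list(train_X[best]) + list(x), k)
        for x in pool_X
    ]
    return max(range(len(pool_X)), key=lambda i: scores[i])


def run_active_delta(train_X, train_y, pool_X, pool_y, n_rounds):
    """Move one selected compound per round from the pool to the training set."""
    train_X, train_y = list(train_X), list(train_y)
    pool_X, pool_y = list(pool_X), list(pool_y)
    for _ in range(min(n_rounds, len(pool_X))):
        idx = select_next(train_X, train_y, pool_X)
        train_X.append(pool_X.pop(idx))
        train_y.append(pool_y.pop(idx))  # simulates measuring the compound
    return train_X, train_y
```

In this sketch the model is retrained from scratch on the re-paired training set in every round, so the reference compound changes whenever a more potent compound is acquired. A standard exploitative baseline would instead predict absolute potency for each pool compound and select the maximum.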
About the journal:
The Beilstein Journal of Organic Chemistry is an international, peer-reviewed, Open Access journal. It provides a unique platform for rapid publication without any charges (free for author and reader) – Platinum Open Access. The content is freely accessible 365 days a year to any user worldwide. Articles are available online immediately upon publication and are publicly archived in all major repositories. In addition, it provides a platform for publishing thematic issues (theme-based collections of articles) on topical issues in organic chemistry.
The journal publishes high-quality research and reviews in all areas of organic chemistry, including organic synthesis, organic reactions, natural product chemistry, structural investigations, supramolecular chemistry, and chemical biology.