Machine Learning Assisted Hit Prioritization for High Throughput Screening in Drug Discovery

Impact factor 12.7; CAS Zone 1 (Chemistry); JCR Q1, Chemistry, Multidisciplinary
Davide Boldini, Lukas Friedrich, Daniel Kuhn and Stephan A. Sieber*
ACS Central Science, published 2024-03-15. DOI: 10.1021/acscentsci.3c01517 (https://pubs.acs.org/doi/10.1021/acscentsci.3c01517)

Abstract

Efficient prioritization of bioactive compounds from high throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during training of a gradient boosting model on noisy high throughput screening data using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts. Therefore, our method enables false positive and true positive detection without relying on prior screens or assay interference mechanisms, making it applicable to any high throughput screening campaign. We demonstrate that our approach consistently excludes assay interferents with different mechanisms and prioritizes biologically relevant compounds more efficiently than all tested baselines, including a retrospective case study simulating its use in a real drug discovery campaign. Finally, our tool is extremely computationally efficient, requiring less than 30 s per assay on low-resource hardware. As such, our findings show that our method is an ideal addition to existing false positive detection tools and can be used to guide further pharmacological optimization after high throughput screening campaigns.

Minimum variance sampling analysis (MVS-A) is a fast machine-learning approach enabling the identification of both true bioactive compounds and false positives in high throughput screening data.
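The abstract does not spell out how MVS-A computes sample influence. As a rough illustration only, the Python sketch below assumes that the influence of a training compound is the regularized gradient magnitude sqrt(g² + λh²) that minimum variance sampling (MVS, as used in CatBoost) assigns to a sample, accumulated over all boosting iterations of a logistic-loss model. To stay self-contained it uses a hand-written booster of depth-1 trees rather than a production gradient boosting library, and the names `best_stump` and `mvs_influence_scores`, the parameters `n_iter`, `lr` and `lam`, and the toy data are all hypothetical. It is not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def best_stump(X, r):
    """Return (feature, threshold) of the depth-1 split that best fits the
    residuals r by least squares, or None if no split is possible."""
    best, best_sse = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # degenerate split: one side would be empty
            sse = sum(np.sum((r[m] - r[m].mean()) ** 2) for m in (left, ~left))
            if sse < best_sse:
                best, best_sse = (j, t), sse
    return best


def mvs_influence_scores(X, y, n_iter=50, lr=0.1, lam=0.1):
    """Hypothetical sketch: train a toy logistic gradient-boosting model of
    stumps and accumulate, for every training sample, the MVS-style
    regularized gradient magnitude sqrt(g^2 + lam * h^2) over all iterations."""
    y = np.asarray(y, dtype=float)
    base = np.log(y.mean() / (1.0 - y.mean()))  # initial log-odds
    F = np.full(len(y), base)
    scores = np.zeros(len(y))
    for _ in range(n_iter):
        p = sigmoid(F)
        g = p - y            # gradient of the log loss w.r.t. F
        h = p * (1.0 - p)    # hessian of the log loss w.r.t. F
        scores += np.sqrt(g ** 2 + lam * h ** 2)
        split = best_stump(X, -g)  # fit tree structure to the negative gradient
        if split is None:
            break
        j, t = split
        left = X[:, j] <= t
        for m in (left, ~left):
            # Newton step for each leaf value, as in standard logistic boosting
            F[m] += lr * np.sum(-g[m]) / (np.sum(h[m]) + 1e-12)
    return scores


# Toy "screen": column 0 stands in for a shared active chemotype.
# Samples 0-9 are consistent actives, 10-11 are labelled active but look
# like inactives (stand-ins for assay interferents), 12-31 are inactives.
X = np.array([[1.0]] * 10 + [[0.0]] * 22)
y = np.array([1] * 12 + [0] * 20)
scores = mvs_influence_scores(X, y)

actives = np.flatnonzero(y == 1)
ranked = actives[np.argsort(scores[actives])]  # lowest score first
print("prioritize:", ranked[:3].tolist())
print("flag as likely artifacts:", sorted(ranked[-2:].tolist()))
# → flag as likely artifacts: [10, 11]
```

Under these assumptions, actives that share a pattern with other actives are fitted quickly, so their gradients shrink and their cumulative scores stay low; actives the model cannot reconcile with the rest of the data, here the two samples that look like inactives, keep large gradients and end up with the highest scores. Treating low-scoring actives as candidates for follow-up and high-scoring ones as likely artifacts is the interpretation this sketch assumes.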
