Harnessing quality-throughput trade-off in scoring functions for extreme-scale virtual screening campaigns

IF 6.2 | Zone 2, Computer Science | JCR Q1: COMPUTER SCIENCE, THEORY & METHODS
Yuedong Zhang, Gianmarco Accordi, Davide Gadioli, Gianluca Palermo
Journal: Future Generation Computer Systems-The International Journal of Escience, volume 172, Article 107863
DOI: 10.1016/j.future.2025.107863
Publication date: 2025-04-28
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0167739X2500158X
Citations: 0

Abstract

Drug discovery is a long and costly process aimed at finding a molecule that yields a therapeutic effect. Virtual screening is one of the initial in-silico steps and estimates how promising a molecule is. This stage must solve two well-known domain problems: molecular docking and scoring. While the accuracy of scoring functions is extensively investigated in comparisons, the execution time of their implementations is usually not considered. In a virtual screening campaign, the fixed time budget for the entire process and the average time required to process each molecule together determine the upper limit on the number of molecules that can be evaluated. By reducing the time needed to evaluate a single molecule, we can screen a larger number of molecules, thereby increasing the chance of finding a promising candidate. For extreme-scale virtual screening campaigns, the computational budget is critical: even with large-scale facilities, completing the screening within a feasible time is impractical unless the computational time for a single molecule is significantly reduced.
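The budget argument above can be sketched as simple arithmetic. The snippet below is an illustrative sketch, not taken from the paper: the function name `max_molecules`, the `workers` parameter, and the example numbers are all assumptions, and it idealizes the campaign as perfectly parallel with a constant average time per molecule.

```python
def max_molecules(budget_seconds: float,
                  seconds_per_molecule: float,
                  workers: int = 1) -> int:
    """Upper bound on molecules that fit in a fixed time budget.

    Assumes (hypothetically) perfect parallel scaling across `workers`
    and a constant average evaluation time per molecule.
    """
    if seconds_per_molecule <= 0:
        raise ValueError("seconds_per_molecule must be positive")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Total compute time available, divided by the cost of one molecule.
    return int((budget_seconds * workers) // seconds_per_molecule)


# Hypothetical campaign: a 24-hour budget on 1000 workers.
one_day = 24 * 60 * 60
baseline = max_molecules(one_day, 2.0, workers=1000)   # 2 s per molecule
optimized = max_molecules(one_day, 1.0, workers=1000)  # 1 s per molecule
print(baseline, optimized)  # 43200000 86400000
```

Under these assumptions, halving the per-molecule time doubles the number of molecules that can be screened in the same budget.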
In this paper, we explore optimization and approximation techniques applied to two well-known scoring functions, which we modify to investigate different accuracy-performance trade-offs to support large-scale virtual screening campaigns. Despite the different approaches we considered, experimental results demonstrate that the proposed enhancements achieve better enrichment factors in virtual screening scenarios. Moreover, we port both implementations to CUDA to show that the proposed techniques are GPU-friendly and aligned with modern supercomputing infrastructures.
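For readers unfamiliar with the metric, the enrichment factor can be sketched as follows. This uses the commonly cited definition (active rate in the top-ranked fraction divided by the active rate in the whole library); the abstract does not state which variant or cutoff the authors use, so the function name, the `fraction` parameter, and the lower-score-is-better convention are assumptions.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor of the top-ranked `fraction` of a library.

    Assumption: lower scores rank better, as with energy-like
    docking scores. Flip the sort for higher-is-better scores.
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equal length")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("at least one active molecule is required")
    # Size of the top-ranked subset, at least one molecule.
    n_top = max(1, int(n * fraction))
    ranked = sorted(range(n), key=lambda i: scores[i])
    top_actives = sum(1 for i in ranked[:n_top] if is_active[i])
    # Ratio of the active rate in the subset to the active rate overall.
    return (top_actives / n_top) / (total_actives / n)


# Toy library of 10 molecules with 2 known actives (made-up data).
scores = [-9.0, -8.5, -3.0, -2.0, -1.0, -0.5, -7.0, -0.2, -0.1, -4.0]
actives = [True, True, False, False, False, False, False, False, False, False]
ef = enrichment_factor(scores, actives, fraction=0.2)
```

In this toy case both actives land in the top 20%, so the subset is five times richer in actives than the library as a whole; a value of 1 would mean the ranking is no better than random selection.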

Source journal
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.