{"title":"在评分功能中利用质量-吞吐量权衡,用于极端规模的虚拟筛选活动","authors":"Yuedong Zhang, Gianmarco Accordi, Davide Gadioli, Gianluca Palermo","doi":"10.1016/j.future.2025.107863","DOIUrl":null,"url":null,"abstract":"<div><div>Drug discovery is a long and costly process aimed at finding a molecule that yields a therapeutic effect. Virtual screening is one of the initial in-silico steps that aims at estimating how promising a molecule is. This stage needs to solve two well-known domain problems: molecular docking and scoring. While the accuracy of scoring functions is extensively investigated in comparisons, the execution time of their implementation is usually not considered. In virtual screening campaigns, the definition of a fixed time budget for the entire process and the average time required to process each molecule determines the upper limit of the number of molecules that can be evaluated. By reducing the time needed to evaluate a single molecule, we can screen a larger number of molecules, thereby increasing the possibility of finding a promising solution. For extreme-scale virtual screening campaigns, the computational budget is a critical aspect since even utilizing large-scale facilities would make it impractical to complete the screening within a feasible time unless the computational time for a single molecule is significantly reduced.</div><div>In this paper, we explore optimization and approximation techniques applied to two well-known scoring functions, which we modify to investigate different accuracy-performance trade-offs to support large-scale virtual screening campaigns. Despite the different approaches we considered, experimental results demonstrate that the proposed enhancements achieve better enrichment factors in virtual screening scenarios. 
Moreover, we port both implementations to CUDA to show that the proposed techniques are GPU-friendly and aligned with modern supercomputing infrastructures.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107863"},"PeriodicalIF":6.2000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Harnessing quality-throughput trade-off in scoring functions for extreme-scale virtual screening campaigns\",\"authors\":\"Yuedong Zhang, Gianmarco Accordi, Davide Gadioli, Gianluca Palermo\",\"doi\":\"10.1016/j.future.2025.107863\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Drug discovery is a long and costly process aimed at finding a molecule that yields a therapeutic effect. Virtual screening is one of the initial in-silico steps that aims at estimating how promising a molecule is. This stage needs to solve two well-known domain problems: molecular docking and scoring. While the accuracy of scoring functions is extensively investigated in comparisons, the execution time of their implementation is usually not considered. In virtual screening campaigns, the definition of a fixed time budget for the entire process and the average time required to process each molecule determines the upper limit of the number of molecules that can be evaluated. By reducing the time needed to evaluate a single molecule, we can screen a larger number of molecules, thereby increasing the possibility of finding a promising solution. 
For extreme-scale virtual screening campaigns, the computational budget is a critical aspect since even utilizing large-scale facilities would make it impractical to complete the screening within a feasible time unless the computational time for a single molecule is significantly reduced.</div><div>In this paper, we explore optimization and approximation techniques applied to two well-known scoring functions, which we modify to investigate different accuracy-performance trade-offs to support large-scale virtual screening campaigns. Despite the different approaches we considered, experimental results demonstrate that the proposed enhancements achieve better enrichment factors in virtual screening scenarios. Moreover, we port both implementations to CUDA to show that the proposed techniques are GPU-friendly and aligned with modern supercomputing infrastructures.</div></div>\",\"PeriodicalId\":55132,\"journal\":{\"name\":\"Future Generation Computer Systems-The International Journal of Escience\",\"volume\":\"172 \",\"pages\":\"Article 107863\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Generation Computer Systems-The International Journal of Escience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167739X2500158X\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of 
Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X2500158X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Harnessing quality-throughput trade-off in scoring functions for extreme-scale virtual screening campaigns
Drug discovery is a long and costly process aimed at finding a molecule that yields a therapeutic effect. Virtual screening is one of the initial in-silico steps, which estimates how promising a molecule is. This stage must solve two well-known domain problems: molecular docking and scoring. While the accuracy of scoring functions is extensively investigated in comparative studies, the execution time of their implementations is usually not considered. In a virtual screening campaign, a fixed time budget for the entire process, combined with the average time required to process each molecule, determines the upper limit on the number of molecules that can be evaluated. By reducing the time needed to evaluate a single molecule, we can screen more molecules, thereby increasing the chance of finding a promising candidate. For extreme-scale virtual screening campaigns, the computational budget is a critical aspect: even on large-scale facilities, completing the screening within a feasible time is impractical unless the computational time per molecule is significantly reduced.
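The budget argument above is simple arithmetic: the wall-clock budget divided by the average per-molecule evaluation time caps the library size a campaign can cover, and halving the per-molecule time doubles that cap. A minimal sketch with illustrative numbers (the budget, per-molecule time, and worker count below are hypothetical, not figures from the paper):

```python
# Throughput ceiling of a screening campaign: how many molecules fit
# into a fixed wall-clock budget given the average evaluation time.

def max_molecules(budget_seconds: float, avg_time_per_molecule_s: float,
                  parallel_workers: int = 1) -> int:
    """Upper bound on molecules evaluated within the time budget."""
    return int(budget_seconds // avg_time_per_molecule_s) * parallel_workers

# Illustrative scenario: one week on 1000 workers, 2 s per molecule.
week = 7 * 24 * 3600  # 604800 s
print(max_molecules(week, 2.0, 1000))  # 302400000 molecules
# Halving the per-molecule time doubles the reachable library size:
print(max_molecules(week, 1.0, 1000))  # 604800000 molecules
```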
In this paper, we explore optimization and approximation techniques applied to two well-known scoring functions, which we modify to investigate different accuracy-performance trade-offs in support of large-scale virtual screening campaigns. Although the approaches we considered differ, experimental results demonstrate that the proposed enhancements achieve better enrichment factors in virtual screening scenarios. Moreover, we port both implementations to CUDA to show that the proposed techniques are GPU-friendly and well suited to modern supercomputing infrastructures.
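The enrichment factor used to compare the scoring functions measures how much better a ranking concentrates true actives in its top fraction than random selection would. A minimal sketch of the standard definition, with a hypothetical toy ranking (this is an illustration of the metric, not the paper's evaluation code):

```python
# Enrichment factor at a given fraction of a ranked library:
# EF = (hits_in_selection / n_selected) / (hits_in_library / n_library).

def enrichment_factor(ranked_is_active: list, fraction: float) -> float:
    """ranked_is_active: booleans, best-scored molecule first."""
    n_total = len(ranked_is_active)
    n_selected = max(1, int(n_total * fraction))
    hits_selected = sum(ranked_is_active[:n_selected])
    hits_total = sum(ranked_is_active)
    return (hits_selected / n_selected) / (hits_total / n_total)

# Toy library: 10 molecules, 2 actives, ranked 1st and 4th.
ranking = [True, False, False, True, False,
           False, False, False, False, False]
print(enrichment_factor(ranking, 0.2))  # 2.5: top 20% holds 1 of 2 actives
```

An EF of 1.0 means the scoring function is no better than random picking, so comparing EF values at a fixed fraction is a natural way to check that a faster, approximated scoring function still ranks actives usefully.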
Journal description:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.