Ultrahigh-Throughput Virtual Screening Strategies against PPI Targets: A Case Study of STAT Inhibitors

IF 5.3 · Zone 2 (Chemistry) · JCR Q1 · CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling Pub Date : 2025-07-04 DOI:10.1021/acs.jcim.5c00907

Tibor Viktor Szalai, Nikolett Péczka, Levente Sipos-Szabó, László Petri, Dávid Bajusz* and György M. Keserű*

J. Chem. Inf. Model. 2025, 65 (14), 7734–7748. Publisher page: https://pubs.acs.org/doi/10.1021/acs.jcim.5c00907 · PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308805/pdf/

Citations: 0

Abstract

In recent years, virtual screening of ultralarge (10⁸⁺) libraries of synthetically accessible compounds (uHTVS) has become a popular approach in hit identification. With AI-assisted virtual screening workflows, such as Deep Docking, these protocols might be feasible even without supercomputers. Yet, these methodologies have their own conceptual limitations, including the fact that physics-based docking is replaced by a cheaper deep learning (DL) step for the vast majority of compounds. In turn, the performance of this DL step depends heavily on the performance of the underlying docking model, which is used to evaluate parts of the whole data set in order to train the DL architecture itself. Here, we evaluated the performance of the popular Deep Docking workflow on compound libraries of different sizes, against benchmark cases of classic brute-force docking approaches conducted on smaller libraries. We were especially interested in more difficult, protein–protein interaction-type oncotargets, where the reliability of the underlying docking model is harder to assess. Specifically, our virtual screens yielded several new inhibitors of two oncogenic transcription factors, STAT3 and STAT5b. For STAT5b, in particular, we disclose the first application of virtual screening against its N-terminal domain, whose importance has been recognized only recently. While the AI-based uHTVS is computationally more demanding, it can achieve exceptionally good hit rates (50.0% for STAT3). Deep Docking can also work well with a compound library containing only several million (instead of several billion) compounds, achieving a 42.9% hit rate against the SH2 domain of STAT5b, while presenting a highly economical workflow with just under 120,000 compounds actually docked.
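The idea the abstract describes, docking only a sample of the library and letting a cheaper model decide which of the remaining compounds are worth keeping, can be illustrated with a toy loop. The sketch below is not the Deep Docking code and makes strong simplifying assumptions: each "compound" is a single float descriptor, `expensive_dock` is a synthetic stand-in for a physics-based docking score, and a crude interval classifier (`fit_threshold_surrogate`) takes the place of the deep learning model. All function and parameter names are hypothetical.

```python
import random


def expensive_dock(x):
    """Synthetic stand-in for a docking score (lower is better). Assumption, not real docking."""
    return (x - 0.7) ** 2


def fit_threshold_surrogate(samples, scores, top_fraction):
    """Crude surrogate: return the descriptor interval spanned by the best-scoring
    fraction of the docked samples. A real workflow would train a DL classifier here."""
    ranked = sorted(zip(scores, samples))  # ascending score, best first
    n_top = max(1, int(len(ranked) * top_fraction))
    top = [x for _, x in ranked[:n_top]]
    return min(top), max(top)


def surrogate_screen(library, n_iterations=3, sample_size=200, top_fraction=0.1, seed=0):
    """Iteratively dock a small random sample, refit the surrogate on everything
    docked so far, and discard compounds the surrogate predicts to be non-hits."""
    rng = random.Random(seed)
    remaining = list(library)
    docked = {}  # descriptor -> docking score, for every compound actually docked
    for _ in range(n_iterations):
        pool = [x for x in remaining if x not in docked]
        batch = rng.sample(pool, min(sample_size, len(pool)))
        for x in batch:
            docked[x] = expensive_dock(x)
        lo, hi = fit_threshold_surrogate(list(docked), list(docked.values()), top_fraction)
        remaining = [x for x in remaining if lo <= x <= hi]
    return remaining, len(docked)


if __name__ == "__main__":
    rng = random.Random(1)
    library = [rng.random() for _ in range(100_000)]
    kept, n_docked = surrogate_screen(library)
    print(f"library: {len(library)}, docked: {n_docked}, kept after pruning: {len(kept)}")
```

The point of the sketch is the cost structure: the expensive scoring function is called only on the sampled batches, while the surrogate prunes the rest of the library. It also shows the dependency the abstract warns about, since the surrogate can only be as reliable as the docking scores it was fitted on.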

Source journal: Journal of Chemical Information and Modeling (Chemistry: Multidisciplinary)

CiteScore: 9.80

Self-citation rate: 10.70%

Articles published: 529

Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.