Gergana Tancheva, Vesa Hongisto, Konrad Patyra, Luchesar Iliev, Nikolay Kochev, Penny Nymark, Pekka Kohonen, Nina Jeliazkova, Roland Grafström
{"title":"High-throughput screening data generation, scoring and FAIRification: a case study on nanomaterials","authors":"Gergana Tancheva, Vesa Hongisto, Konrad Patyra, Luchesar Iliev, Nikolay Kochev, Penny Nymark, Pekka Kohonen, Nina Jeliazkova, Roland Grafström","doi":"10.1186/s13321-025-01001-8","DOIUrl":"10.1186/s13321-025-01001-8","url":null,"abstract":"<div><p>In vitro-based high-throughput screening (HTS) technology is applicable to hazard-based ranking and grouping of diverse agents, including nanomaterials (NMs). We present a standardized HTS-derived human cell-based testing protocol which combines the analysis of five assays into a broad toxic mode-of-action-based hazard value, termed Tox5-score. The overall protocol includes automated data FAIRification, preprocessing and score calculation. A newly developed Python module ToxFAIRy can be used independently or within an Orange Data Mining workflow that has custom widgets for fine-tuning, included in the custom-developed Orange add-on Orange3-ToxFAIRy. The created data-handling workflow has the advantage of facilitated conversion of the FAIR HTS data into the NeXus format, capable of integrating all data and metadata into a single file and multidimensional matrix amenable to interactive visualizations and selection of data subsets. The resulting FAIR HTS data includes both raw and interpreted data (scores) in machine-readable formats distributable as a data archive, including into the eNanoMapper database and Nanosafety Data Interface. Overall, we present an HTS-driven FAIRified computational assessment tool for hazard analysis of multiple agents simultaneously, with broad potential applicability across diverse scientific communities.</p><p><b>Scientific Contribution</b> Our study represents significant tool development for analyzing multiple materials hazards rapidly and simultaneously, aligning with regulatory recommendations and addressing industry needs. 
The innovative integration of in vitro-based toxicity scoring with automated data preprocessing within FAIRification workflows enhances the applicability of HTS-derived data in the materials development community. The protocols described increase the effectiveness of materials toxicity testing and mode-of-action research by offering an alternative to manual data processing, enriching HTS data with metadata, and refining testing methodologies, such as for bioactivity-based grouping; overall, they demonstrate the value of reusing existing data.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01001-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143865505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GESim: ultrafast graph-based molecular similarity calculation via von Neumann graph entropy","authors":"Hiroaki Shiokawa, Shoichi Ishida, Kei Terayama","doi":"10.1186/s13321-025-01003-6","DOIUrl":"10.1186/s13321-025-01003-6","url":null,"abstract":"<div><p>Representing molecules as graphs is a natural approach for capturing their structural information, with atoms depicted as nodes and bonds as edges. Although graph-based similarity calculation approaches, such as the graph edit distance, have been proposed for calculating molecular similarity, these approaches are nondeterministic polynomial (NP)-hard and thus computationally infeasible for routine use, unlike fingerprint-based methods. To address this limitation, we developed GESim, an ultrafast graph-based method for calculating molecular similarity on the basis of von Neumann graph entropy. GESim enables molecular similarity calculations by considering entire molecular graphs, and evaluations using two benchmarks for molecular similarity suggest that GESim has the ability to differentiate between highly similar molecules, even in cases where other methods fail to effectively distinguish their similarity. GESim is provided as an open-source package on GitHub at https://github.com/LazyShion/GESim.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01003-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143857304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nadin Ulrich, Karsten Voigt, Anton Kudria, Alexander Böhme, Ralf-Uwe Ebert
{"title":"Prediction of the water solubility by a graph convolutional-based neural network on a highly curated dataset","authors":"Nadin Ulrich, Karsten Voigt, Anton Kudria, Alexander Böhme, Ralf-Uwe Ebert","doi":"10.1186/s13321-025-01000-9","DOIUrl":"10.1186/s13321-025-01000-9","url":null,"abstract":"<div><p>Water solubility is a relevant physico-chemical property in environmental chemistry, toxicology, and drug design. Although water solubility is, besides the octanol–water partition coefficient, melting point, and boiling point, a property with a large amount of available experimental data, there are still many more compounds in the chemical universe for which information on their water solubility is lacking. Thus, prediction tools with a broad application domain are needed to fill the corresponding data gaps. To this end, we developed a graph convolutional neural network model (GNN) to predict the water solubility in the form of log <i>S</i><sub>w</sub> based on a highly curated dataset of 9800 chemicals. We started our model development with a curation workflow of the AqSolDB data, ending with 7605 data points. We added 2195 chemicals with experimental data, which we found in the literature, to our dataset. In the final dataset, log <i>S</i><sub>w</sub> values range from −13.17 to 0.50. Higher values were excluded by a cut-off introduced to eliminate fully miscible chemicals. We developed a consensus GNN by a fivefold split of the corresponding training set (70% of the data) and validation set (20%) and used 10% as an independent test set for the evaluation of the performance of the different splits and the consensus model. By doing so, we achieved an <i>r</i><sup>2</sup> of 0.901, a <i>q</i><sup>2</sup> of 0.896, and an <i>rmse</i> of 0.657 on our independently selected test set, which is close to the experimental error of 0.5 to 0.6 log units. 
We further provide the information on the application domain and compare our performance to other existing prediction tools.</p><p><b>Scientific contribution</b> Based on a highly curated dataset, we developed a neural network to predict the water solubility of chemicals for a broad application domain. Data curation was done by us in a step-wise procedure, where we identified various errors in the experimental data. Based on an independent test set, we compare our prediction results to those of the available prediction models.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01000-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning motif features and topological structure of molecules for metabolic pathway prediction","authors":"Jianguo Hu, Yiqing Zhang, Jinxin Xie, Zhen Yuan, Zhangxiang Yin, Shanshan Shi, Honglin Li, Shiliang Li","doi":"10.1186/s13321-025-00994-6","DOIUrl":"10.1186/s13321-025-00994-6","url":null,"abstract":"<div><p>Metabolites serve as crucial biomarkers for assessing disease progression and understanding underlying pathogenic mechanisms. However, when the metabolic pathway category of metabolites is unknown, researchers face challenges in conducting metabolomic analyses. Due to the complexity of wet laboratory experimentation for pathway identification, there is a growing demand for predictive methods. Various computational approaches, including machine learning and graph neural networks, have been proposed; however, interpretability remains a challenge. We have developed a neural network framework called MotifMol3D, which is designed for predicting molecular metabolic pathway categories. This framework introduces motif information to mine local features of small-sample molecules, combining it with a graph neural network and 3D information to complete the prediction task. Using a dataset of 5,698 molecules that participate in 11 metabolic pathway categories in the KEGG database, MotifMol3D outperformed state-of-the-art methods in precision, recall, and F1 score. In addition, an ablation study and motif analysis have demonstrated the effectiveness and usefulness of the model. Motif analysis, in particular, has shown that motif information can characterize the main features of specific pathway molecules to a certain extent and enhance the interpretability of the model. An external validation further corroborates this observation. 
MotifMol3D is an open-source tool that is available at https://github.com/Irena-Zhang/MotifMol3D.git.</p><p><b>Scientific contribution</b> MotifMol3D integrates motif information, graph neural networks, and 3D structural data to enhance feature extraction for small-sample molecules, improving the precision and interpretability of metabolic pathway predictions. The model outperforms state-of-the-art approaches in precision, recall, and F1 score. This work reveals how motif information characterizes pathway-specific molecules, offering novel insights into molecular properties within metabolic pathways.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00994-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Activity cliff-aware reinforcement learning for de novo drug design","authors":"Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang","doi":"10.1186/s13321-025-01006-3","DOIUrl":"10.1186/s13321-025-01006-3","url":null,"abstract":"<div><p>The integration of artificial intelligence (AI) in drug discovery offers promising opportunities to streamline and enhance the traditional drug development process. One core challenge in <i>de novo</i> molecular design is modeling complex structure-activity relationships (SAR), such as activity cliffs, where minor molecular changes yield significant shifts in biological activity. In response to the limitations of current models in capturing these critical discontinuities, we propose the Activity Cliff-Aware Reinforcement Learning (ACARL) framework. ACARL leverages a novel activity cliff index to identify and amplify activity cliff compounds, uniquely incorporating them into the reinforcement learning (RL) process through a tailored contrastive loss. This RL framework is designed to focus model optimization on high-impact regions within the SAR landscape, improving the generation of molecules with targeted properties. Experimental evaluations across multiple protein targets demonstrate ACARL’s superior performance in generating high-affinity molecules compared to existing state-of-the-art algorithms. These findings indicate that ACARL effectively integrates SAR principles into the RL-based drug design pipeline, offering a robust approach for <i>de novo</i> molecular design.</p><p><b>Scientific contribution</b> Our work introduces a machine learning-based drug design framework that explicitly models activity cliffs, a first in AI-driven molecular design. ACARL’s primary technical contributions include the formulation of an activity cliff index to detect these critical points, and a contrastive RL loss function that dynamically enhances the generation of activity cliff compounds, optimizing the model for high-impact SAR regions. 
This approach demonstrates the efficacy of combining domain knowledge with machine learning advances, significantly expanding the scope and reliability of AI in drug discovery.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01006-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The pucke.rs toolkit to facilitate sampling the conformational space of biomolecular monomers","authors":"Jérôme Rihon, Sten Reynders, Vitor Bernardes Pinheiro, Eveline Lescrinier","doi":"10.1186/s13321-025-00977-7","DOIUrl":"10.1186/s13321-025-00977-7","url":null,"abstract":"<div><p>Understanding the structural and dynamic behaviour of molecules is a major objective in molecular modeling research. Sampling through the torsional space is an efficient way to map their behaviour. However, generating a landscape of possible conformations relies on multiple formalisms whose mathematics are often difficult to convert to code. Here we present a command line tool and a scripting module to provide the means to generate such landscapes with different axes according to various formalisms exploited for conformational sampling. In addition to this toolkit, we present a benchmarking study in which a DNA nucleoside is subjected to a diverse set of quantum mechanical levels of theory for geometry optimisations and energy potential calculations. The potential of the tool is demonstrated on examples including amino acids and synthetic nucleosides having five-membered or six-membered sugar moieties.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00977-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143841665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maria Zavadskaya, Anastasia Orlova, Andrei Dmitrenko, Vladimir Vinogradov
{"title":"Integrating QSAR modelling with reinforcement learning for Syk inhibitor discovery","authors":"Maria Zavadskaya, Anastasia Orlova, Andrei Dmitrenko, Vladimir Vinogradov","doi":"10.1186/s13321-025-00998-2","DOIUrl":"10.1186/s13321-025-00998-2","url":null,"abstract":"<div><p>Spleen tyrosine kinase (Syk) is a crucial mediator of inflammatory processes and a promising therapeutic target for the management of autoimmune disorders, such as immune thrombocytopenia. While several Syk inhibitors are known to date, their efficacy and safety profiles remain suboptimal, necessitating the exploration of novel compounds. The study introduces a novel deep reinforcement learning strategy for drug discovery, specifically designed to identify new Syk inhibitors. The approach integrates quantitative structure–activity relationship (QSAR) predictions with generative modelling, employing a stacking-ensemble model that achieves a correlation coefficient of 0.78. From over 78,000 molecules generated by this methodology, we identified 139 promising candidates with high predicted potency, binding affinity and optimal drug-likeness properties, demonstrating structural novelty while maintaining essential Syk inhibitor characteristics. Our approach establishes a versatile framework for accelerated drug discovery, which is particularly valuable for the development of rare disease therapeutics.</p><p><b>Scientific contribution</b></p><p>The study presents the first application of QSAR-guided reinforcement learning for Syk inhibitor discovery, yielding structurally novel candidates with predicted high potency. 
The presented methodology can be adapted for other therapeutic targets, potentially accelerating the drug development process.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00998-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143830813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seungchan An, Yeonjin Lee, Junpyo Gong, Seokyoung Hwang, In Guk Park, Jayhyun Cho, Min Ju Lee, Minkyu Kim, Yun Pyo Kang, Minsoo Noh
{"title":"InertDB as a generative AI-expanded resource of biologically inactive small molecules from PubChem","authors":"Seungchan An, Yeonjin Lee, Junpyo Gong, Seokyoung Hwang, In Guk Park, Jayhyun Cho, Min Ju Lee, Minkyu Kim, Yun Pyo Kang, Minsoo Noh","doi":"10.1186/s13321-025-00999-1","DOIUrl":"10.1186/s13321-025-00999-1","url":null,"abstract":"<div><p>The development of robust artificial intelligence (AI)-driven predictive models relies on high-quality, diverse chemical datasets. However, the scarcity of negative data and a publication bias toward positive results often hinder accurate biological activity prediction. To address this challenge, we introduce InertDB, a comprehensive database comprising 3,205 curated inactive compounds (CICs) identified through rigorous review of over 4.6 million compound records in PubChem. CIC selection prioritized bioassay diversity, determined using natural language processing (NLP)-based clustering metrics, while ensuring minimal biological activity across all evaluated bioassays. Notably, 97.2% of CICs adhere to the Rule of Five, a proportion significantly higher than that of the overall PubChem dataset. To further expand the chemical space, InertDB also features 64,368 generated inactive compounds (GICs) produced using a deep generative AI model trained on the CIC dataset. Compared to conventional approaches such as random sampling or property-matched decoys, InertDB significantly improves predictive AI performance, particularly for phenotypic activity prediction, by providing reliable inactive compound sets.</p><p><b>Scientific contributions</b></p><p>InertDB addresses a critical gap in AI-driven drug discovery by providing a comprehensive repository of biologically inactive compounds, effectively resolving the scarcity of negative data that limits prediction accuracy and model reliability. 
By leveraging language model-based bioassay diversity metrics and generative AI, InertDB integrates rigorously curated inactive compounds with an expanded chemical space. InertDB serves as a valuable alternative to random sampling and decoy generation, offering improved training datasets and enhancing the accuracy of phenotypic pharmacological activity prediction.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00999-1","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Youngchun Kwon, Hyunjeong Jeon, Joonhyuk Choi, Youn-Suk Choi, Seokho Kang
{"title":"Enhancing chemical reaction search through contrastive representation learning and human-in-the-loop","authors":"Youngchun Kwon, Hyunjeong Jeon, Joonhyuk Choi, Youn-Suk Choi, Seokho Kang","doi":"10.1186/s13321-025-00987-5","DOIUrl":"10.1186/s13321-025-00987-5","url":null,"abstract":"<div><p>In synthesis planning, identifying and optimizing chemical reactions are important for the successful design of synthetic pathways to target substances. Chemical reaction databases assist chemists in gaining insights into this process. Traditionally, searching for relevant records from a reaction database has relied on the manual formulation of queries by chemists based on their search purposes, which is challenging without explicit knowledge of what they are searching for. In this study, we propose an intelligent chemical reaction search system that simplifies the process of enhancing the search results. When a user submits a query, a list of relevant records is retrieved from the reaction database. Users can express their preferences and requirements by providing binary ratings for the individual retrieved records. The search results are refined based on the user feedback. To implement this system effectively, we incorporate and adapt contrastive representation learning, dimensionality reduction, and human-in-the-loop techniques. Contrastive learning is used to train a representation model that embeds records in the reaction database as numerical vectors suitable for chemical reaction searches. Dimensionality reduction is applied to compress these vectors, thereby enhancing the search efficiency. Human-in-the-loop is integrated to iteratively update the representation model by reflecting user feedback. Through experimental investigations, we demonstrate that the proposed method effectively improves the chemical reaction search towards better alignment with user preferences and requirements. 
</p><p><b>Scientific contribution</b> This study seeks to enhance the search functionality of chemical reaction databases by drawing inspiration from recommender systems. The proposed method simplifies the search process, offering an alternative to the complexity of formulating explicit query rules. We believe that the proposed method can assist users in efficiently discovering records relevant to target reactions, especially when they encounter difficulties in crafting detailed queries due to limited knowledge.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00987-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unveiling polyphenol-protein interactions: a comprehensive computational analysis","authors":"Samo Lešnik, Marko Jukić, Urban Bren","doi":"10.1186/s13321-025-00997-3","DOIUrl":"10.1186/s13321-025-00997-3","url":null,"abstract":"<div><p>Our study investigates polyphenol-protein interactions, analyzing their structural diversity and dynamic behavior. Analysis of the entire Protein Data Bank reveals diverse polyphenolic structures, engaging in various noncovalent interactions with proteins. Interactions observed across crystal structures among diverse polyphenolic classes reveal similarities, underscoring consistent patterns across a spectrum of structural motifs. On the other hand, molecular dynamics (MD) simulations of polyphenol-protein complexes unveil dynamic binding patterns, highlighting the influx of water molecules into the binding site and underscoring limitations of static crystal structures. Water-mediated interactions emerge as crucial in polyphenol-protein binding, leading to variable binding patterns observed in MD simulations. Comparison of high- and low-resolution crystal structures as starting points for MD simulations demonstrates their robustness, exhibiting consistent dynamics regardless of the quality of the initial structural data. Additionally, the impact of glycosylation on polyphenol binding is explored, revealing its role in modulating interactions with proteins. In contrast to synthetic drugs, polyphenol binding seems to exhibit heightened flexibility, driven by dynamic water-mediated interactions, which may also facilitate their promiscuous binding. Comprehensive dynamic studies are therefore essential to understand polyphenol-protein recognition mechanisms. 
Overall, our study provides novel insights into polyphenol-protein interactions, informing future research for harnessing polyphenolic therapeutic potential through rational drug design.</p><p><b>Scientific contribution</b>: In this study, we present an analysis of (natural) polyphenol-protein binding conformations, leveraging the entirety of the Protein Data Bank structural data on polyphenols, while extending the binding conformation sampling through molecular dynamics simulations. For the first time, we introduce experimentally supported large-scale systematization of polyphenol binding patterns. Moreover, our insight into the significance of explicit water molecules and hydrogen-bond bridging rationalizes the polyphenol promiscuity paradigm, advocating for a deeper understanding of polyphenol recognition mechanisms crucial for informed natural compound-based drug design.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00997-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}