Journal of Cheminformatics: Latest Articles

Prediction of intrinsic solubility for drug-like organic compounds using automated network optimizer (ANO) for physicochemical feature and hyperparameter optimization.
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-05-09 DOI: 10.1186/s13321-026-01220-7
You Kyoung Chung, Seung Jin Lee, Junho Lee, Himchanvit Cho, Sung-Jin Kim, Joonsuk Huh
Abstract: Accurate prediction of aqueous solubility remains a critical challenge in the chemical and pharmaceutical industries, significantly influencing drug development and delivery. This study revisits this well-explored area by leveraging the advanced capabilities of modern computational resources. We apply an automated network optimizer model that integrates dual optimization processes for molecular features and hyperparameters, streamlining the traditionally complex hyperparameter search while providing an efficient interpretation of molecular properties. By employing feature optimization techniques, our deep neural network model demonstrates improvements in both the speed and accuracy of molecular property predictions, achieving an average performance of R² = 0.991. This result outperforms conventional hyperparameter optimization methods such as grid search and random search in predicting the intrinsic solubility of 3,745 compounds across four external experimental datasets. Based on feature importance analysis, we identified key molecular features and structures that significantly influence solubility. Additionally, combining three molecular fingerprints (Morgan, MACCS key, and Avalon) with molecular descriptors enhances model performance, providing a deeper understanding of the relationship between molecular structure and solubility within the physicochemical feature optimization process. These findings underscore the potential of machine learning models to improve predictive modeling of physical properties, apply automated modeling and feature selection to new chemical datasets, and offer explainable insights into the principles driving solubility predictions.
Scientific contributions: This article focuses on recent advancements in the prediction of molecular properties through the application of a Quantitative Structure-Property Relationship (QSPR)-based deep neural network (DNN) model. This model employs a dual optimization approach that integrates both molecular feature selection and hyperparameter tuning. A review of publicly available datasets for drug-sized molecules is presented, highlighting the contributions of automated modeling and feature selection in enhancing the predictive accuracy of physical properties. Additionally, the article addresses the efficacy of these machine learning models in optimizing features, an essential consideration for practical applications in the field.
Citations: 0
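The abstract above reports model quality as R². As a reminder of what that number measures, here is a minimal sketch of the coefficient of determination in plain Python. The function name `r_squared` and the sample values are illustrative assumptions; they are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical log-solubility values, for illustration only.
observed = [-2.0, -3.5, -1.0, -4.2]
predicted = [-2.1, -3.4, -1.2, -4.0]
score = r_squared(observed, predicted)
```

A value of 1.0 means the predictions match the observations exactly, and a model that always predicts the mean of the observations scores 0.0.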
Scaffold-based evaluation metrics for fair comparison of molecular generators.
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-05-08 DOI: 10.1186/s13321-026-01213-6
Valeriia Fil, Remco L Van Den Broek, Martin Šícho, Ivan Čmelo, M Isabel Agea, Willem Jespers, Gerard J P Van Westen, Daniel Svozil
Abstract: Molecular generators enable the exploration of chemical space to identify novel compounds with desirable properties. However, assessing their performance remains challenging due to the structural diversity and volume of the generated molecules. Commonly used evaluation metrics, focusing on chemical validity and novelty, do not fully align with the primary goal of molecular generation: the discovery of new biologically active compounds. To address this limitation, we introduce scaffold-based metrics that enable a fair comparison by evaluating a generator's ability to recover biologically relevant scaffolds absent from the input set. We applied the scaffold Recovery Score (RS), SEt scaffold Diversity (SED), and Absolute SEt scaffold Recall (ASER) metrics to compare several molecular generators, including Molpher, DrugEx, REINVENT, and a graph-based genetic algorithm. The proposed scaffold-based metrics provide a realistic framework for evaluating and optimizing molecular generators for their practical use in drug discovery scenarios, particularly in the design of focused virtual chemical libraries. The metrics are available as open source in a GitHub repository at https://github.com/filvaleriia/scaffold-based-metrics.
Citations: 0
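The general idea of scoring a generator by the held-out scaffolds it recovers can be sketched as a simple set-based recall. This is not the paper's definition of RS, SED, or ASER, which the abstract does not spell out. The function name `held_out_scaffold_recall` is hypothetical, and scaffolds are assumed to arrive as precomputed canonical strings.

```python
def held_out_scaffold_recall(generated, held_out, input_set):
    """Fraction of held-out scaffolds, absent from the input set,
    that also appear among the generated scaffolds."""
    # Only scaffolds the generator never saw count as targets.
    targets = set(held_out) - set(input_set)
    if not targets:
        raise ValueError("no held-out scaffolds remain after removing input scaffolds")
    recovered = targets & set(generated)
    return len(recovered) / len(targets)


# Illustrative scaffold strings (assumed to be canonicalised elsewhere).
input_scaffolds = {"c1ccccc1"}
held_out_scaffolds = {"c1ccncc1", "c1ccoc1", "c1ccccc1"}
generated_scaffolds = ["c1ccncc1", "c1ccccc1", "C1CCCCC1"]

recall = held_out_scaffold_recall(
    generated_scaffolds, held_out_scaffolds, input_scaffolds
)  # 0.5: one of the two unseen scaffolds was recovered
```

Removing the input scaffolds first is what keeps the comparison fair in this sketch: a generator gets no credit for reproducing scaffolds it was given.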
A general-purpose framework for chemical reaction representation with atomic correspondence and flexible condition adaptation.
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-05-05 DOI: 10.1186/s13321-026-01201-w
Kaipeng Zeng, Xianbin Liu, Yu Zhang, Xiaokang Yang, Yaohui Jin, Yanyan Xu
Motivation: Organic synthesis is fundamental to the chemical industry, particularly in domains such as pharmaceutical development. While artificial intelligence offers powerful tools for modeling chemical reactions, current approaches are primarily limited to two paradigms: those that rely on hand-crafted, domain-specific features, and those that apply generic deep learning models through simplistic concatenation or aggregation of reaction components. The former often struggles to scale effectively with increasing data volumes, while the latter relies on simple input- or feature-level concatenation to combine different reaction components. Such simplistic aggregation prevents these models from directly capturing the structural transformations between reactants and products, and also makes it difficult to adapt or extend them to datasets that include non-molecular reaction conditions without modifying their model architectures.
Results: This paper introduces AlignReact, a novel chemical reaction representation learning framework designed for a wide range of organic reaction tasks. Our approach integrates atomic correspondence between reactants and products to discern precise molecular transformations, thereby improving the model's comprehension of molecular transformation patterns. We incorporate an adapter structure to embed reaction conditions into the representation, enhancing adaptability across varied datasets and tasks. Furthermore, a Reaction-Center-Aware attention mechanism is proposed to enable the model to focus on critical functional groups, yielding more powerful and informative representations. Evaluated across multiple downstream tasks, our model demonstrates superior performance, significantly outperforming existing chemical reaction representation learning architectures on most benchmark datasets.
Scientific contribution: We introduce a chemical reaction representation learning framework that explicitly integrates atomic correspondence between reactants and products into the network architecture, enabling the model to perceive and model molecular structural transformations during reactions. As an extensible but preliminary feature, our approach also features a flexible, detachable module for integrating reaction conditions. Inspired by conditioning mechanisms from multimodal generation, this adapter module accepts precomputed features of varying dimensions and modalities, laying the groundwork for broader dataset compatibility compared to prior work, and enables fine-grained, context-aware conditioning. Extensive experiments on a range of downstream datasets demonstrate that our framework achieves state-of-the-art performance across several key chemical reaction prediction tasks.
Citations: 0
Framework for evaluating explainable AI in antimicrobial drug discovery.
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-05-04 DOI: 10.1186/s13321-026-01200-x
Abdulmujeeb T Onawole, Mark A T Blaskovich, Johannes Zuegg
Abstract: Explainable artificial intelligence (XAI) methods for molecular property prediction lack standardized evaluation criteria, preventing widespread deployment in drug development and hit optimisation, where a proper understanding of structure-activity relationships is essential. We developed an evaluation framework for XAI using fragment-based explainability tests to compare XAI across different molecular representations and to challenge the different XAI approaches to explain activity cliffs properly. The evaluation methods include essential scaffold recognition, scaffold sensitivity and substructure specificity for explaining activity cliffs, and technical evaluation of model robustness and consistency. Using a curated dataset of antibiotic molecules, we established three XAI models with fundamentally different molecular representations: Random Forest on chemical features using SHAP, CNN on sequence-based SMILES using token occlusion, and RGCN on molecular graphs with substructure masking. Together with a detailed case study, we evaluated the explainability behaviours and quality of the different XAI approaches and highlighted their limitations. While all XAI approaches displayed good predictive and scaffold recognition capabilities, and comparable robustness and consistency, they displayed quite different explainability behaviour for activity cliffs, revealing their different utility for medicinal chemistry.
Scientific contribution: A.T.O. performed the study; A.T.O. and J.Z. contributed to the concept of the study and wrote the original manuscript. All authors reviewed the manuscript.
Citations: 0
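Token occlusion, named in the abstract above for the SMILES-based CNN, can be illustrated in a few lines: mask one token at a time and record how much the model score drops. The sketch below is generic and is not the authors' implementation. The mask token `*`, the character-level tokenisation, and the stand-in scorer `toy_score` are all assumptions made for illustration.

```python
def occlusion_attributions(tokens, score_fn, mask_token="*"):
    """Attribution of each token = baseline score minus the score
    obtained when that single token is replaced by a mask."""
    baseline = score_fn(tokens)
    attributions = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask_token] + tokens[i + 1:]
        attributions.append(baseline - score_fn(occluded))
    return attributions


def toy_score(tokens):
    """Hypothetical stand-in for a trained model: counts N and O tokens."""
    return sum(1.0 for t in tokens if t in ("N", "O"))


# Character-level tokens of the SMILES string "CCN" (a simplification;
# real SMILES tokenisers handle multi-character atoms and brackets).
tokens = list("CCN")
scores = occlusion_attributions(tokens, toy_score)  # [0.0, 0.0, 1.0]
```

Only the nitrogen token receives a non-zero attribution here, because it is the only token the toy scorer responds to.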
Improving protein-ligand complex generation with force field guidance
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-05-02 Epub Date: 2026-05-03 DOI: 10.1186/s13321-026-01198-2
Helen Lai, Tingyu Wang, Hassan Sirelkhatim, Joe Eaton, Howard Huang, Brad Rees, Ola Engkvist, Jon Paul Janet, Xiaoyun Wang, Alessandro Tibo
Abstract: Generative models based on diffusion and flow matching have recently been applied to structure-based drug design, but their outputs often include unrealistic protein–ligand interactions that do not obey the laws of physics. We present an energy guidance framework that incorporates a molecular mechanics force field (MMFF94) directly into the sampling process. The method steers molecular generation toward more physically plausible and energetically stable conformations without retraining the underlying model. We evaluate this approach using two state-of-the-art architectures, SemlaFlow (a flow matching model) and EDM (a diffusion model), on the PDBBind dataset. Across both models, energy guidance improves enthalpic interaction energy, improves strain energy by up to 75%, and generates over 1000 ligands with better docking scores than native ligands. These results demonstrate that lightweight, physics-based guidance can significantly enhance generative drug design while preserving chemical validity and diversity.
We introduce a novel, training-free force field guidance framework that steers ligand generation using empirical molecular mechanics (e.g., MMFF94) during diffusion or flow-based sampling, without modifying or retraining the base generative model (e.g., EDM or SemlaFlow [24]). Our method operates as a plug-in at inference time, leveraging energy feedback to generate poses with lower strain and better predicted interactions with the protein structure.
Our main contributions are as follows:
Open-access PDF: https://link.springer.com/content/pdf/10.1186/s13321-026-01198-2.pdf
Citations: 0
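The principle of training-free energy guidance can be shown on a one-dimensional toy problem: at each sampling step, the frozen model's update is corrected by the negative gradient of an energy function. Everything in this sketch is an assumption for illustration. A harmonic term stands in for MMFF94, `model_velocity` stands in for a pretrained flow model, and the update rule is a plain Euler step rather than the paper's scheme.

```python
def harmonic_energy(x, x0=1.5, k=10.0):
    """Toy energy with its minimum at x0 (stand-in for a force field)."""
    return 0.5 * k * (x - x0) ** 2


def harmonic_gradient(x, x0=1.5, k=10.0):
    """Analytic derivative of harmonic_energy with respect to x."""
    return k * (x - x0)


def model_velocity(x, t):
    """Hypothetical frozen generative model: pulls samples toward x = 2.0."""
    return 2.0 - x


def sample(x_start, steps=100, guidance_scale=0.0):
    """Euler integration of the model velocity, optionally corrected
    by the energy gradient. The model itself is never retrained."""
    dt = 1.0 / steps
    x = x_start
    for i in range(steps):
        t = i * dt
        x = x + dt * (model_velocity(x, t) - guidance_scale * harmonic_gradient(x))
    return x


unguided = sample(3.0)
guided = sample(3.0, guidance_scale=0.5)
```

With `guidance_scale=0.0` the sampler follows the model alone. A positive scale pulls the final sample toward the energy minimum, so the guided sample ends at lower energy than the unguided one.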
Defining peptides in ChEBI
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-05-02 Epub Date: 2026-05-03 DOI: 10.1186/s13321-026-01196-4
Simon Flügel, Till Mossakowski, Fabian Neuhaus, Erik Pfanenstiel, Martin Glauer, Edgar Haak, Adnan Malik, Noel M. O’Boyle
Abstract: Modern biochemistry is producing vast amounts of chemical knowledge. Ontologies, such as the Chemical Entities of Biological Interest (ChEBI) ontology, can help organise this knowledge. With manual classification alone, however, ontologies cannot keep up with the growth of their domain. In this work, we propose a novel taxonomy of 67 classes related to peptides, a large branch in ChEBI with nearly 15,000 compounds. The existing natural language definitions in ChEBI have been expanded and specified more precisely. These natural language definitions are accompanied by a logical axiomatisation in monadic second-order logic (MSOL). To use the axiomatisation for automated classification, a methodology has been developed that translates monadic second-order definitions first into partial first-order definitions and finally into an algorithmic classification. This connects three aspects important to ontological definitions: they reflect the opinions of experts, they are unambiguous, and they can be checked automatically. In our evaluation, we compare the results of our classification to the current taxonomy of ChEBI. This reveals potential inconsistencies in ChEBI as well as areas that might benefit from automated extensions. We also evaluate our natural-language definitions in an expert survey.
Scientific contribution: This work provides precise natural-language definitions of 14 current ChEBI classes as well as 53 new peptide-related classes. These definitions are formalised in MSOL and come with an efficient implementation that allows for large-scale molecule classification, including a full classification of ChEBI and PubChem.
Open-access PDF: https://link.springer.com/content/pdf/10.1186/s13321-026-01196-4.pdf
Citations: 0
Adaptmol: domain adaptation for molecular image recognition with limited supervision.
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-05-02 DOI: 10.1186/s13321-026-01209-2
Feng Hu, Estrid He, Karin Verspoor
Abstract: Optical Chemical Structure Recognition (OCSR) aims to convert two-dimensional molecular images into machine-readable formats such as SMILES strings. Deep learning has substantially improved OCSR performance, yet most methods rely on synthetic training data and struggle to generalize to real-world inputs, especially hand-drawn diagrams, where stroke width, geometry, and drawing conventions vary widely across individuals. In this work, we propose AdaptMol, an image-to-graph model that enables effective transfer from synthetic to real-world data without requiring manual graph annotations in the target domains. AdaptMol is an integrated pipeline that starts with training a base model on synthetic data, and then refines model representations through unsupervised domain adaptation and self-training. Our key insight is that bond features are domain-invariant in nature; they encode structural relationships between atoms that are independent of visual variations across domains. Thus, during domain adaptation, we align bond-level feature distributions via class-conditional Maximum Mean Discrepancy (MMD) to enforce cross-domain consistency. We also design a comprehensive data augmentation strategy to enhance the robustness of the base model, facilitating stable self-training on unlabeled target samples. On hand-drawn molecular images, our model achieves 82.6% accuracy and outperforms the best prior method by 10.7 points, while maintaining competitive performance across four benchmarks comprising molecular images from scientific literature and patent documents.
Scientific contribution: We propose AdaptMol, an image-to-graph model that predicts molecular structures as graphs of atoms and bonds, achieving effective transfer from synthetic to hand-drawn molecular images without requiring target domain graph annotations. We combine class-conditional Maximum Mean Discrepancy to align bond features across domains with comprehensive data augmentation to increase training data variation, jointly improving base model accuracy sufficiently for self-training and addressing the critical failure mode of prior approaches that begin with insufficient accuracy. We further introduce a dual position representation that supervises atom positions through both discrete coordinate tokens and continuous spatial heatmaps to reduce false positives in atom localization.
Citations: 0
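Class-conditional Maximum Mean Discrepancy, as mentioned in the abstract above, compares source and target feature distributions separately for each class and then aggregates. The sketch below uses the standard biased MMD² estimator with an RBF kernel in plain Python. The kernel choice, the `gamma` value, the averaging over classes, and the way target-side class labels are obtained (here simply given) are assumptions, not details from the paper.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of MMD^2 between two samples of feature vectors."""
    def mean_kernel(ps, qs):
        return sum(rbf(p, q, gamma) for p in ps for q in qs) / (len(ps) * len(qs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def class_conditional_mmd(source, target, gamma=1.0):
    """source/target map a class label to a list of feature vectors.
    Returns MMD^2 averaged over the classes present in both domains."""
    shared = sorted(set(source) & set(target))
    if not shared:
        raise ValueError("source and target share no classes")
    return sum(mmd_squared(source[c], target[c], gamma) for c in shared) / len(shared)


# Hypothetical 2-D bond features grouped by bond type.
source = {"single": [[0.0, 0.0], [0.1, 0.0]], "double": [[1.0, 1.0], [1.1, 0.9]]}
target = {"single": [[0.5, 0.4], [0.6, 0.5]], "double": [[1.6, 1.5], [1.5, 1.4]]}
loss = class_conditional_mmd(source, target)
```

Used as a training loss, a quantity like this shrinks as the per-class feature distributions of the two domains move closer together. It is exactly zero when the two samples are identical.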
When does global attention help: a unified empirical study on atomistic graph learning
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-05-01 DOI: 10.1186/s13321-026-01171-z
Arindam Chowdhury, Massimiliano Lupo Pasini
Abstract: Graph neural networks (GNNs) are widely used as surrogates for costly experiments and first-principles simulations to study the behavior of compounds at the atomistic scale, and their architectural complexity is constantly increasing to enable the modeling of complex physics. While most recent GNNs combine more traditional message passing neural network (MPNN) layers to model short-range interactions with more advanced graph transformers (GTs) with global attention mechanisms to model long-range interactions, it is still unclear when global attention mechanisms provide real benefits over well-tuned MPNN layers due to inconsistent implementations, features, or hyperparameter tuning. We introduce the first unified, reproducible benchmarking framework, built on HydraGNN, that enables seamless switching among four controlled model classes: MPNN, MPNN with chemistry/topology encoders, GPS-style hybrids of MPNN with global attention, and fully fused local-global models with encoders. Using seven diverse open-source datasets for benchmarking across regression and classification tasks, we systematically isolate the contributions of message passing, global attention, and encoder-based feature augmentation. Our study shows that encoder-augmented MPNNs form a robust baseline, while fused local-global models yield the clearest benefits for properties governed by long-range interaction effects. We further quantify the accuracy-compute trade-offs of attention, reporting its overhead in memory. Together, these results establish the first controlled evaluation of global attention in atomistic graph learning and provide a reproducible testbed for future model development.
Open-access PDF: https://link.springer.com/content/pdf/10.1186/s13321-026-01171-z.pdf
Citations: 0
LEP-AD: language embedding of proteins and attention to drugs predicts drug-target interactions.
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-04-27 DOI: 10.1186/s13321-026-01167-9
Reem Alsulami, Robert Lehmann, Anuj Daga, Sumeer A Khan, Raik Grünberg, Ahmed Abogosh, David Gomez Cabrero, Stefan T Arold, Robert Hoehndorf, Jesper Tegner, Narsis A Kiani
Introduction: Predicting drug-target interactions remains a significant challenge in drug development and lead optimization. Recent advances have leveraged machine learning algorithms to model drug-target interactions from molecular and sequence data.
Materials and methods: In this work, we use Evolutionary Scale Modeling (ESM-3) to construct a transformer-based protein language representation for drug-target interaction prediction. We introduce LEP-AD (Language Embedding of Proteins and Attention to Drugs), a modular architecture that combines pretrained protein language models with graph-based molecular encoders to predict binding affinity values.
Results: We systematically benchmark LEP-AD alongside a range of established deep learning methods across multiple datasets: Davis, KIBA, DTC, Metz, ToxCast, and STITCH. To assess predictive validity, we compare model-derived rankings of drug-target interactions with experimental results reported in the literature. In addition, we perform new experimental assays to evaluate the binding of three ATP-competitive Src kinase inhibitors (Dasatinib, UM-164, and Saracatinib), where experimentally measured IC₅₀ and pKᵢ values are consistent with the predicted rankings.
Conclusion: In summary, our benchmark highlights the strengths and limitations of current drug-target interaction models across diverse datasets and evaluation settings. The results emphasize the impact of pretrained protein and molecular representations on predictive performance and illustrate the persistent challenges of generalization, while the modular LEP-AD framework provides a flexible reference point for comparative evaluation.
Scientific contribution: This study presents LEP-AD, a modular deep learning framework for drug-target interaction prediction that integrates pretrained protein language representations with graph-based molecular encoders. Beyond introducing the architecture, we provide a systematic benchmark under similarity-aware evaluation settings and experimental validation, highlighting the impact of pretrained protein embeddings on predictive behavior across diverse datasets.
Citations: 0
Assigning the stereochemistry of natural products by machine learning.
IF 5.7, Zone 2, Chemistry
Journal of Cheminformatics Pub Date : 2026-04-25 DOI: 10.1186/s13321-026-01205-6
Markus Orsi, Jean-Louis Reymond
Abstract: Nature has settled for L-chirality for proteinogenic amino acids and D-chirality for the carbohydrate backbone of nucleotides. Further stereochemical patterns exist among natural products produced by common biosynthetic pathways. Here we asked whether these regularities might be sufficiently prevalent among natural products (NPs) that their stereochemistry could be machine learned and assigned automatically. Indeed, we report that a language model can be trained to assign the stereochemistry of NPs using the open access NP database COCONUT. In detail, our language model, called NPstereo, translates an NP structure written as absolute SMILES into the corresponding isomeric SMILES notation containing stereochemical information, with 80.2% per-stereocenter accuracy for full assignments and 85.9% per-stereocenter accuracy for partial assignments, across various NP classes including secondary metabolites such as alkaloids, polyketides, lipids and terpenes. NPstereo might be useful to assign or correct the stereochemistry of newly discovered NPs.
Citations: 0
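Per-stereocenter accuracy, the figure quoted in the abstract above, can be read as the share of individual stereocenters assigned correctly, pooled over all molecules. The sketch below shows one plausible way to compute it. The data layout (one dict per molecule mapping an atom index to a chirality tag) and the function name are assumptions, and the paper's exact evaluation protocol may differ.

```python
def per_stereocenter_accuracy(true_labels, predicted_labels):
    """Both arguments are lists with one dict per molecule, mapping an
    atom index to its stereo label (for example "@" or "@@").
    Returns correct stereocenters / total stereocenters over all molecules."""
    total = 0
    correct = 0
    for truth, pred in zip(true_labels, predicted_labels):
        for atom_idx, label in truth.items():
            total += 1
            # A missing prediction for a stereocenter counts as wrong.
            if pred.get(atom_idx) == label:
                correct += 1
    if total == 0:
        raise ValueError("no stereocenters to evaluate")
    return correct / total


# Two hypothetical molecules with two and three stereocenters.
truth = [{1: "@", 4: "@@"}, {2: "@@", 5: "@", 7: "@"}]
predicted = [{1: "@", 4: "@"}, {2: "@@", 5: "@", 7: "@"}]
accuracy = per_stereocenter_accuracy(truth, predicted)  # 0.8 (4 of 5 correct)
```

A per-molecule metric would score the first molecule as a complete failure. The per-stereocenter view still gives credit for the one center that was assigned correctly.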