Journal of Cheminformatics: latest articles, page 10

Translating community-wide spectral library into actionable chemical knowledge: a proof of concept with monoterpene indole alkaloids

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-28 DOI: 10.1186/s13321-025-01009-0

Sarah Szwarc, Adriano Rutz, Kyungha Lee, Yassine Mejri, Olivier Bonnet, Hazrina Hazni, Adrien Jagora, Rany B. Mbeng Obame, Jin Kyoung Noh, Elvis Otogo N’Nang, Stephenie C. Alaribe, Khalijah Awang, Guillaume Bernadat, Young Hae Choi, Vincent Courdavault, Michel Frederich, Thomas Gaslonde, Florian Huber, Toh-Seok Kam, Yun Yee Low, Erwan Poupon, Justin J. J. van der Hooft, Kyo Bin Kang, Pierre Le Pogam, Mehdi A. Beniddir

Abstract: With over 3000 representatives, the monoterpene indole alkaloids (MIAs) class is among the most diverse families of plant natural products. The MS/MS spectral space exploration of these complex compounds using chemoinformatic and computational mass spectrometry tools offers a valuable opportunity to extract and share chemical insights from this emblematic family of natural products (NPs). In this work, we first present a substantially updated version of the MIADB, a database uploaded to the GNPS library that now contains 422 MS/MS spectra of MIAs, up from 172 initial entries. We then introduce an innovative workflow that leverages hundreds of fragmentation spectra to support the FAIRification, extraction and dissemination of chemical knowledge. This workflow aims at the extraction of spectral patterns matching finely defined MIA skeletons. These extracted signatures can then be queried against complex biological extract datasets using MassQL. By applying this strategy to an LC-MS/MS dataset of 75 plant extracts, our results demonstrated the efficiency of this approach in identifying the diversity of MIA skeletons present in the analyzed samples. Additionally, our work enabled the digitization of structural data for diverse MIA skeletons by converting them into machine-readable formats, thereby enhancing their dissemination to the scientific community.

Scientific contribution: A comprehensive investigation of the monoterpene indole alkaloid chemical space, aiming to highlight skeleton-dependent fragmentation similarity trends and to generate valuable spectrometric signatures that could be used as queries.

PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01009-0
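
The idea of querying extracted spectral signatures against MS/MS data can be illustrated with a small Python sketch. This is not MassQL and not the authors' workflow; it only shows the underlying notion of requiring a set of product ions to be present in a spectrum within an m/z tolerance. The function names, the tolerance, the intensity threshold and all m/z values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def has_fragment(peaks: Sequence[Peak], target_mz: float,
                 tol_mz: float = 0.01, min_rel_intensity: float = 0.05) -> bool:
    """Return True if a peak lies within tol_mz of target_mz and reaches
    min_rel_intensity relative to the base peak of the spectrum."""
    if not peaks:
        return False
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return False
    return any(
        abs(mz - target_mz) <= tol_mz and intensity / base >= min_rel_intensity
        for mz, intensity in peaks
    )


def matches_signature(peaks: Sequence[Peak], signature: Sequence[float],
                      tol_mz: float = 0.01, min_rel_intensity: float = 0.05) -> bool:
    """A spectrum matches when every product ion of the signature is found."""
    return all(has_fragment(peaks, mz, tol_mz, min_rel_intensity) for mz in signature)


# Hypothetical signature and spectra; the m/z values are illustrative only.
signature: List[float] = [144.08, 130.07]
spectra: Dict[str, List[Peak]] = {
    "scan_1": [(130.07, 40.0), (144.08, 100.0), (355.20, 15.0)],
    "scan_2": [(144.08, 2.0), (200.10, 100.0)],
}

hits = [name for name, peaks in spectra.items() if matches_signature(peaks, signature)]
```

Here only `scan_1` carries both ions above the intensity threshold. A real query language such as MassQL expresses the same kind of constraint declaratively and adds further conditions (precursor m/z, neutral losses, retention time) that this sketch leaves out.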

Citations: 0

SMILES all around: structure to SMILES conversion for transition metal complexes

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-28 DOI: 10.1186/s13321-025-01008-1

Maria H. Rasmussen, Magnus Strandgaard, Julius Seumer, Laura K. Hemmingsen, Angelo Frei, David Balcells, Jan H. Jensen

Abstract: We present a method for creating RDKit-parsable SMILES for transition metal complexes (TMCs) based on xyz-coordinates and overall charge of the complex. This can be viewed as an extension to the program xyz2mol that does the same for organic molecules. The only dependency is RDKit, which makes it widely applicable. One thing that has been lacking when it comes to generating SMILES from structure for TMCs is an existing SMILES dataset to compare with, so sanity-checking a method has required manual work. We therefore also generate SMILES in two other ways. In the first, ligand charges and TMC connectivity are based on natural bond orbital (NBO) analysis from density functional theory (DFT) calculations, utilizing recent work by Kneiding et al. (Digit Discov 2: 618–633, 2023). The second fixes SMILES available through the Cambridge Structural Database (CSD), making them parsable by RDKit. We compare these three different ways of obtaining SMILES for a subset of the CSD (tmQMg) and find >70% agreement for all three pairs. We utilize these SMILES to make simple molecular fingerprint (FP) and graph-based representations of the molecules to be used in the context of machine learning. Comparing with the graphs made by Kneiding et al., where nodes and edges are featurized with DFT properties, we find that depending on the target property (polarizability, HOMO-LUMO gap or dipole moment) the SMILES-based representations can perform equally well. This makes them very suitable as baseline models. Finally, we present a dataset of 227k RDKit-parsable SMILES for mononuclear TMCs in the CSD.

Scientific contribution: We present a method that can create RDKit-parsable SMILES strings of transition metal complexes (TMCs) from Cartesian coordinates and use it to create a dataset of 227k TMC SMILES strings. The RDKit-parsability allows us to perform machine learning studies of TMC properties using "standard" molecular representations such as fingerprints and 2D-graph convolution. We show that these relatively simple representations can perform quite well depending on the target property.

PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01008-1

Citations: 0

Visualising lead optimisation series using reduced graphs

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-24 DOI: 10.1186/s13321-025-01002-7

Jessica Stacey, Baptiste Canault, Stephen D. Pickett, Valerie J. Gillet

Abstract: The typical way in which lead optimisation (LO) series are represented in the medicinal chemistry literature is as Markush structures and associated R-group tables. The Markush structure shows a central core or molecular scaffold that is common to the series, with R groups that indicate the points of variability that have been explored in the series. The associated R-group table shows the substituent combinations that exist in individual molecules in the series together with properties of those compounds. This format provides an intuitive way of visualising any structure–activity relationship (SAR) that is present. Automated approaches that attempt to reproduce this well-understood format, such as the SAR map, are based on maximum common substructure approaches and do not take account of small changes that may be made to the core structure itself or of the situation where more than one core exists in the data. Here we describe an automated approach to represent LO series that is based on reduced graph descriptions of molecules. A publicly available LO dataset from a drug discovery programme at GSK is analysed to show how the method can group together compounds from the same series even when there are small substructural differences within the core of the series, while also being able to identify different related compound series. The resulting visualisation is useful in identifying areas where series are underexplored and for mapping design ideas onto the current dataset. The code to generate the visualisations is released into the public domain to promote further research in this area.

Scientific contribution: We describe a software tool for analysing lead optimisation series using reduced graph representations of molecules. The representation allows compounds that have similar but not identical chemical scaffolds to be grouped together and is, therefore, an advance on methods that are based on the more traditional Markush structure and SAR tables. The software is a useful addition to the med chem toolbox as it can provide a holistic view of lead optimisation data by representing what might otherwise be seen as separate series as a single series of compounds.

PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01002-7

Citations: 0

Molecular property prediction using pretrained-BERT and Bayesian active learning: a data-efficient approach to drug design

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-23 DOI: 10.1186/s13321-025-00986-6

Muhammad Arslan Masood, Samuel Kaski, Tianyu Cui

Abstract: In drug discovery, prioritizing compounds for experimental testing is a critical task that can be optimized through active learning by strategically selecting informative molecules. Active learning typically trains models on labeled examples alone, while unlabeled data is only used for acquisition. This fully supervised approach neglects valuable information present in unlabeled molecular data, impairing both predictive performance and the molecule selection process. We address this limitation by integrating a transformer-based BERT model, pretrained on 1.26 million compounds, into the active learning pipeline. This effectively disentangles representation learning and uncertainty estimation, leading to more reliable molecule selection. Experiments on Tox21 and ClinTox datasets demonstrate that our approach achieves equivalent toxic compound identification with 50% fewer iterations compared to conventional active learning. Analysis reveals that pretrained BERT representations generate a structured embedding space enabling reliable uncertainty estimation despite limited labeled data, confirmed through Expected Calibration Error measurements. This work establishes that combining pretrained molecular representations with active learning significantly improves both model performance and acquisition efficiency in drug discovery, providing a scalable framework for compound prioritization.

We demonstrate that high-quality molecular representations fundamentally determine active learning success in drug discovery, outweighing acquisition strategy selection. We provide a framework that integrates pretrained transformer models with Bayesian active learning to separate representation learning from uncertainty estimation, a critical distinction in low-data scenarios. This approach establishes a foundation for more efficient screening workflows across diverse pharmaceutical applications.

PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00986-6
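
The abstract cites Expected Calibration Error (ECE) as the check on uncertainty quality. As a rough illustration of what such a measurement involves, the sketch below computes a common binned variant for binary classification: predictions are grouped by predicted probability, and the gap between the mean predicted probability and the observed fraction of positives is averaged over bins, weighted by bin size. The abstract does not say which ECE variant or bin count the authors used, so the function name, the ten equal-width bins and the toy data are assumptions.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Binned ECE for binary predictions.

    probs  : predicted probability of the positive class, each in [0, 1]
    labels : true labels, 0 or 1
    n_bins : number of equal-width probability bins (an assumption here)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    n = len(probs)
    if n == 0:
        return 0.0

    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_positive = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_prob - frac_positive)
    return ece


# Toy data (illustrative only): slightly under-confident predictions.
toy_probs = [0.9, 0.9, 0.1, 0.1]
toy_labels = [1, 1, 0, 0]
toy_ece = expected_calibration_error(toy_probs, toy_labels)
```

A value near zero means predicted probabilities track observed frequencies, which matters in active learning because acquisition functions rely on those probabilities to rank unlabeled molecules.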

Citations: 0

High-throughput screening data generation, scoring and FAIRification: a case study on nanomaterials

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-23 DOI: 10.1186/s13321-025-01001-8

Gergana Tancheva, Vesa Hongisto, Konrad Patyra, Luchesar Iliev, Nikolay Kochev, Penny Nymark, Pekka Kohonen, Nina Jeliazkova, Roland Grafström

Abstract: In vitro-based high-throughput screening (HTS) technology is applicable to hazard-based ranking and grouping of diverse agents, including nanomaterials (NMs). We present a standardized HTS-derived human cell-based testing protocol which combines the analysis of five assays into a broad toxic mode-of-action-based hazard value, termed Tox5-score. The overall protocol includes automated data FAIRification, preprocessing and score calculation. A newly developed Python module, ToxFAIRy, can be used independently or within an Orange Data Mining workflow that has custom widgets for fine-tuning, included in the custom-developed Orange add-on Orange3-ToxFAIRy. The created data-handling workflow facilitates conversion of the FAIR HTS data into the NeXus format, which is capable of integrating all data and metadata into a single file and multidimensional matrix amenable to interactive visualizations and selection of data subsets. The resulting FAIR HTS data includes both raw and interpreted data (scores) in machine-readable formats distributable as a data archive, including to the eNanoMapper database and Nanosafety Data Interface. Overall, we present an HTS-driven FAIRified computational assessment tool for hazard analysis of multiple agents simultaneously, with broad potential applicability across diverse scientific communities.

Scientific contribution: Our study represents significant tool development for analyzing multiple materials hazards rapidly and simultaneously, aligning with regulatory recommendations and addressing industry needs. The innovative integration of in vitro-based toxicity scoring with automated data preprocessing within FAIRification workflows enhances the applicability of HTS-derived data in the materials development community. The protocols described increase the effectiveness of materials toxicity testing and mode-of-action research by offering an alternative to manual data processing, enriching HTS data with metadata, and refining testing methodologies (such as for bioactivity-based grouping), and overall they demonstrate the value of reusing existing data.

PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01001-8

Citations: 0

GESim: ultrafast graph-based molecular similarity calculation via von Neumann graph entropy

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-22 DOI: 10.1186/s13321-025-01003-6

Hiroaki Shiokawa, Shoichi Ishida, Kei Terayama

Citations: 0

Prediction of the water solubility by a graph convolutional-based neural network on a highly curated dataset

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-21 DOI: 10.1186/s13321-025-01000-9

Nadin Ulrich, Karsten Voigt, Anton Kudria, Alexander Böhme, Ralf-Uwe Ebert

Abstract: Water solubility is a relevant physico-chemical property in environmental chemistry, toxicology, and drug design. Although water solubility is, alongside the octanol–water partition coefficient, melting point, and boiling point, a property with a large amount of available experimental data, there are still more compounds in the chemical universe for which information on their water solubility is lacking. Thus, prediction tools with a broad application domain are needed to fill the corresponding data gaps. To this end, we developed a graph convolutional neural network model (GNN) to predict the water solubility in the form of log S_w based on a highly curated dataset of 9800 chemicals. We started our model development with a curation workflow of the AqSolDB data, ending with 7605 data points. We added 2195 chemicals with experimental data, which we found in the literature, to our dataset. In the final dataset, log S_w values range from −13.17 to 0.50. Higher values were excluded by a cut-off introduced to eliminate fully miscible chemicals. We developed a consensus GNN by a fivefold split of the corresponding training set (70% of the data) and validation set (20%) and used 10% as an independent test set for the evaluation of the performance of the different splits and the consensus model. By doing so, we achieved an r² of 0.901, a q² of 0.896, and an rmse of 0.657 on our independently selected test set, which is close to the experimental error of 0.5 to 0.6 log units. We further provide information on the application domain and compare our performance to other existing prediction tools.

Scientific contribution: Based on a highly curated dataset, we developed a neural network to predict the water solubility of chemicals for a broad application domain. We curated the data in a step-wise procedure, in which we identified various errors in the experimental data. Based on an independent test set, we compare our prediction results to those of the available prediction models.

PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01000-9
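
For readers less familiar with the reported metrics, the following Python sketch shows how an rmse and a coefficient of determination can be computed on a held-out test set of log S_w values. The abstract does not define exactly how r² and q² were calculated, so the use of the standard coefficient of determination here is an assumption, and the numbers are toy values rather than data from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy log S_w values (illustrative only, not taken from the paper).
observed = [-1.0, -2.0, -3.0]
predicted = [-1.5, -2.0, -2.5]

toy_rmse = rmse(observed, predicted)
toy_r2 = r_squared(observed, predicted)
```

Because rmse is expressed in the same log units as the target, it can be compared directly with the experimental error of 0.5 to 0.6 log units quoted in the abstract.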

Citations: 0

Learning motif features and topological structure of molecules for metabolic pathway prediction

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-21 DOI: 10.1186/s13321-025-00994-6

Jianguo Hu, Yiqing Zhang, Jinxin Xie, Zhen Yuan, Zhangxiang Yin, Shanshan Shi, Honglin Li, Shiliang Li

Abstract: Metabolites serve as crucial biomarkers for assessing disease progression and understanding underlying pathogenic mechanisms. However, when the metabolic pathway category of metabolites is unknown, researchers face challenges in conducting metabolomic analyses. Due to the complexity of wet laboratory experimentation for pathway identification, there is a growing demand for predictive methods. Various computational approaches, including machine learning and graph neural networks, have been proposed; however, interpretability remains a challenge. We have developed a neural network framework called MotifMol3D, which is designed for predicting molecular metabolic pathway categories. This framework introduces motif information to mine local features of small-sample molecules, combining it with a graph neural network and 3D information to complete the prediction task. Using a dataset of 5,698 molecules that participate in 11 metabolic pathway categories in the KEGG database, MotifMol3D outperformed state-of-the-art methods in precision, recall, and F1 score. In addition, an ablation study and motif analysis have demonstrated the effectiveness and usefulness of the model. Motif analysis, in particular, has shown that motif information can characterize the main features of specific pathway molecules to a certain extent and enhance the interpretability of the model. An external validation further corroborates this observation. MotifMol3D is an open-source tool that is available at https://github.com/Irena-Zhang/MotifMol3D.git.

Scientific contribution: MotifMol3D integrates motif information, graph neural networks, and 3D structural data to enhance feature extraction for small-sample molecules, improving the precision and interpretability of metabolic pathway predictions. The model outperforms state-of-the-art approaches in precision, recall, and F1 score. This work reveals how motif information characterizes pathway-specific molecules, offering novel insights into molecular properties within metabolic pathways.

PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00994-6

Citations: 0

Activity cliff-aware reinforcement learning for de novo drug design

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-21 DOI: 10.1186/s13321-025-01006-3

Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

Abstract: The integration of artificial intelligence (AI) in drug discovery offers promising opportunities to streamline and enhance the traditional drug development process. One core challenge in de novo molecular design is modeling complex structure-activity relationships (SAR), such as activity cliffs, where minor molecular changes yield significant shifts in biological activity. In response to the limitations of current models in capturing these critical discontinuities, we propose the Activity Cliff-Aware Reinforcement Learning (ACARL) framework. ACARL leverages a novel activity cliff index to identify and amplify activity cliff compounds, uniquely incorporating them into the reinforcement learning (RL) process through a tailored contrastive loss. This RL framework is designed to focus model optimization on high-impact regions within the SAR landscape, improving the generation of molecules with targeted properties. Experimental evaluations across multiple protein targets demonstrate ACARL's superior performance in generating high-affinity molecules compared to existing state-of-the-art algorithms. These findings indicate that ACARL effectively integrates SAR principles into the RL-based drug design pipeline, offering a robust approach for de novo molecular design.

Scientific contribution: Our work introduces a machine learning-based drug design framework that explicitly models activity cliffs, a first in AI-driven molecular design. ACARL's primary technical contributions include the formulation of an activity cliff index to detect these critical points, and a contrastive RL loss function that dynamically enhances the generation of activity cliff compounds, optimizing the model for high-impact SAR regions. This approach demonstrates the efficacy of combining domain knowledge with machine learning advances, significantly expanding the scope and reliability of AI in drug discovery.

PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01006-3
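
The abstract does not give the formula of ACARL's activity cliff index. To make the notion of an activity cliff concrete, the sketch below uses a different, previously published measure instead: the Structure-Activity Landscape Index (SALI), defined for a pair of compounds as the absolute activity difference divided by one minus their structural similarity. The set-based fingerprints, the activity values and the small `eps` guard against division by zero are all illustrative assumptions.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / union


def sali(activity_a: float, activity_b: float,
         fp_a: Set[int], fp_b: Set[int], eps: float = 1e-6) -> float:
    """SALI = |A_a - A_b| / (1 - similarity); eps avoids division by zero
    when the two fingerprints are identical."""
    similarity = tanimoto(fp_a, fp_b)
    return abs(activity_a - activity_b) / max(1.0 - similarity, eps)


# Hypothetical compounds: on-bit sets and activities (e.g. pKi) are made up.
fp_x = {1, 2, 3, 4}
fp_y = {1, 2, 3, 5}   # structurally close to x
fp_z = {7, 8, 9}      # structurally unrelated to x

cliff_score = sali(8.0, 5.0, fp_x, fp_y)    # similar pair, large activity gap
smooth_score = sali(8.0, 7.5, fp_x, fp_z)   # dissimilar pair, small activity gap
```

Pairs that are structurally similar yet differ strongly in activity receive high scores, which is the kind of discontinuity the abstract says ACARL is designed to detect and emphasize during training.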

Citations: 0

The pucke.rs toolkit to facilitate sampling the conformational space of biomolecular monomers

IF 7.1, Zone 2 (Chemistry)

Journal of Cheminformatics Pub Date : 2025-04-17 DOI: 10.1186/s13321-025-00977-7

Jérôme Rihon, Sten Reynders, Vitor Bernardes Pinheiro, Eveline Lescrinier

Citations: 0