Journal of Cheminformatics: Latest Articles

hERGAT: predicting hERG blockers using graph attention mechanism through atom- and molecule-level interaction analyses
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-28 DOI: 10.1186/s13321-025-00957-x
Dohyeon Lee, Sunyong Yoo
Abstract: The human ether-a-go-go-related gene (hERG) channel plays a critical role in the electrical activity of the heart, and its blockers can cause serious cardiotoxic effects. Thus, screening for hERG channel blockers is a crucial step in the drug development process. Many in silico models have been developed to predict hERG blockers, which can efficiently save time and resources. However, previous methods have found it hard to achieve high performance and to interpret the predictive results. To overcome these challenges, we have proposed hERGAT, a graph neural network model with an attention mechanism, to consider compound interactions on atomic and molecular levels. In the atom-level interaction analysis, we applied a graph attention mechanism (GAT) that integrates information from neighboring nodes and their extended connections. hERGAT employs a gated recurrent unit (GRU) with the GAT to learn information between more distant atoms. To confirm this, we performed clustering analysis and visualized a correlation heatmap, verifying that the interactions between distant atoms were considered during the training process. In the molecule-level interaction analysis, the attention mechanism enables the target node to focus on the most relevant information, highlighting the molecular substructures that play crucial roles in predicting hERG blockers. Through a literature review, we confirmed that highlighted substructures have a significant role in determining the chemical and biological characteristics related to hERG activity. Furthermore, we integrated physicochemical properties into our hERGAT model to improve the performance. Our model achieved an area under the receiver operating characteristic curve of 0.907 and an area under the precision-recall curve of 0.904, demonstrating its effectiveness in modeling hERG activity and offering a reliable framework for optimizing drug safety in early development stages.
Scientific contribution: hERGAT is a deep learning model for predicting hERG blockers by combining GAT and GRU, enabling it to capture complex interactions at atomic and molecular levels. We improve the model's interpretability by analyzing the highlighted molecular substructures, providing valuable insights into their roles in determining hERG activity. The model achieves high predictive performance, confirming its potential as a preliminary tool for early cardiotoxicity assessment and enhancing the reliability of the results.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00957-x
Citations: 0
The algebraic extended atom-type graph-based model for precise ligand–receptor binding affinity prediction
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-22 DOI: 10.1186/s13321-025-00955-z
Farjana Tasnim Mukta, Md Masud Rana, Avery Meyer, Sally Ellingson, Duc D. Nguyen
Abstract: Accurate prediction of ligand-receptor binding affinity is crucial in structure-based drug design, significantly impacting the development of effective drugs. Recent advances in machine learning (ML)–based scoring functions have improved these predictions, yet challenges remain in modeling complex molecular interactions. This study introduces the AGL-EAT-Score, a scoring function that integrates extended atom-type multiscale weighted colored subgraphs with algebraic graph theory. This approach leverages the eigenvalues and eigenvectors of graph Laplacian and adjacency matrices to capture high-level details of specific atom pairwise interactions. Evaluated against benchmark datasets such as CASF-2016, CASF-2013, and the Cathepsin S dataset, the AGL-EAT-Score demonstrates notable accuracy, outperforming existing traditional and ML-based methods. The model’s strength lies in its comprehensive similarity analysis, examining protein sequence, ligand structure, and binding site similarities, thus ensuring minimal bias and over-representation in the training sets. The use of extended atom types in graph coloring enhances the model’s capability to capture the intricacies of protein-ligand interactions. The AGL-EAT-Score marks a significant advancement in drug design, offering a tool that could potentially refine and accelerate the drug discovery process.
Scientific contribution: The AGL-EAT-Score presents an algebraic graph-based framework that predicts ligand-receptor binding affinity by constructing multiscale weighted colored subgraphs from the 3D structure of protein-ligand complexes. It improves prediction accuracy by modeling interactions between extended atom types, addressing challenges like dataset bias and over-representation. Benchmark evaluations demonstrate that AGL-EAT-Score outperforms existing methods, offering a robust and systematic tool for structure-based drug design.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00955-z
Citations: 0
StreamChol: a web-based application for predicting cholestasis
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-21 DOI: 10.1186/s13321-024-00943-9
Pablo Rodríguez-Belenguer, Emilio Soria-Olivas, Manuel Pastor
Abstract: This article introduces StreamChol, software for developing and applying mechanistic models to predict cholestasis. StreamChol is a Streamlit application, usable as a desktop application or as web-accessible software when installed on a server using a Docker container. StreamChol allows a seamless integration of pharmacokinetic analyses with machine learning models. This integration not only enables cholestasis prediction but also opens avenues for predicting other toxicological endpoints requiring similar integrations. StreamChol's Docker containerization also streamlines deployment across diverse environments, addressing potential compatibility issues. StreamChol is distributed as open source under GNU GPL v3, reflecting our commitment to open science. Through StreamChol, researchers are offered a potent tool for predictive modelling in toxicology, harnessing its strengths within an intuitive and user-friendly interface, without the need for any programming knowledge.
Scientific contribution: This work offers a user-friendly web-based tool for cholestasis prediction and a complete workflow for creating web platforms that require the combination of two programming languages, R and Python.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00943-9
Citations: 0
Matched pairs demonstrate robustness against inter-assay variability
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-20 DOI: 10.1186/s13321-025-00956-y
Jochem Nelen, Horacio Pérez-Sánchez, Hans De Winter, Dries Van Rompaey
Abstract: Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences between matched compound pairs across assays and assessing the impact of assay metadata curation on error reduction. We find that potency differences between matched pairs exhibit less variability than individual compound measurements, suggesting systematic assay differences may partially cancel out in paired data. Metadata curation further improves inter-assay agreement, albeit at the cost of dataset size. For minimally curated compound pairs, agreement within 0.3 pChEMBL units was found to be 44–46% for Ki and IC50 values respectively, which improved to 66–79% after curation. Similarly, the percentage of pairs with differences exceeding 1 pChEMBL unit dropped from 12–15% to 6–8% with extensive curation. These results establish a benchmark for expected noise in matched molecular pair data from the ChEMBL database, offering practical metrics for data quality assessment.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00956-y
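The agreement metric described in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function names and the pChEMBL values are hypothetical, and for simplicity every two compounds measured in both assays are treated as a pair, which is looser than the matched molecular pairs analyzed in the study. The example only shows why a constant offset between assays cancels in pairwise differences.

```python
from itertools import combinations


def individual_agreement(assay_1, assay_2, tolerance=0.3):
    """Fraction of shared compounds whose two measurements agree within `tolerance`."""
    shared = assay_1.keys() & assay_2.keys()
    hits = sum(abs(assay_1[c] - assay_2[c]) <= tolerance for c in shared)
    return hits / len(shared)


def pair_deltas(assay):
    """Potency difference for every compound pair measured in one assay."""
    return {(a, b): assay[a] - assay[b] for a, b in combinations(sorted(assay), 2)}


def pair_agreement(assay_1, assay_2, tolerance=0.3):
    """Fraction of shared pairs whose potency difference agrees within `tolerance`."""
    d1, d2 = pair_deltas(assay_1), pair_deltas(assay_2)
    shared = d1.keys() & d2.keys()
    hits = sum(abs(d1[p] - d2[p]) <= tolerance for p in shared)
    return hits / len(shared)


# Hypothetical pChEMBL values: assay_2 carries a systematic +0.5 offset,
# and compound "c" deviates by a further 0.5 units.
assay_1 = {"a": 6.0, "b": 7.0, "c": 8.0}
assay_2 = {"a": 6.5, "b": 7.5, "c": 9.0}

single = individual_agreement(assay_1, assay_2)  # no compound agrees within 0.3
paired = pair_agreement(assay_1, assay_2)        # the (a, b) pair agrees exactly
print(f"individual: {single:.2f}, paired: {paired:.2f}")
```

In this toy data the offset makes every individual measurement disagree, while the pair (a, b) agrees because the offset cancels in the difference.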
Citations: 0
Chemical space as a unifying theme for chemistry
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-16 DOI: 10.1186/s13321-025-00954-0
Jean-Louis Reymond
Abstract: Chemistry has diversified from a basic understanding of the elements to studying millions of highly diverse molecules and materials, which together are conceptualized as the chemical space. A map of this chemical space where distances represent similarities between compounds can represent the mutual relationships between different subfields of chemistry and help the discipline to be viewed and understood globally.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00954-0
Citations: 0
One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-16 DOI: 10.1186/s13321-025-00948-y
James Wellnitz, Sankalp Jain, Joshua E. Hochuli, Travis Maxfield, Eugene N. Muratov, Alexander Tropsha, Alexey V. Zakharov
Abstract: Traditional best practices for quantitative structure–activity relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key desired objective of model development. This study explores the value of the conventional norms in the context of using QSAR models for virtual screening of modern large and ultra-large chemical libraries. For this increasingly common task, we now recommend the use of models with the highest positive predictive value (PPV) built on imbalanced training sets as preferred virtual screening tools. This recommendation stems from practical considerations of how the results of virtual screening are used in experimental laboratories, where only a small fraction of virtually screened molecules can be tested using standard well plates. As a proof of concept, we have developed QSAR models for five expansive datasets with different ratios of active and inactive molecules and compared model performance in virtual screening using BA, PPV, and other metrics. We show that training on imbalanced datasets achieves a hit rate at least 30% higher than using balanced datasets, and that the PPV metric captured this difference in performance with no parameter tuning. Importantly, hit rates were estimated for top-scoring compounds organized in batches of the size of plates (for instance, 128 molecules) used in experimental high-throughput screening. Based on the results of our studies, we posit that QSAR models trained on imbalanced datasets with the highest PPV should be relied upon to identify and test hit compounds in early drug discovery studies.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00948-y
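The two metrics contrasted in this abstract can be computed side by side with a short sketch. The labels, scores, threshold, and function names below are illustrative assumptions, not data or code from the study; a batch of 4 stands in for the 128-molecule plate mentioned above. Balanced accuracy is computed from thresholded predictions over the whole set, whereas PPV is computed only over the top-scoring batch that would actually be tested.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def ppv_at_top_n(y_true, scores, n):
    """Fraction of true actives among the n highest-scoring compounds."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n]
    return sum(label for _, label in top) / len(top)


# Hypothetical screening results for ten compounds (3 actives, 7 inactives).
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]

# Assumed decision threshold of 0.5 for the whole-set metric.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

ba = balanced_accuracy(y_true, y_pred)
ppv = ppv_at_top_n(y_true, scores, n=4)  # n=128 would mirror one plate
print(f"BA = {ba:.3f}, PPV@4 = {ppv:.3f}")
```

The two numbers answer different questions: BA summarizes classification over all compounds, while PPV at the top of the ranking estimates the hit rate a laboratory would observe on a single plate.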
Citations: 0
Context-dependent similarity analysis of analogue series for structure–activity relationship transfer based on a concept from natural language processing
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-15 DOI: 10.1186/s13321-025-00951-3
Atsushi Yoshimori, Jürgen Bajorath
Abstract: Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure–activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-dependent similarity assessment based on vector embeddings adapted from natural language processing. The methodology comprehensively accounts for substituent similarity, identifies non-classical bioisosteres, captures substituent-property relationships, and generates accurate AS alignments. Context-dependent similarity assessment is conceptually novel in computational medicinal chemistry and should also be of interest for other applications.
Scientific contribution: A method is reported to systematically search for and align analogue series with SAR transfer potential. Central to the approach is the assessment of context-dependent similarity for substituents, a new concept in cheminformatics, which is based upon vector embeddings and word pair relationships adapted from natural language processing.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00951-3
Citations: 0
Fragmenstein: predicting protein–ligand structures of compounds derived from known crystallographic fragment hits using a strict conserved-binding–based methodology
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-13 DOI: 10.1186/s13321-025-00946-0
Matteo P. Ferla, Rubén Sánchez-García, Rachael E. Skyner, Stefan Gahbauer, Jenny C. Taylor, Frank von Delft, Brian D. Marsden, Charlotte M. Deane
Abstract: Current strategies centred on either merging or linking initial hits from fragment-based drug design (FBDD) crystallographic screens generally do not fully leverage 3D structural information. We show that an algorithmic approach (Fragmenstein) that ‘stitches’ the ligand atoms from this structural information together can provide more accurate and reliable predictions for protein–ligand complex conformation than general methods such as pharmacophore-constrained docking. This approach works under the assumption of conserved binding: when a larger molecule is designed containing the initial fragment hit, the common substructure between the two will adopt the same binding mode. Fragmenstein either takes the atomic coordinates of ligands from an experimental fragment screen and combines the atoms together to produce a novel merged virtual compound, or uses them to predict the bound complex for a provided molecule. The molecule is then energy minimised under strong constraints to obtain a structurally plausible conformer. The code is available at https://github.com/oxpig/Fragmenstein.
Scientific contribution: This work shows the importance of using the coordinates of known binders when predicting the conformation of derivative molecules through a retrospective analysis of the COVID Moonshot data. This method has had a prior real-world application in hit-to-lead screening, yielding a sub-micromolar merger from parent hits in a single round. It is therefore likely to further benefit future drug design campaigns and be integrated in future pipelines.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00946-0
Citations: 0
ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-10 DOI: 10.1186/s13321-025-00947-z
Dong Wang, Jieyu Jin, Guqin Shi, Jingxiao Bao, Zheng Wang, Shimeng Li, Peichen Pan, Dan Li, Yu Kang, Tingjun Hou
Abstract: The Caco-2 cell model has been widely used to assess the intestinal permeability of drug candidates in vitro, owing to its morphological and functional similarity to human enterocytes. While the Caco-2 cell assay is considered safe and cost-effective, it is also time-consuming. Therefore, computational models that achieve high accuracy in predicting Caco-2 permeability are crucial for enhancing the efficiency of oral drug development. In this study, we conducted an in-depth analysis of the characteristics of an augmented Caco-2 permeability dataset, and evaluated a diverse range of machine learning algorithms in combination with different molecular representations. The results indicated that XGBoost generally provided better predictions than comparable models for the test sets. In addition, we investigated the transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets. Our findings, based on the Shanghai Qilu’s in-house dataset, showed that the boosting models retained a degree of predictive efficacy when applied to industry data. Furthermore, a Y-randomization test and applicability domain analysis were employed to assess the robustness and generalizability of these models. Matched Molecular Pair Analysis (MMPA) was utilized to extract chemical transformation rules. We believe that the model developed in this study could represent a reliable tool for assessing Caco-2 permeability during early-stage drug discovery and the chemical transformation rules derived here could provide insights for optimizing Caco-2 permeability.
Scientific contribution: A comprehensive validation of various machine learning algorithms combined with diverse molecular representations on a large dataset for predicting Caco-2 permeability was reported. The transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets was also investigated. Matched molecular pair analysis was carried out to provide reasonable suggestions for researchers to improve the Caco-2 permeability of compounds.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00947-z
Citations: 0
CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2025-01-07 DOI: 10.1186/s13321-024-00944-8
Zishuo Zeng, Jin Guo, Jiao Jin, Xiaozhou Luo
Abstract: Predicting EC numbers for chemical reactions enables efficient enzymatic annotations for computer-aided synthesis planning. However, conventional machine learning approaches encounter challenges due to data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction’s EC), a novel framework leveraging contrastive learning, pre-trained language model-based reaction embeddings, and data augmentation to address these limitations. CLAIRE achieved notable performance improvements, demonstrating weighted average F1 scores of 0.861 and 0.911 on the testing set (n = 18,816) and an independent dataset (n = 1040) derived from yeast’s metabolic model, respectively. Remarkably, CLAIRE outperformed the state-of-the-art model by 3.65-fold and 1.18-fold, respectively. Its high accuracy positions CLAIRE as a promising tool for retrosynthesis planning, drug fate prediction, and synthetic biology applications. CLAIRE is freely available on GitHub (https://github.com/zishuozeng/CLAIRE).
Scientific contribution: This work employed contrastive learning for predicting the EC numbers of enzymatic reactions, overcoming the challenges of data scarcity and imbalance. The new model achieves state-of-the-art performance and may facilitate computer-aided synthesis planning.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00944-8
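The weighted average F1 score reported in this abstract can be sketched as follows. This assumes the common support-weighted definition (each class's F1 weighted by its number of true instances); the abstract does not state which weighting was used. The function name and the tiny label set are hypothetical, with the EC numbers serving only as example class labels.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


# Hypothetical predictions for four reactions over two EC classes.
y_true = ["1.1.1.1", "1.1.1.1", "1.1.1.1", "2.7.1.1"]
y_pred = ["1.1.1.1", "1.1.1.1", "2.7.1.1", "2.7.1.1"]

score = weighted_f1(y_true, y_pred)
print(f"weighted F1 = {score:.3f}")
```

Weighting by support keeps frequent classes from being swamped by rare ones, but it also means that poor performance on rare EC classes moves the score very little, which matters when class imbalance is the stated difficulty.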
Citations: 0