Journal of Cheminformatics: Latest Articles

Hilbert-curve assisted structure embedding method
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-29 DOI: 10.1186/s13321-024-00850-z
Gergely Zahoránszky-Kőhalmi, Kanny K. Wan, Alexander G. Godfrey
Abstract
Motivation: Chemical space embedding methods are widely used in research for dimensionality reduction, clustering and effective visualization. The maps generated by the embedding process can give medicinal chemists valuable insight into the relationships between the structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret, and the "landscape" on the map is prone to "rearrangement" when different sets of compounds are embedded.
Results: In this study we present the Hilbert-Curve Assisted Space Embedding (HCASE) method, which was designed to create maps by organizing structures according to a logic familiar to medicinal chemists. First, a chemical space is created with the help of a set of "reference scaffolds". These scaffolds are sorted according to the medicinal-chemistry-inspired Scaffold-Key algorithm found in prior art. Next, the ordered scaffolds are mapped to a line, which is folded into a higher-dimensional (here: 2D) space. The intricately folded line is referred to as a pseudo-Hilbert-Curve. A compound is embedded by locating its most similar reference scaffold in the pseudo-Hilbert-Curve and assuming the respective position. Through a series of experiments, we demonstrate the properties of the maps generated by the HCASE method. Subjects of embeddings were compounds of the DrugBank and CANVASS libraries, and the chemical spaces were defined by scaffolds extracted from the ChEMBL database.
Scientific contribution: The novelty of the HCASE method lies in generating robust and intuitive chemical space embeddings that reflect a medicinal chemist's reasoning, and in the precedent-setting use of a space-filling (Hilbert) curve in the process.
Availability: https://github.com/ncats/hcase
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00850-z
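The folding step described in the abstract above can be illustrated with a short Python sketch. It is not taken from the HCASE repository: it uses the textbook index-to-coordinate conversion for a Hilbert curve, and the names `hilbert_d2xy` and `embed_compound`, the `similarity` callback, and the rule that the scaffold at rank d sits in curve cell d are all assumptions made for illustration. The scaffold list is assumed to be already sorted (for example by a Scaffold-Key-like ordering).

```python
from typing import Callable, Sequence, Tuple


def hilbert_d2xy(order: int, d: int) -> Tuple[int, int]:
    """Convert position d along a Hilbert curve of the given order to (x, y).

    The curve covers a 2**order by 2**order grid, so d ranges over 0 .. 4**order - 1.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate (and, if needed, flip) the quadrant so sub-curves connect.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def embed_compound(
    compound: str,
    ordered_scaffolds: Sequence[str],
    similarity: Callable[[str, str], float],
    order: int,
) -> Tuple[int, int]:
    """Place a compound at the curve cell of its most similar reference scaffold.

    Assumption: the scaffold with rank d in the sorted list occupies cell d.
    """
    if len(ordered_scaffolds) > 4 ** order:
        raise ValueError("curve order too small for the scaffold list")
    rank = max(
        range(len(ordered_scaffolds)),
        key=lambda i: similarity(compound, ordered_scaffolds[i]),
    )
    return hilbert_d2xy(order, rank)
```

Because neighbouring positions along the curve are always neighbouring grid cells, scaffolds that are close in the sorted order stay close on the 2D map, which is the property the abstract relies on.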
Citations: 0
Reproducible MS/MS library cleaning pipeline in matchms
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-29 DOI: 10.1186/s13321-024-00878-1
Niek F. de Jonge, Helge Hecht, Michael Strobel, Mingxun Wang, Justin J. J. van der Hooft, Florian Huber
Abstract
Mass spectral libraries have proven to be essential for mass spectrum annotation, both for library matching and for training new machine learning algorithms. A key requirement for training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting variability in data quality makes training machine learning models challenging. Here we present a pipeline for cleaning tandem mass spectrometry library data. The pipeline is designed with ease of use, flexibility, and reproducibility as leading principles.
Scientific contribution: This pipeline will result in cleaner public mass spectral libraries that will improve library searching and the quality of machine-learning training datasets in mass spectrometry. It builds on previous work by adding new functionality for curating and correcting annotated libraries through validation of structure annotations. Due to the high quality of our software, the reproducibility, and improved logging, we think our new pipeline has the potential to become the standard in the field for cleaning tandem mass spectrometry libraries.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00878-1
Citations: 0
A computational workflow for analysis of missense mutations in precision oncology
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-29 DOI: 10.1186/s13321-024-00876-3
Rayyan Tariq Khan, Petra Pokorna, Jan Stourac, Simeon Borko, Ihor Arefiev, Joan Planas-Iglesias, Adam Dobias, Gaspar Pinto, Veronika Szotkowska, Jaroslav Sterba, Ondrej Slaby, Jiri Damborsky, Stanislav Mazurenko, David Bednar
Abstract
Every year, more than 19 million cancer cases are diagnosed, and this number continues to increase annually. Since standard treatment options have varying success rates for different types of cancer, understanding the biology of an individual's tumour becomes crucial, especially for cases that are difficult to treat. Personalised high-throughput profiling, using next-generation sequencing, allows for a comprehensive examination of biopsy specimens. Furthermore, the widespread use of this technology has generated a wealth of information on cancer-specific gene alterations. However, there exists a significant gap between identified alterations and their proven impact on protein function. Here, we present a bioinformatics pipeline that enables fast analysis of a missense mutation's effect on stability and function in known oncogenic proteins. This pipeline is coupled with a predictor that summarises the outputs of different tools used throughout the pipeline, providing a single probability score, achieving a balanced accuracy above 86%. The pipeline incorporates a virtual screening method to suggest potential FDA/EMA-approved drugs to be considered for treatment. We showcase three case studies to demonstrate the timely utility of this pipeline. To facilitate access and analysis of cancer-related mutations, we have packaged the pipeline as a web server, which is freely available at https://loschmidt.chemi.muni.cz/predictonco/.
Scientific contribution: This work presents a novel bioinformatics pipeline that integrates multiple computational tools to predict the effects of missense mutations on proteins of oncological interest. The pipeline uniquely combines fast protein modelling, stability prediction, and evolutionary analysis with virtual drug screening, while offering actionable insights for precision oncology. This comprehensive approach surpasses existing tools by automating the interpretation of mutations and suggesting potential treatments, thereby striving to bridge the gap between sequencing data and clinical application.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00876-3
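The abstract above reports the predictor's quality as balanced accuracy, which is the mean of sensitivity and specificity and is therefore less flattering than plain accuracy when one class dominates. The following sketch shows the standard definition for binary labels; the function name and the toy labels are illustrative and are not taken from the paper or its web server.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on label 1) and specificity (recall on label 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Toy example: 3 of 4 positives and 1 of 2 negatives are predicted correctly.
score = balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

In the toy example, sensitivity is 0.75 and specificity is 0.5, so the balanced accuracy is 0.625 even though 4 of the 6 predictions are correct.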
Citations: 0
CACTI: an in silico chemical analysis tool through the integration of chemogenomic data and clustering analysis
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-24 DOI: 10.1186/s13321-024-00885-2
Karla P. Godinez-Macias, Elizabeth A. Winzeler
Abstract
It is well accepted that knowledge of a small molecule's target can accelerate optimization. Although chemogenomic databases are helpful resources for predicting or finding compound interaction partners, they tend to be limited and poorly annotated. Furthermore, unlike genes, compound identifiers are often not standardized, and many synonyms may exist, especially in the biological literature, making batch analysis of compounds difficult. Here, we constructed an open-source annotation and target hypothesis prediction tool that explores some of the largest chemical and biological databases, mining these for common names, synonyms, and structurally similar molecules. We used this Chemical Analysis and Clustering for Target Identification (CACTI) tool to analyze the Pathogen Box collection, an open-source set of 400 drug-like compounds active against a variety of microbial pathogens. Our analysis resulted in 4,315 new synonyms, 35,963 pieces of new information and target prediction hints for 58 members.
Scientific contributions: With this tool, a comprehensive report with known evidence, close analogs and drug-target predictions can be obtained for large-scale chemical libraries, which will facilitate their evaluation and future target validation and optimization efforts.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00885-2
Citations: 0
Enhancing molecular property prediction with auxiliary learning and task-specific adaptation
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-24 DOI: 10.1186/s13321-024-00880-7
Vishal Dey, Xia Ning
Abstract
Pretrained Graph Neural Networks have been widely adopted for various molecular property prediction tasks. Despite their ability to encode structural and relational features of molecules, traditional fine-tuning of such pretrained GNNs on the target task can lead to poor generalization. To address this, we explore the adaptation of pretrained GNNs to the target task by jointly training them with multiple auxiliary tasks. This could enable the GNNs to learn both general and task-specific features, which may benefit the target task. However, a major challenge is to determine the relatedness of auxiliary tasks with the target task. To address this, we investigate multiple strategies to measure the relevance of auxiliary tasks and integrate such tasks by adaptively combining task gradients or by learning task weights via bi-level optimization. Additionally, we propose a novel gradient surgery-based approach, Rotation of Conflicting Gradients (RCGrad), that learns to align conflicting auxiliary task gradients through rotation. Our experiments with state-of-the-art pretrained GNNs demonstrate the efficacy of our proposed methods, with improvements of up to 7.7% over fine-tuning. This suggests that incorporating auxiliary tasks along with target task fine-tuning can be an effective way to improve the generalizability of pretrained GNNs for molecular property prediction.
Scientific contribution: We introduce a novel framework for adapting pretrained GNNs to molecular tasks using auxiliary learning to address the critical issue of negative transfer. Leveraging novel gradient surgery techniques such as RCGrad, the proposed adaptation framework represents a significant departure from the dominant pretraining fine-tuning approach for molecular GNNs. Our contributions are significant for drug discovery research, especially for tasks with limited data, filling a notable gap in the efficient adaptation of pretrained models for molecular GNNs.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00880-7
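To make the idea of gradient surgery in the abstract above concrete, here is a minimal Python sketch of a simpler, well-known variant: projecting a conflicting auxiliary gradient onto the plane orthogonal to the target gradient (the PCGrad-style projection). This is explicitly not the paper's RCGrad, which learns a rotation; the function names and the plain-list gradient representation are assumptions chosen to keep the example dependency-free.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_if_conflicting(aux: Sequence[float], target: Sequence[float]) -> List[float]:
    """Remove the component of `aux` that points against `target`.

    Two gradients conflict when their inner product is negative. In that case
    the auxiliary gradient is projected onto the plane orthogonal to the target
    gradient; otherwise it is returned unchanged.
    """
    d = dot(aux, target)
    if d >= 0:
        return list(aux)
    scale = d / dot(target, target)
    return [a - scale * t for a, t in zip(aux, target)]


def combined_update(target: Sequence[float], aux_grads: Sequence[Sequence[float]]) -> List[float]:
    """Sum the target gradient with all (possibly projected) auxiliary gradients."""
    total = list(target)
    for g in aux_grads:
        projected = project_if_conflicting(g, target)
        total = [x + y for x, y in zip(total, projected)]
    return total
```

After projection, an auxiliary gradient can no longer reduce progress along the target direction, which is one simple way to limit the negative transfer the abstract describes.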
Citations: 0
Estimating the synthetic accessibility of molecules with building block and reaction-aware SAScore
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-23 DOI: 10.1186/s13321-024-00879-0
Shuan Chen, Yousung Jung
Abstract
Synthetic accessibility prediction estimates how easily a given molecule can be synthesized in the laboratory, and it plays a crucial role in computer-aided molecular design. Although synthesis planning programs can determine synthesis routes, their slow processing times make them impractical for large-scale molecule screening. On the other hand, existing rapid synthetic accessibility estimation methods offer speed but typically lack integration with actual synthesis routes and building block information. In this work, we introduce BR-SAScore, an enhanced version of SAScore that integrates the available building block information (B) and reaction knowledge (R) from synthesis planning programs into the scoring process. In particular, we differentiate fragments inherent in building blocks and fragments to be derived from synthesis (reactions) when scoring synthetic accessibility. Compared to existing methods, our experimental findings demonstrate that BR-SAScore offers more accurate and precise identification of a molecule's synthetic accessibility by the synthesis planning program with a fast calculation time. Moreover, we illustrate how BR-SAScore provides chemically interpretable results, aligning with the capability of the synthesis planning program embedded with the same reaction knowledge and available building blocks.
Scientific contribution: We introduce BR-SAScore, an extension of SAScore, to estimate the synthetic accessibility of molecules by leveraging known building-block and reactivity information. In our experiments, BR-SAScore shows superior performance in predicting molecular synthetic accessibility compared to previous methods, including SAScore and deep-learning models, while requiring significantly less computation time. In addition, we show that BR-SAScore is able to precisely identify the chemical fragment contributing to the synthetic infeasibility, holding great potential for future molecule synthesizability optimization.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11267797/pdf/
Citations: 0
piscesCSM: prediction of anticancer synergistic drug combinations
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-19 DOI: 10.1186/s13321-024-00859-4
Raghad AlJarf, Carlos H. M. Rodrigues, Yoochan Myung, Douglas E. V. Pires, David B. Ascher
Abstract
While drug combination therapies are of great importance, particularly in cancer treatment, identifying novel synergistic drug combinations has been a challenging venture. Computational methods have emerged in this context as a promising tool for prioritizing drug combinations for further evaluation, though they have presented limited performance, utility, and interpretability. Here, we propose a novel predictive tool, piscesCSM, that leverages graph-based representations to model small molecule chemical structures to accurately predict drug combinations with favourable anticancer synergistic effects against one or multiple cancer cell lines. Leveraging these insights, we developed a general supervised machine learning model to guide the prediction of anticancer synergistic drug combinations in over 30 cell lines. It achieved an area under the receiver operating characteristic curve (AUROC) of up to 0.89 on independent non-redundant blind tests, outperforming state-of-the-art approaches on both large-scale oncology screening data and an independent test set generated by AstraZeneca (with more than a 16% improvement in predictive accuracy). Moreover, by exploring the interpretability of our approach, we found that simple physicochemical properties and graph-based signatures are predictive of chemotherapy synergism. To provide a simple and integrated platform to rapidly screen potential candidate pairs with favourable synergistic anticancer effects, we made piscesCSM freely available online at https://biosig.lab.uq.edu.au/piscescsm/ as a web server and API. We believe that our predictive tool will provide a valuable resource for optimizing and augmenting combinatorial screening libraries to identify effective and safe synergistic anticancer drug combinations.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00859-4
Citations: 0
Reaction rebalancing: a novel approach to curating reaction databases
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-19 DOI: 10.1186/s13321-024-00875-4
Tieu-Long Phan, Klaus Weinbauer, Thomas Gärtner, Daniel Merkle, Jakob L. Andersen, Rolf Fagerberg, Peter F. Stadler
Abstract
Purpose: Reaction databases are a key resource for a wide variety of applications in computational chemistry and biochemistry, including Computer-aided Synthesis Planning (CASP) and the large-scale analysis of metabolic networks. The full potential of these resources can only be realized if datasets are accurate and complete. Missing co-reactants and co-products, i.e., unbalanced reactions, however, are the rule rather than the exception. The curation and correction of such incomplete entries is thus an urgent need.
Methods: The SynRBL framework addresses this issue with a dual strategy: a rule-based method for non-carbon compounds, using atomic symbols and counts for prediction, alongside a Maximum Common Subgraph (MCS)-based technique for carbon compounds, aimed at aligning reactants and products to infer missing entities.
Results: The rule-based method exceeded 99% accuracy, while MCS-based accuracy varied from 81.19 to 99.33%, depending on reaction properties. Furthermore, an applicability domain and a machine learning scoring function were devised to quantify prediction confidence. The overall efficacy of this framework was delineated through its success rate and accuracy metrics, which spanned from 89.83 to 99.75% and 90.85 to 99.05%, respectively.
Conclusion: The SynRBL framework offers a novel solution for recalibrating chemical reactions, significantly enhancing reaction completeness. With rigorous validation, it achieved groundbreaking accuracy in reaction rebalancing. This sets the stage for future improvement, in particular of atom-atom mapping techniques as well as of downstream tasks such as automated synthesis planning.
Scientific contribution: SynRBL features a novel computational approach to correcting unbalanced entries in chemical reaction databases. By combining heuristic rules for inferring non-carbon compounds and common subgraph searches to address carbon imbalance, SynRBL successfully addresses most instances of this problem, which affects the majority of data in most large-scale resources. Compared to alternative solutions, SynRBL achieves a dramatic increase in both success rate and accuracy, and provides the first freely available open-source solution for this problem.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00875-4
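The rule-based half of the strategy described above starts from atomic symbols and counts. The sketch below shows only that first bookkeeping step in Python: counting atoms on each side of a reaction and reporting what is missing. It is an illustration, not SynRBL's code; it works on flat molecular formulas without parentheses or charges, and the helper names and the tiny `KNOWN_SMALL_MOLECULES` lookup are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H6O'."""
    counts: Counter = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def imbalance(
    reactants: Iterable[str], products: Iterable[str]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (atoms missing from the product side, atoms missing from the reactant side)."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    # Counter subtraction keeps only positive counts, which is what we want here.
    return dict(left - right), dict(right - left)


# Hypothetical rule table: map an atom surplus to a small non-carbon molecule.
KNOWN_SMALL_MOLECULES = {
    frozenset({("H", 2), ("O", 1)}): "H2O",
    frozenset({("H", 1), ("Cl", 1)}): "HCl",
}


def suggest_missing(surplus: Dict[str, int]) -> Optional[str]:
    """Look up a surplus of atoms in the rule table; None if no rule matches."""
    return KNOWN_SMALL_MOLECULES.get(frozenset(surplus.items()))


# Esterification written without its water by-product:
# acetic acid + ethanol -> ethyl acetate
missing_products, missing_reactants = imbalance(["C2H4O2", "C2H6O"], ["C4H8O2"])
```

For the esterification example, the reactant side carries two hydrogens and one oxygen more than the product side, and the toy rule table resolves that surplus to water.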
Citations: 0
Ualign: pushing the limit of template-free retrosynthesis prediction with unsupervised SMILES alignment
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-15 DOI: 10.1186/s13321-024-00877-2
Kaipeng Zeng, Bo Yang, Xin Zhao, Yu Zhang, Fan Nie, Xiaokang Yang, Yaohui Jin, Yanyan Xu
Abstract
Motivation: Retrosynthesis planning poses a formidable challenge in the organic chemical industry, particularly in pharmaceuticals. Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science. Various deep learning-based methods have been proposed for this task, incorporating diverse levels of additional chemical knowledge dependency.
Results: This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction. By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules. Based on the fact that the majority of molecule structures remain unchanged during a chemical reaction, we propose a simple yet effective SMILES alignment technique to facilitate the reuse of unchanged structures for reactant generation. Extensive experiments show that our method substantially outperforms state-of-the-art template-free and semi-template-based approaches. Importantly, our template-free method achieves effectiveness comparable to, or even surpassing, that of established powerful template-based methods.
Scientific contribution: We present a novel graph-to-sequence template-free retrosynthesis prediction pipeline that overcomes the limitations of Transformer-based methods in molecular representation learning and insufficient utilization of chemical information. We propose an unsupervised learning mechanism for establishing product-atom correspondence with reactant SMILES tokens, achieving even better results than supervised SMILES alignment methods. Extensive experiments demonstrate that UAlign significantly outperforms state-of-the-art template-free methods and rivals or surpasses template-based approaches, with up to 5% (top-5) and 5.4% (top-10) increased accuracy over the strongest baseline.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00877-2
Citations: 0
LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification
IF 7.1, Zone 2 (Chemistry)
Journal of Cheminformatics Pub Date : 2024-07-07 DOI: 10.1186/s13321-024-00871-8
Ruifeng Zhou, Jing Fan, Sishu Li, Wenjie Zeng, Yilun Chen, Xiaoshan Zheng, Hongyang Chen, Jun Liao
Abstract
Background: Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to excessively prioritize local information and thus overlook global information. Moreover, it is essential to account for the influence of diverse protein folding structural classes, because proteins in different structural classes exhibit varying biological functions, whereas those within the same structural class share similar functional attributes.
Results: We propose LVPocket, a novel method that captures both local and global information of protein structure through the integration of Transformer encoders, which helps the model achieve better performance in binding pocket prediction. We then tailored prediction models to data from four distinct structural classes of proteins using transfer learning. The four fine-tuned models were derived from the baseline LVPocket model, which was trained on the sc-PDB dataset. LVPocket exhibits superior performance on three independent datasets compared to current state-of-the-art methods. Additionally, the fine-tuned models outperform the baseline model.
Scientific contribution: We present a novel model structure for predicting protein binding pockets that addresses the problem of relying on extensive convolutional computation while neglecting global information about protein structures. Furthermore, we tackle the impact of different protein folding structures on binding pocket prediction tasks through the application of transfer learning methods.
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00871-8
Citations: 0