Journal of Cheminformatics: Latest Articles

Cheminformatics Microservice V3: a web portal for chemical structure manipulation and analysis
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-09-23 DOI: 10.1186/s13321-025-01094-1
Kohulan Rajan, Venkata Chandrasekhar, Nisha Sharma, Sri Ram Sagar Kanakam, Felix Baensch, Christoph Steinbeck
The widespread adoption of open-source cheminformatics toolkits remains constrained by technical implementation barriers, including complex installation procedures, dependency management, and integration challenges. Here, we present Cheminformatics Microservice V3, a significant update to the existing platform that provides unified programmatic access to cheminformatics libraries, including RDKit, the Chemistry Development Kit (CDK), and Open Babel, through a RESTful API framework. This latest version features a newly developed, interactive web-based frontend built with React, providing users with an intuitive graphical interface for manipulating and analysing chemical structures. The frontend supports essential cheminformatics operations, including structure editing, PubChem database integration, batch molecular processing, and standardised InChI/RInChI identifier generation. Cheminformatics Microservice V3 addresses critical accessibility barriers in computational chemistry by giving researchers immediate access to analytical tools without the need for specialised technical expertise or complex software installations. This approach facilitates reproducible research workflows and broadens the use of cheminformatics methods across interdisciplinary research communities. The platform is publicly accessible at https://app.naturalproducts.net, and the complete source code and documentation are available on GitHub.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01094-1
Citations: 0
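The abstract describes programmatic access through a RESTful API. As a minimal sketch of the client side only, the code below builds a GET request URL that carries a SMILES string as a query parameter and decodes it back. The base URL `https://api.example.org`, the route `/latest/convert/inchi`, and the `toolkit` parameter are hypothetical placeholders, not the service's documented API; consult the project's own documentation for its real routes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# HYPOTHETICAL base URL and route, used only for illustration; these are
# placeholders, not the documented Cheminformatics Microservice API.
BASE_URL = "https://api.example.org"
ROUTE = "/latest/convert/inchi"


def build_request_url(smiles: str, toolkit: str = "rdkit") -> str:
    """Return a GET URL that carries a SMILES string as a query parameter.

    urlencode percent-encodes characters that are common in SMILES but
    unsafe in a raw query string, such as '#', '=', '[', ']' and '@'.
    """
    query = urlencode({"smiles": smiles, "toolkit": toolkit})
    return f"{BASE_URL}{ROUTE}?{query}"


def smiles_from_url(url: str) -> str:
    """Decode the SMILES parameter back out of a request URL."""
    return parse_qs(urlsplit(url).query)["smiles"][0]


if __name__ == "__main__":
    url = build_request_url("C[C@@H](N)C(=O)O")  # a stereo-annotated SMILES
    print(url)
    print(smiles_from_url(url))
```

Percent-encoding matters here because an unescaped `#` would be read as a URL fragment and silently truncate a SMILES containing a triple bond.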
A comprehensive landscape of AI applications in broad-spectrum drug interaction prediction: a systematic review
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-09-19 DOI: 10.1186/s13321-025-01093-2
Nour H. Marzouk, Sahar Selim, Mustafa Elattar, Mai S. Mabrouk, Mohamed Mysara
In drug development, managing interactions such as drug–drug, drug–disease, and drug–nutrient interactions is critical for ensuring the safety and efficacy of pharmacological treatments. These interactions often overlap, forming a complex, interconnected landscape that requires accurate prediction to improve patient outcomes and support evidence-based care. Recent advances in artificial intelligence (AI), powered by large-scale datasets (e.g., DrugBank, TWOSIDES, SIDER), have significantly enhanced interaction prediction. Machine learning, deep learning, and graph-based models show great promise, but challenges persist, including data imbalance, noisy sources, limited explainability, and the underrepresentation of certain interaction types. This systematic review of 147 studies (2018–2024) is the first to comprehensively map AI applications across major interaction types. We present a detailed taxonomy of models and datasets, emphasizing the growing roles of large language models and knowledge graphs in overcoming key limitations. Their integration, alongside explainable AI tools, enhances transparency, paving the way for AI-driven systems that proactively mitigate adverse interactions. By identifying the most promising approaches and critical research gaps, this review lays the groundwork for more robust, interpretable, and personalized models for drug interaction prediction.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01093-2
Citations: 0
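The review stresses that drug–drug, drug–disease and drug–nutrient interactions overlap in one interconnected landscape, which is what knowledge-graph approaches exploit. A minimal sketch, assuming toy placeholder entities (not records from DrugBank, TWOSIDES or SIDER), stores all interaction types in a single typed graph so that one entity's interactions can be queried per type or across types:

```python
from collections import defaultdict

# Toy, HYPOTHETICAL records (entity_a, entity_b, interaction_type); not real data.
RECORDS = [
    ("drug_A", "drug_B", "drug-drug"),
    ("drug_A", "disease_X", "drug-disease"),
    ("drug_B", "nutrient_N", "drug-nutrient"),
    ("drug_C", "drug_A", "drug-drug"),
]


def build_graph(records):
    """Undirected typed graph: node -> set of (neighbour, interaction_type)."""
    graph = defaultdict(set)
    for a, b, kind in records:
        graph[a].add((b, kind))
        graph[b].add((a, kind))
    return graph


def neighbours(graph, node, kind):
    """Sorted neighbours of `node` linked by edges of one interaction type."""
    return sorted(n for n, k in graph.get(node, ()) if k == kind)


def interaction_types(graph, node):
    """All interaction types an entity takes part in (this is the 'overlap')."""
    return sorted({k for _, k in graph.get(node, ())})


if __name__ == "__main__":
    g = build_graph(RECORDS)
    print(neighbours(g, "drug_A", "drug-drug"))
    print(interaction_types(g, "drug_A"))
```

Keeping every interaction type in one graph, rather than one table per type, is what lets a model see that the same drug participates in several kinds of interaction at once.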
MetaboGNN: predicting liver metabolic stability with graph neural networks and cross-species data
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-09-03 DOI: 10.1186/s13321-025-01089-y
Jun Hyeong Park, Ri Han, Junbo Jang, Jisan Kim, Joonki Paik, Jaesung Heo, Yoonji Lee
The metabolic stability of a drug is a crucial determinant of its pharmacokinetic properties, including clearance, half-life, and oral bioavailability. Accurate predictions of metabolic stability can significantly streamline the drug discovery process. In this study, we present MetaboGNN, a model for predicting liver metabolic stability based on Graph Neural Networks (GNNs) and Graph Contrastive Learning (GCL). Using a high-quality dataset from the 2023 South Korea Data Challenge for Drug Discovery, comprising 3,498 training molecules and 483 test molecules, we represented molecules as graphs to capture the intricate structural relationships that influence metabolic stability. A GCL-driven pretraining step was employed to improve generalizability by learning robust, transferable graph-level representations. Notably, incorporating interspecies differences between human liver microsomes (HLM) and mouse liver microsomes (MLM) further improved predictive accuracy, yielding Root Mean Square Error (RMSE) values of 27.91 (HLM) and 27.86 (MLM), both expressed in units of the percentage of parent compound remaining after a 30-min incubation. Compared with traditional approaches, MetaboGNN demonstrates superior predictive performance and highlights the importance of considering interspecies enzymatic variation. In addition, attention-based analysis identified key molecular fragments associated with metabolic stability, highlighting chemically meaningful structural determinants. These findings establish MetaboGNN as a powerful tool for metabolic stability prediction, supporting more efficient lead optimization in drug discovery.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01089-y
Citations: 0
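Because RMSE is expressed in the same unit as the target, the reported values of about 28 mean that a typical prediction error is roughly 28 percentage points of parent compound remaining. A minimal sketch of how such a per-species RMSE is computed, using toy values rather than data from the paper:

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy values in % parent compound remaining after 30 min (NOT from the paper).
hlm_true, hlm_pred = [80.0, 12.5, 55.0], [70.0, 20.5, 61.0]
mlm_true, mlm_pred = [40.0, 90.0, 5.0], [45.0, 85.0, 15.0]

if __name__ == "__main__":
    # Each species is scored separately, as in the HLM/MLM figures above.
    print(f"HLM RMSE: {rmse(hlm_true, hlm_pred):.2f}")
    print(f"MLM RMSE: {rmse(mlm_true, mlm_pred):.2f}")
```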
The first South Korean data challenge for drug discovery using human and mouse liver microsomal stability data
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-09-03 DOI: 10.1186/s13321-025-01047-8
Nam-Chul Cho, SeongEun Hong, Jin Sook Song, EuiJu Yeo, SoI Jung, Yuno Lee, Seul Gee Hwang, Su Min Kang, JaeSung Hwang, Tae-Eun Jin
The Korea Chemical Bank (KCB) has generated a dataset containing metabolic stability data for approximately 4,000 compounds tested on human and mouse liver microsomes. The first South Korean data challenge, the Jump AI Challenge for Drug Discovery (JUMP AI 2023), was launched in 2023 using the KCB metabolic stability data. The objective of JUMP AI 2023 was to promote and encourage the use of artificial intelligence (AI) technology for drug development in South Korea. A total of 1,254 teams participated in the competition, developing algorithms to estimate the percentage of each compound remaining after 30 min of incubation with human and mouse liver microsomes. The dataset comprised training and test sets of 3,498 and 483 compounds, respectively. This paper provides an overview of JUMP AI 2023 and its outcomes, highlighting the diverse range of algorithms and AI technologies employed by the competing teams. Among these, five award-winning teams stood out for using GNN-based approaches. This competition was the first AI competition for drug discovery in South Korea; it attracted numerous researchers and played a key role in promoting drug research through the application of AI technologies.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01047-8
Citations: 0
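The challenge target is the percentage of compound remaining after a 30-min microsomal incubation. A common way to obtain such a value is 100 × (parent signal at 30 min / parent signal at 0 min), and, if one additionally assumes first-order (single-exponential) depletion, a single time point can be converted to an apparent half-life. The abstract does not describe KCB's assay protocol, so both formulas below are generic assumptions, not the challenge's definition:

```python
import math


def percent_remaining(signal_t0: float, signal_t: float) -> float:
    """Parent compound remaining at time t as a percentage of the t = 0 signal."""
    if signal_t0 <= 0:
        raise ValueError("t = 0 signal must be positive")
    return 100.0 * signal_t / signal_t0


def first_order_half_life(pct_remaining: float, t_min: float = 30.0) -> float:
    """Apparent half-life in minutes, ASSUMING first-order depletion.

    C_t / C_0 = exp(-k t)  =>  k = ln(100 / pct) / t  and  t_1/2 = ln 2 / k.
    """
    if pct_remaining <= 0:
        raise ValueError("percentage remaining must be positive")
    if pct_remaining >= 100:
        return math.inf  # no measurable depletion within the incubation
    k = math.log(100.0 / pct_remaining) / t_min
    return math.log(2) / k


if __name__ == "__main__":
    pct = percent_remaining(signal_t0=200.0, signal_t=50.0)
    print(pct, first_order_half_life(pct))
```

For example, 25% remaining after 30 min corresponds to two half-lives, i.e. an apparent half-life of 15 min under this assumption.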
Box embeddings for extending ontologies: a data-driven and interpretable approach
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-09-01 DOI: 10.1186/s13321-025-01086-1
Adel Memariani, Martin Glauer, Simon Flügel, Fabian Neuhaus, Janna Hastings, Till Mossakowski
Deriving symbolic knowledge from trained deep learning models is challenging because such models lack transparency. A promising approach to this issue is to couple a semantic structure with the model outputs and thereby make the model interpretable. In prediction tasks such as multi-label classification, labels tend to form hierarchical relationships. We therefore propose enforcing a taxonomic structure on the model's outputs throughout the training phase. In vector space, a taxonomy can be represented using axis-aligned hyper-rectangles, or boxes, which may overlap or nest within one another. The boundaries of a box determine the extent of a particular category. We thus used box-shaped embeddings of ontology classes to learn and transparently represent logical relationships that are only implicit in multi-label datasets. We assessed our model by measuring its ability to approximate the full set of inferred subclass relations in the ChEBI ontology, an important knowledge base in the life sciences. We demonstrate that our model captures implicit hierarchical relationships among labels, ensuring consistency with the underlying ontological conceptualization, while also achieving state-of-the-art performance in multi-label classification. Notably, this is accomplished without requiring an explicit taxonomy during training.

Our proposed approach advances chemical classification by enabling interpretable outputs through a structured and geometrically expressive representation of molecules and their classes.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01086-1
Citations: 0
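In the box picture, a subclass relation corresponds to the child's box lying inside the parent's box, and overlap without nesting signals a partial relation. The sketch below scores containment as vol(child ∩ parent) / vol(child), which equals 1.0 for full nesting. This soft score is a common choice in the box-embedding literature but is an assumption here; the abstract does not give the paper's exact loss, and the class names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyper-rectangle given by its lower and upper corners."""
    lower: tuple
    upper: tuple

    def volume(self) -> float:
        v = 1.0
        for lo, hi in zip(self.lower, self.upper):
            v *= max(hi - lo, 0.0)  # an empty side gives zero volume
        return v


def intersection(a: Box, b: Box) -> Box:
    """Overlap region of two boxes (zero volume if they are disjoint)."""
    lower = tuple(max(x, y) for x, y in zip(a.lower, b.lower))
    upper = tuple(min(x, y) for x, y in zip(a.upper, b.upper))
    return Box(lower, upper)


def containment(child: Box, parent: Box) -> float:
    """Fraction of the child's volume inside the parent (1.0 = fully nested)."""
    cv = child.volume()
    return intersection(child, parent).volume() / cv if cv > 0 else 0.0


if __name__ == "__main__":
    parent_class = Box((0.0, 0.0), (4.0, 4.0))  # HYPOTHETICAL class boxes
    child_class = Box((1.0, 1.0), (3.0, 3.0))
    sibling_class = Box((3.0, 3.0), (5.0, 5.0))
    print(containment(child_class, parent_class))
    print(containment(sibling_class, parent_class))
```

Because each class is a box rather than a point, the learned geometry can be read back as "is-a" statements, which is the interpretability argument the abstract makes.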
Evaluation of chirality descriptors derived from SMILES heteroencoders
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-08-31 DOI: 10.1186/s13321-025-01080-7
Natalia Baimacheva, Xinyue Gao, Joao Aires-de-Sousa
Molecular representations of chirality derived from the latent space vectors (LSVs) of SMILES heteroencoders were explored for training machine learning models to predict chiral properties, and were compared with conventional circular fingerprints. Latent space arithmetic was applied to enhance the representation of chirality, either by calculating the difference between the original descriptor of a molecule and the descriptor of its enantiomer, or the difference between the original descriptor and the descriptor obtained from the stereochemistry-depleted SMILES string. Machine learning was performed with the Random Forest algorithm on a dataset of 3858 molecules extracted from the literature (1929 pairs of enantiomers) to predict the elution order observed on the Chiralpak® AD-H column, as well as intrinsic structural chirality labels (R/S or canonical SMILES @/@@). The descriptors derived from the heteroencoders achieved an accuracy of up to 0.75 in predicting the elution order, whereas the fingerprints performed better (0.82). The difference LSV descriptors showed better predictive ability than the original descriptors.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01080-7
Citations: 0
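The two difference descriptors need two reference SMILES per molecule: the enantiomer and the stereochemistry-depleted string. A minimal string-level sketch, assuming the input uses only `@`/`@@` tetrahedral marks and `/`, `\` double-bond marks (no `@TH1`, `@AL1` and similar). The heteroencoder that maps SMILES to an LSV is not shown; short toy vectors stand in for its output:

```python
def strip_stereo(smiles: str) -> str:
    """Remove tetrahedral (@, @@) and double-bond (/, \\) stereo marks.

    ASSUMPTION: only these stereo notations occur. The result is valid
    SMILES but not necessarily canonical.
    """
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


def invert_tetrahedral(smiles: str) -> str:
    """Swap @ and @@ at every centre, giving the mirror-image SMILES.

    Mirroring inverts every tetrahedral centre and leaves cis/trans double
    bonds unchanged, so under the same assumption this is the enantiomer
    (or the same molecule again, for meso compounds).
    """
    placeholder = "\x00"
    return smiles.replace("@@", placeholder).replace("@", "@@").replace(placeholder, "@")


def difference_descriptor(v_mol, v_ref):
    """Latent-space arithmetic: element-wise v_mol - v_ref."""
    if len(v_mol) != len(v_ref):
        raise ValueError("vectors must have the same length")
    return [a - b for a, b in zip(v_mol, v_ref)]


if __name__ == "__main__":
    smi = "C[C@@H](N)C(=O)O"
    print(invert_tetrahedral(smi))  # input for the enantiomer descriptor
    print(strip_stereo(smi))        # input for the stereo-depleted descriptor
    # Toy LSVs standing in for encoder outputs (HYPOTHETICAL values).
    print(difference_descriptor([1.0, 2.0], [0.5, 2.5]))
```

The difference vectors cancel the parts of the latent code shared by both stereoisomers, which is one plausible reading of why the difference LSVs predicted better than the raw ones.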
AlphaPPIMI: a comprehensive deep learning framework for predicting PPI-modulator interactions
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-08-29 DOI: 10.1186/s13321-025-01077-2
Dayan Liu, Tao Song, Shuang Wang, Xue Li, Peifu Han, Jianmin Wang, Shudong Wang
Protein–protein interactions (PPIs) regulate essential biological processes through complex interfaces, and their dysfunction is associated with various diseases. Consequently, identifying PPIs and their interface-targeting modulators has emerged as a critical therapeutic approach. However, discovering modulators that target PPIs and PPI interfaces remains challenging, as traditional structure-similarity-based methods fail to effectively characterize PPI targets, particularly those for which no active compounds are known. Here, we present AlphaPPIMI, a comprehensive deep learning framework that combines large-scale pretrained language models with domain adaptation to predict PPI-modulator interactions, specifically targeting PPI interfaces. To enable robust model development and evaluation, we constructed comprehensive benchmark datasets of PPI-modulator interactions (PPIMI). Our framework integrates molecular features from Uni-Mol2, protein representations derived from state-of-the-art language models (ESM2 and ProTrans), and PPI structural characteristics encoded by PFeature. Through a specialized cross-attention architecture and conditional domain adversarial networks (CDAN), AlphaPPIMI learns potential associations between PPI targets and modulators while maintaining robust cross-domain generalization. Extensive evaluations indicate that AlphaPPIMI consistently outperforms existing methods in PPIMI prediction, offering a promising approach for prioritizing candidate PPI modulators, particularly those targeting protein–protein interfaces.

This work presents AlphaPPIMI, a novel deep learning framework for accurately predicting modulators that target protein–protein interactions (PPIs) and their interfaces. Its core contributions include a specialized cross-attention module for the synergistic fusion of multimodal pretrained representations and the novel application of a Conditional Domain Adversarial Network (CDAN) to improve generalization across diverse protein families. AlphaPPIMI demonstrates superior performance on curated benchmarks, providing a computational tool for the discovery of targeted PPI therapeutics.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01077-2
Citations: 0
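Cross-attention is what lets one modality (here, modulator tokens) weigh the parts of another (protein or PPI tokens) that matter for it. A minimal pure-Python sketch of scaled dot-product cross-attention, softmax(QKᵀ/√d)V, assuming toy token vectors; the learned projection matrices, multiple heads and the CDAN component of AlphaPPIMI are all omitted, so this is the generic mechanism rather than the authors' module:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Each query row attends over the key/value rows of the other modality.

    Returns (outputs, weights): one output vector and one weight row per query.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


if __name__ == "__main__":
    modulator_tokens = [[1.0, 0.0], [0.0, 1.0]]            # HYPOTHETICAL queries
    protein_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # HYPOTHETICAL keys
    protein_values = [[1.0], [2.0], [3.0]]                 # HYPOTHETICAL values
    out, w = cross_attention(modulator_tokens, protein_tokens, protein_values)
    print(out)
    print(w)
```

The attention weight rows also double as a rough interpretability signal: they show which protein tokens each modulator token drew on.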
AI-powered prediction of critical properties and boiling points: a hybrid ensemble learning and QSPR approach
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-08-29 DOI: 10.1186/s13321-025-01062-9
Roda Bounaceur, Francisco Paes, Romain Privat, Jean-Noël Jaubert
In this paper, we propose a robust deep learning model based on a Quantitative Structure-Property Relationship (QSPR) approach for estimating the critical temperature (TC), critical pressure (PC), acentric factor (ACEN), and normal boiling point (NBP) of any molecule composed of C, H, O, N, S, P, F, Cl, Br, and I atoms. The Mordred calculator was used to compute 247 descriptors characterizing the molecules considered in this work. For each property, multiple neural networks were trained within a bagging framework. The predictions of the final ensemble were tested against a large set of experimental data covering more than 1700 molecules and compared with those of several recent learning models from the literature. Comprehensive comparisons and extensive testing highlight the robustness and predictive power of the newly proposed multimodal learning model. The prediction tool is available online at https://lrgp-thermoppt.streamlit.app/. Python source code for running the trained models is available on GitHub at https://github.com/bounac80/AI-ThermPpt.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01062-9
Citations: 0
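Bagging trains each ensemble member on a bootstrap resample (drawn with replacement) of the training set and averages the members' predictions; the spread across members gives a rough uncertainty estimate. A minimal sketch of that idea, in which a one-descriptor least-squares line is swapped in for the paper's neural networks and toy data replaces the 247 Mordred descriptors:

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (stand-in for a neural network)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def bagged_models(xs, ys, n_models=25, seed=0):
    """Fit one model per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Ensemble mean and population std. dev. of the member predictions."""
    preds = [a + b * x for a, b in models]
    return statistics.fmean(preds), statistics.pstdev(preds)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [float(i) for i in range(20)]
    ys = [3.0 * x + 5.0 + rng.gauss(0.0, 1.0) for x in xs]  # toy noisy data
    mean, spread = predict(bagged_models(xs, ys), 10.0)
    print(f"prediction {mean:.2f} +/- {spread:.2f}")
```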
Biosynfoni: a biosynthesis-informed and interpretable lightweight molecular fingerprint
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-08-29 DOI: 10.1186/s13321-025-01081-6
Lucina-May Nollen, David Meijer, Maria Sorokina, Justin J. J. van der Hooft
Natural products provide a rich source of bioactive molecules for a variety of applications. Molecular fingerprints are the tool of choice for systematic large-scale studies of their structures. However, current molecular fingerprints do not inherently capture the characteristic features of natural products, which reduces the interpretability of natural product-specific predictions. Here, we show that a natural product-specific molecular fingerprint based on a relatively small set of selected biosynthetic building blocks provides more interpretable predictions of biosynthetic distance and natural product classification. Our fingerprint, Biosynfoni, outperforms MACCS, Morgan, and Daylight-like fingerprints in biosynthetic distance estimation while using only 39 substructure keys. Moreover, Biosynfoni's design, compactness, and concrete substructure definitions allow easy visualisation of the detected substructures and their respective biosynthetic pathway origins. Through Biosynfoni, users can gain more insight from predictions and better examine the importance of features within machine learning models. Our results show that a short fingerprint consisting of biologically significant building blocks performs on par with top-performing molecular fingerprints for natural product classification while improving prediction explainability.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01081-6
Citations: 0
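A key-based fingerprint like Biosynfoni records, for each named building block, how often it occurs, so two molecules can be compared position by position and every nonzero position can be read back as a named substructure. The abstract does not state how biosynthetic distance is computed; the sketch assumes count vectors compared with the min/max (generalised) Tanimoto coefficient, a common choice for count fingerprints, and uses four hypothetical key names instead of the real 39:

```python
# HYPOTHETICAL key names; Biosynfoni's actual 39 keys are defined in the paper.
KEY_NAMES = ["block_1", "block_2", "block_3", "block_4"]


def count_tanimoto(a, b):
    """Generalised Tanimoto similarity: sum of minima / sum of maxima."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0  # two empty fingerprints count as identical


def count_distance(a, b):
    """Distance in [0, 1]: 0 for identical count vectors."""
    return 1.0 - count_tanimoto(a, b)


def present_keys(fp, names=KEY_NAMES):
    """Interpretability: the named building blocks detected in a molecule."""
    return [name for name, count in zip(names, fp) if count > 0]


if __name__ == "__main__":
    mol_a = [2, 0, 1, 0]  # toy count fingerprints
    mol_b = [1, 0, 1, 3]
    print(count_distance(mol_a, mol_b))
    print(present_keys(mol_a), present_keys(mol_b))
```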
FusionCLM: enhanced molecular property prediction via knowledge fusion of chemical language models
IF 5.7 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-08-29 DOI: 10.1186/s13321-025-01073-6
Yutong Lu, Yan Yi Li, Yan Sun, Pingzhao Hu
Chemical Language Models (CLMs) have demonstrated the ability to extract patterns and make predictions from large volumes of Simplified Molecular Input Line Entry System (SMILES) strings, a notation used to represent molecular structures. Different CLMs, built on various architectures, can provide unique insights into molecular properties. To harness the complementary strengths of different CLMs, we propose FusionCLM, a novel stacking-ensemble learning algorithm that integrates the outputs of multiple CLMs into a unified framework. FusionCLM first generates SMILES embeddings, predictions, and losses from each CLM. Auxiliary models are trained on these first-level predictions and embeddings to estimate test losses during inference. The losses and predictions are then concatenated into an integrated feature matrix, which is used to train second-level meta-models for the final predictions. Empirical testing on five datasets demonstrates that FusionCLM outperforms both the individual first-level CLMs and three advanced multimodal deep learning frameworks, showcasing its potential for advancing molecular property prediction.

Vol. 17, no. 1 · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01073-6
Citations: 0
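The second-level input described above is one row per molecule holding every CLM's prediction followed by every CLM's estimated loss. The sketch builds that feature matrix and then, as a deliberately simple stand-in for the trained meta-models, blends the predictions with weights inversely proportional to the estimated losses. The inverse-loss blend is an assumption for illustration, not FusionCLM's meta-learner, and all numbers are toy values:

```python
def build_meta_features(predictions, est_losses):
    """Concatenate per-CLM predictions and estimated losses, one row per molecule.

    predictions[m][i]: prediction of CLM m for molecule i
    est_losses[m][i]:  auxiliary-model estimate of CLM m's loss on molecule i
    """
    n_models = len(predictions)
    n_mols = len(predictions[0])
    return [
        [predictions[m][i] for m in range(n_models)]
        + [est_losses[m][i] for m in range(n_models)]
        for i in range(n_mols)
    ]


def inverse_loss_blend(row, n_models, eps=1e-8):
    """Stand-in meta-model: weight each CLM by 1 / (estimated loss + eps)."""
    preds, losses = row[:n_models], row[n_models:]
    weights = [1.0 / (loss + eps) for loss in losses]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


if __name__ == "__main__":
    # Two hypothetical CLMs, three molecules (toy values).
    preds = [[0.2, 1.5, 3.0], [0.6, 1.1, 2.0]]
    losses = [[0.1, 0.5, 0.2], [0.4, 0.1, 0.2]]
    matrix = build_meta_features(preds, losses)
    print([round(inverse_loss_blend(row, n_models=2), 3) for row in matrix])
```

Feeding the estimated losses to the second level is what lets the ensemble trust each CLM more on the molecules where it is expected to be accurate.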
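Contact us: info@booksci.cn. Book学术 provides a free academic resource search service that helps scholars in China and abroad find Chinese- and English-language literature, and is committed to providing the most convenient, high-quality service. Copyright © 2023 布克学术. All rights reserved.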