Journal of Cheminformatics: Latest Articles

A 3D generation framework using diffusion model and reinforcement learning to generate multi-target compounds with desired properties
IF 8.6 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-06-04 · DOI: 10.1186/s13321-025-01035-y
Yongna Yuan, Xiaohang Pan, Xiaohong Li, Ruisheng Zhang, Wei Su
Abstract: Deep generative models provide a powerful solution for the de novo design of molecules. However, most existing methods generate molecules for only a single target. Generating molecules with biological activity against multiple specific targets and with desired properties remains an extremely difficult challenge. In this study, we propose a novel 3D molecule generation framework based on reinforcement learning and a diffusion model to generate molecules with predefined properties for multiple given targets. The proposed framework, MDRL, uses a diffusion model to learn the 3D chemical structure of molecules and employs Kolmogorov-Arnold Networks (KANs) instead of multilayer perceptrons (MLPs) to enhance model performance. Through reinforcement learning, the framework generates molecules that act on two targets simultaneously while further optimizing multiple molecular properties. Experimental results show that our model performs comparably to various state-of-the-art molecular generation models, and that MDRL can effectively navigate chemical space to design polypharmacological compounds and control multiple molecular properties. In multiple case studies, we use molecular docking to verify that the generated molecules can bind both targets simultaneously, and we assess the model's ability to control multiple molecular properties. These results highlight the advantages and practicality of our model for generating polypharmacological compounds with desired properties.
Scientific contribution: This study introduces MDRL, a 3D molecular generation framework that integrates diffusion models and reinforcement learning to jointly optimize multi-target binding and molecular properties. MDRL improves on existing methods in controlling drug-relevant properties and enhancing multi-target affinity. Experimental results demonstrate that MDRL efficiently generates drug-like compounds with robust polypharmacological profiles, offering a novel strategy for multi-target drug design.
Citations: 0
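MDRL's abstract states that Kolmogorov-Arnold Networks replace multilayer perceptrons. In an MLP, learnable linear weights feed a fixed activation at each node; in a KAN, each input-to-output edge carries its own learnable one-dimensional function, and each output node simply sums them. The following minimal forward-pass sketch illustrates only that idea and assumes degree-1 (hat) B-spline bases on a uniform grid plus a SiLU base term. Published KANs usually use cubic B-splines, training is omitted, and the class and parameter names are hypothetical rather than taken from MDRL.

```python
import math


def hat_basis(x, grid):
    """Degree-1 B-spline ("hat") basis values at x on a uniform grid."""
    h = grid[1] - grid[0]
    return [max(0.0, 1.0 - abs(x - g) / h) for g in grid]


def silu(x):
    return x / (1.0 + math.exp(-x))


class KANLayer:
    """Forward pass of a toy KAN layer (hypothetical illustration, not MDRL's code).

    Each edge (input i -> output o) applies its own function
        phi_oi(x) = base_weights[o][i] * silu(x) + sum_k coeffs[o][i][k] * B_k(x),
    and output o is the sum of phi_oi(x_i) over all inputs i.
    """

    def __init__(self, coeffs, base_weights, grid):
        self.coeffs = coeffs              # shape [n_out][n_in][len(grid)]
        self.base_weights = base_weights  # shape [n_out][n_in]
        self.grid = grid

    def forward(self, x):
        bases = [hat_basis(xi, self.grid) for xi in x]  # one basis vector per input
        outputs = []
        for o, edge_coeffs in enumerate(self.coeffs):
            total = 0.0
            for i, c in enumerate(edge_coeffs):
                spline = sum(ck * bk for ck, bk in zip(c, bases[i]))
                total += self.base_weights[o][i] * silu(x[i]) + spline
            outputs.append(total)
        return outputs


# Two inputs, one output, grid points at -1, 0 and 1.
layer = KANLayer(
    coeffs=[[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]],
    base_weights=[[0.0, 0.0]],
    grid=[-1.0, 0.0, 1.0],
)
print(layer.forward([0.5, -0.5]))  # [1.0]
```

Inside the grid the hat functions sum to one, so each spline term linearly interpolates between its coefficients at neighbouring grid points; the coefficients are therefore the values of the learned edge function at the grid nodes, which keeps those functions easy to inspect.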
RLSuccSite: succinylation sites prediction based on reinforcement learning dynamic with balanced reward mechanism and three-peaks enhanced method for physicochemical property scores
IF 8.6 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-06-02 · DOI: 10.1186/s13321-025-01034-z
Lun Zhu, Qingchao Zhang, Sen Yang
Abstract: Recent progress in computational biology has driven the development of machine learning models for predicting protein post-translational modification sites. However, challenges such as data imbalance and limited sequence-context representation continue to hinder prediction accuracy, particularly for less frequent modifications such as succinylation. In this study, we propose RLSuccSite, a reinforcement learning-based framework designed specifically to predict succinylation sites, which addresses class imbalance through a dynamic, balanced reward mechanism. To enhance sequence feature representation, this study also introduces the Three-Peaks Enhanced Method for Physicochemical Property Scores (TPEM-PPS), a physicochemical property-driven feature extraction method that incorporates position-aware scoring to reflect amino acid contributions more effectively. The code and data of RLSuccSite are available at https://github.com/Zhangqingchao-Ch/RLSuccSite.git.
Scientific contribution: This study applies reinforcement learning to protein succinylation site prediction, introducing a dynamic, balanced reward mechanism that effectively addresses dataset imbalance. It also proposes TPEM-PPS, a novel physicochemical scoring method that captures residue contributions with higher precision than traditional feature extraction techniques.
Citations: 0
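The abstract names a "dynamic with balanced reward mechanism" without defining it. One common way to balance rewards when classification is framed as reinforcement learning (an assumption here, not RLSuccSite's published rule) is to give minority-class decisions a reward of ±1 and majority-class decisions ±λ, where λ is the minority-to-majority ratio, so that both classes carry comparable total weight. In this sketch, recomputing λ for each batch stands in for the "dynamic" part; the function names are hypothetical.

```python
from collections import Counter


def balance_ratio(labels, minority=1):
    """lambda = (#minority samples) / (#majority samples), recomputed per batch."""
    counts = Counter(labels)
    majority = sum(n for lab, n in counts.items() if lab != minority)
    if majority == 0:
        return 1.0
    return counts[minority] / majority


def reward(pred, label, lam, minority=1):
    """Hypothetical balanced reward: +/-1 for minority samples, +/-lam for majority ones."""
    scale = 1.0 if label == minority else lam
    return scale if pred == label else -scale


labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # 1 positive (succinylated) site, 9 negatives
preds = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # one false positive at the end
lam = balance_ratio(labels)
rewards = [reward(p, y, lam) for p, y in zip(preds, labels)]
print(lam, sum(rewards))
```

With λ = 1/9, missing the single positive site would cost as much as misclassifying all nine negatives, which is what keeps an agent from learning to predict only the majority class.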
Representation of chemistry transport models simulations using knowledge graphs
IF 8.6 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-05-31 · DOI: 10.1186/s13321-025-01025-0
Eduardo Illueca Fernández, Antonio Jesús Jara Valera, Jesualdo Tomás Fernández Breis
Abstract: Persistent air pollution poses a serious threat to human health and is one of the action points that policy makers should monitor under Directive 2008/50/EC. Although deploying a massive network of hyperlocal sensors could provide extensive monitoring, this approach cannot generate spatially continuous data and presents several logistical challenges. Developing accurate and trustworthy expert systems based on chemistry transport models is therefore a key strategy for environmental protection. However, chemistry transport models lack standardization, and their formats are not interoperable across systems, which limits their use by different stakeholders. In this context, semantic technologies provide methods and standards for scientific data and make information machine-readable for expert systems. This paper therefore proposes a novel methodology for the ontology-driven transformation of simulations from CHIMERE, a chemistry transport model, into knowledge graphs that represent air quality information. The methodology transforms netCDF files into RDF triples for short-term air quality forecasting. Concretely, we use the Semantic Web Integration Tool (SWIT) framework to map individuals, using an ontology as a template. We also define a new ontology for CHIMERE that reuses concepts from existing standards. Our approach demonstrates that RDF files can be created from netCDF in linear time, which makes it scalable for expert systems. In addition, the ontology complies with the OQuaRE quality metrics and can be extended in future work to other chemistry transport models.
Scientific contribution: Development of the first ontology for a chemistry transport model. FAIRification of physical models through the generation of knowledge graphs from netCDF files. The proposed ontology is published at PURL (https://purl.org/chimere-ontology), and the knowledge graph generated for a 72-h simulation is available at https://doi.org/10.5281/zenodo.13981544.
Citations: 0
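The linear-time result has a simple intuition: each grid cell of a concentration field produces a fixed number of triples, so the output grows linearly with the number of cells. The sketch below emits N-Triples for one 2-D slice held in a plain nested list, standing in for a netCDF variable (in practice the slice would be read with a library such as netCDF4 or xarray). The `http://example.org/chimere#` namespace and the property names are hypothetical placeholders, not terms from the published CHIMERE ontology, and the sketch does not use SWIT.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_DOUBLE = "<http://www.w3.org/2001/XMLSchema#double>"
NS = "http://example.org/chimere#"  # hypothetical namespace, not the CHIMERE ontology


def grid_to_ntriples(var_name, grid, time_index):
    """Yield N-Triples lines for one 2-D slice (rows = latitude, cols = longitude).

    Exactly 4 triples are emitted per cell, so the work is O(number of cells).
    """
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            obs = f"<{NS}obs_{var_name}_t{time_index}_y{i}_x{j}>"
            yield f"{obs} {RDF_TYPE} <{NS}Observation> ."
            yield f"{obs} <{NS}pollutant> <{NS}{var_name}> ."
            yield f'{obs} <{NS}cellIndex> "{i},{j}" .'
            yield f'{obs} <{NS}value> "{value}"^^{XSD_DOUBLE} .'


o3 = [[41.2, 39.8], [44.0, 40.5]]  # toy 2x2 ozone slice, invented values
lines = list(grid_to_ntriples("O3", o3, 0))
print(len(lines))  # 16 (4 cells x 4 triples)
print(lines[3])
```

Because the function is a generator, the triples can be streamed straight to a file, so memory use stays constant however many time steps of the simulation are converted.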
Higher education in chemoinformatics: achievements and challenges
IF 8.6 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-05-31 · DOI: 10.1186/s13321-025-01036-x
Alexandre Varnek, Gilles Marcou, Dragos Horvath
Abstract: While chemoinformatics is a well-established scientific field, its integration into university curricula is rarely discussed. In this work, we share our experience in developing a chemoinformatics curriculum at the University of Strasbourg and highlight the main challenges of higher education in this discipline.
Citations: 0
Equivariant diffusion for structure-based de novo ligand generation with latent-conditioning
IF 8.6 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-05-31 · DOI: 10.1186/s13321-025-01028-x
Tuan Le, Julian Cremer, Djork-Arné Clevert, Kristof T. Schütt
Abstract: We introduce PoLiGenX, a novel generative model for de novo ligand design that employs latent-conditioned, target-aware equivariant diffusion. Our approach conditions the ligand generation process on reference molecules located within a specific protein pocket. As a result, PoLiGenX generates shape-similar ligands that are adapted to the target pocket, enabling effective applications in target-aware hit expansion and hit optimization. Our experimental results underscore the efficacy of PoLiGenX in advancing ligand design. Notably, docking analyses reveal that the ligands generated by PoLiGenX show enhanced binding affinities relative to their reference molecules while retaining a similar molecular shape, and that they adopt better poses with lower strain energies and fewer steric clashes. Furthermore, the model promotes substantial chemical diversity, facilitating the exploration of broader and more varied chemical spaces. Importantly, the generated ligands were assessed for drug-likeness using Lipinski's rule of five and showed better adherence to drug-likeness criteria than the reference dataset. This work represents a step forward in the controlled and precise generation of therapeutically relevant de novo ligands tailored to specific protein targets, contributing to progress in computational drug discovery and ligand design.
Scientific contribution: We present a latent-conditioning method within diffusion models that enables the controllable generation of ligands similar to a reference ligand in structure-based drug design. We show that ligands generated via latent-conditioning achieve favorable poses with fewer steric clashes and lower strain energies than those from diffusion models that condition on the protein pocket alone. We demonstrate that ligand generation can be further constrained using an importance sampling algorithm with external surrogate models that account for molecular properties such as synthetic accessibility.
Citations: 0
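PoLiGenX's drug-likeness assessment uses Lipinski's rule of five. A minimal sketch of the rule follows, assuming the four descriptors have already been computed, for example with RDKit's `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors` and `Lipinski.NumHAcceptors`. The thresholds are the standard ones (molecular weight ≤ 500 Da, logP ≤ 5, ≤ 5 hydrogen-bond donors, ≤ 10 hydrogen-bond acceptors); allowing at most one violation is a common convention that the abstract does not specify, and the descriptor values in the example are approximate, illustrative inputs.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention (assumed here): at most one violation is tolerated."""
    return lipinski_violations(d) <= max_violations


aspirin_like = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
large_lipophilic = Descriptors(mol_weight=650.0, logp=6.3, h_donors=2, h_acceptors=8)
print(passes_rule_of_five(aspirin_like))      # True
print(passes_rule_of_five(large_lipophilic))  # False (2 violations)
```

Applied over a whole set of generated ligands, the fraction that passes gives the kind of adherence figure that can be compared against a reference dataset.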
Semi-supervised prediction of protein fitness for data-driven protein engineering
IF 8.6 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-05-31 · DOI: 10.1186/s13321-025-01029-w
Alicia Olivares-Gil, José A. Barbero-Aparicio, Juan J. Rodríguez, José F. Díez-Pastor, César García-Osorio, Mehdi D. Davari
Abstract: Protein fitness prediction plays a crucial role in advancing protein engineering. However, the combinatorial complexity of protein sequence space and the limited availability of assay-labelled data hinder the efficient optimization of protein properties. Data-driven strategies using machine learning have emerged as a promising solution, yet their dependence on labelled training data poses a significant obstacle. To overcome this challenge, we explore various ways of introducing the latent information present in evolutionarily related (homologous) sequences into the training process. To do so, we establish several strategies based on semi-supervised learning (unsupervised pre-processing and wrapper methods) and perform a comprehensive comparison on 19 datasets of protein-fitness pairs. Our findings reveal that using the information in homologous sequences can improve model performance, especially when very few labelled sequences are available. Specifically, combining a sequence encoding based on Direct Coupling Analysis (DCA) with MERGE (a hybrid regression framework that combines evolutionary information with supervised learning) and an SVM regressor outperforms other encodings (PAM250, UniRep, eUniRep) and other semi-supervised wrapper methods (Tri-Training Regressor, Co-Training Regressor). In summary, the performance gains of this strategy represent a substantial step towards more robust and reliable predictive models for protein engineering tasks. This advancement has the potential to streamline the design and optimisation of proteins for diverse applications in biotechnology and therapeutics.
Scientific contribution: We explore several semi-supervised learning strategies that can incorporate unlabelled sequences homologous to the protein of interest into the training process. Among them, we present two new methods for exploiting the information in homologous sequences: (i) a new, generalised version of MERGE that can use any regressor as its base estimator; (ii) the Tri-Training Regressor, an adaptation of the Tri-Training method to regression problems. We find that the information in homologous sequences can improve the predictive capacity of models when labelled sequences are scarce, especially when the DCA encoding is used together with MERGE and an SVM regressor.
Citations: 0
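To make the wrapper idea concrete, here is a simplified Tri-Training loop for regression: three learners are trained on different bootstrap samples, and an unlabelled point receives a pseudo-label for one learner when the other two agree within a tolerance. This is a sketch under stated assumptions, not the authors' Tri-Training Regressor: the k-nearest-neighbour base learners, the agreement threshold `tau`, the fixed number of rounds and the toy data are all hypothetical, and the error-rate checks of the original Tri-Training algorithm are omitted.

```python
import random
import statistics


def knn_predict(train, x, k=3):
    """k-nearest-neighbour regression: mean target of the k closest training points."""
    nearest = sorted(train, key=lambda pt: sum((a - b) ** 2 for a, b in zip(pt[0], x)))[:k]
    return statistics.fmean(y for _, y in nearest)


def tri_training_regressor(labelled, unlabelled, rounds=3, tau=0.5, seed=0):
    """Simplified Tri-Training for regression (hypothetical sketch).

    labelled: list of (features, target); unlabelled: list of features.
    """
    rng = random.Random(seed)
    # Diversity comes from giving each learner its own bootstrap sample.
    boot = [[rng.choice(labelled) for _ in labelled] for _ in range(3)]
    pseudo = [[] for _ in range(3)]
    for _ in range(rounds):
        new_pseudo = [[] for _ in range(3)]
        for i in range(3):
            p, q = (m for m in range(3) if m != i)  # the two "teacher" learners
            for x in unlabelled:
                yp = knn_predict(boot[p] + pseudo[p], x)
                yq = knn_predict(boot[q] + pseudo[q], x)
                if abs(yp - yq) < tau:  # teachers agree -> pseudo-label for learner i
                    new_pseudo[i].append((x, (yp + yq) / 2))
        pseudo = new_pseudo  # pseudo-labels are rebuilt every round

    def predict(x):
        # Final prediction: average of the three learners.
        return statistics.fmean(knn_predict(boot[m] + pseudo[m], x) for m in range(3))

    return predict


labelled = [((x,), 2.0 * x) for x in range(6)]  # toy "fitness" = 2 * feature
unlabelled = [(x + 0.5,) for x in range(5)]
model = tri_training_regressor(labelled, unlabelled)
print(model((2.5,)))
```

In the protein setting, the feature vectors would be sequence encodings (for example DCA-based ones) and the unlabelled pool would be the homologous sequences.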
Enhancing atom mapping with multitask learning and symmetry-aware deep graph matching
IF 8.6 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-05-30 · DOI: 10.1186/s13321-025-01030-3
Maryam Astero, Juho Rousu
Abstract: Atom mapping identifies the correspondence between individual atoms in reactant molecules and their counterparts in product molecules. This process is crucial for gaining deeper insight into reaction mechanisms, for example to define reaction templates and to determine which chemical bonds are formed or broken during a reaction. However, reliable atom-mapping data are often limited or incomplete in chemical databases, and manual annotation is impractical for large-scale datasets. To address this limitation, we propose the Symmetry-Aware Multitask Atom Mapping Network (SAMMNet), a model that automatically infers atom correspondences by incorporating an auxiliary self-supervised task during training. SAMMNet uses molecular graph representations and graph neural networks to capture both general and task-specific features, improving predictive performance. Our experimental results demonstrate that the multitask learning framework, coupled with symmetry-aware atom mapping, improves the accuracy and robustness of atom mapping predictions, making our method a promising advance for computational chemistry and related fields.
Scientific contribution: This study introduces SAMMNet, a novel Symmetry-Aware Multitask Atom Mapping Network that advances atom mapping methodology by integrating multitask learning with post-prediction symmetry refinement. Unlike prior approaches, SAMMNet uses auxiliary self-supervised tasks to enhance molecular graph representations, improving mapping accuracy while handling imbalanced reactions through graph padding.
Citations: 0
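The abstract does not detail SAMMNet's symmetry refinement, but the underlying problem is easy to show: when product atoms are symmetry-equivalent (such as the two oxygens of an acetate group), mapping a reactant atom to either one should count as correct. The sketch below scores a predicted atom map against a reference up to such equivalence classes. The class labels are assumed to come from a separate step, for example RDKit's `Chem.CanonicalRankAtoms(mol, breakTies=False)`, and the scoring function is a hypothetical illustration rather than SAMMNet's procedure.

```python
def symmetry_aware_accuracy(pred_map, ref_map, sym_class):
    """Fraction of reactant atoms mapped to a product atom that is
    symmetry-equivalent to the reference target (hypothetical metric).

    pred_map, ref_map: dict reactant_atom_idx -> product_atom_idx
    sym_class: dict product_atom_idx -> equivalence-class label
    """
    if not ref_map:
        return 1.0
    correct = 0
    for r_atom, ref_p in ref_map.items():
        pred_p = pred_map.get(r_atom)
        if pred_p is not None and sym_class[pred_p] == sym_class[ref_p]:
            correct += 1
    return correct / len(ref_map)


# Toy acetate fragment: product atoms 2 and 3 are equivalent oxygens.
sym_class = {0: "C_methyl", 1: "C_carboxyl", 2: "O_carboxylate", 3: "O_carboxylate"}
ref_map = {0: 0, 1: 1, 2: 2, 3: 3}
pred_map = {0: 0, 1: 1, 2: 3, 3: 2}  # the two oxygens are swapped

print(symmetry_aware_accuracy(pred_map, ref_map, sym_class))  # 1.0
exact = sum(pred_map[a] == ref_map[a] for a in ref_map) / len(ref_map)
print(exact)  # 0.5
```

The gap between the two numbers shows why exact index matching understates accuracy for symmetric molecules.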
Chemical characteristics vectors map the chemical space of natural biomes from untargeted mass spectrometry data
IF 7.1 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-05-26 · DOI: 10.1186/s13321-025-01031-2
Pilleriin Peets, Aristeidis Litos, Kai Dührkop, Daniel R. Garza, Justin J. J. van der Hooft, Sebastian Böcker, Bas E. Dutilh
Abstract: Untargeted metabolomics can comprehensively map the chemical space of a biome but is limited by low annotation rates (< 10%). We used chemical characteristics vectors (CCVs), consisting of molecular fingerprints or chemical compound classes predicted from mass spectrometry data, to characterize compounds and samples. CCVs estimate the fraction of compounds in a sample that have specific chemical properties. Unlike aligned MS1 data with intensity information, CCVs incorporate the chemical properties of compounds, so chemical annotations can be used for sample comparison. Using CCVs, we identified compound classes that differentiate biomes: for example, ethers are enriched in environmental biomes, whereas steroids are enriched in animal host-related biomes. In biomes with greater variability, CCVs revealed key clustering compound classes, such as organonitrogen compounds in the animal distal gut and lipids in animal secretions. CCVs thus enhance the interpretation of untargeted metabolomic data, providing a quantifiable and generalizable understanding of the chemical space of natural biomes.
Open-access PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01031-2
Citations: 0
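Read literally, each CCV entry is the fraction of a sample's compounds that carry a given chemical property. A minimal sketch, assuming binary per-compound predictions (fingerprint bits or compound-class memberships) are already available and that every compound is weighted equally; the paper may weight or normalise differently, and the sample below is invented for illustration.

```python
def chemical_characteristics_vector(compounds, features):
    """Fraction of compounds in a sample that have each feature.

    compounds: list of sets, each holding the features predicted for one compound
    features: ordered list of feature names (fingerprint bits or compound classes)
    Equal weighting of compounds is an assumption of this sketch.
    """
    n = len(compounds)
    if n == 0:
        return [0.0] * len(features)
    return [sum(f in c for c in compounds) / n for f in features]


features = ["ethers", "steroids", "organonitrogen compounds", "lipids"]
sample = [
    {"ethers", "lipids"},
    {"ethers"},
    {"organonitrogen compounds"},
    {"steroids", "lipids"},
]
print(chemical_characteristics_vector(sample, features))
# [0.5, 0.25, 0.25, 0.5]
```

Because every sample yields a vector over the same feature list, samples from different biomes can then be compared or clustered with an ordinary distance measure.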
Moldrug algorithm for an automated ligand binding site exploration by 3D aware molecular enumerations
IF 7.1 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-05-26 · DOI: 10.1186/s13321-025-01022-3
Alejandro Martínez León, Benjamin Ries, Jochen S. Hub, Aniket Magarkar
Abstract: We present Moldrug, a computational tool for accelerating the hit-to-lead phase of structure-based drug design. Moldrug explores chemical space using structural modifications suggested by the CReM library and optimizes an adaptable fitness function with a genetic algorithm. Moldrug is complemented by Moldrug-Dashboard, a cross-platform, user-friendly graphical interface for analysing Moldrug simulations. To illustrate Moldrug, we designed new potential inhibitors of the main protease (MPro) of SARS-CoV-2 by optimizing a consensus fitness function that balances binding affinity, drug-likeness, and synthetic accessibility. The designed molecules exhibited high chemical diversity. A subset of the designed molecules was ranked using MM/GBSA and alchemical binding free energy calculations, revealing predicted affinities as low as −10 kcal mol⁻¹. Moldrug is distributed as a Python package under the Apache 2.0 license. It offers pre-configured multi-parameter fitness functions for molecular design while remaining highly adaptable for integrating functionality from external software. Documentation and tutorials are available at https://moldrug.rtfd.io.
Open-access PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01022-3
Citations: 0
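As an illustration of a consensus fitness that balances binding affinity, drug-likeness and synthetic accessibility, the sketch below maps each score to a desirability in [0, 1] and combines them with a weighted geometric mean, so a molecule with zero desirability on any single objective scores zero overall. This is not Moldrug's API: the function names, the score windows and the weights are hypothetical, and Moldrug's own pre-configured fitness functions are described in its documentation at https://moldrug.rtfd.io.

```python
import math


def desirability_low_is_better(x, best, worst):
    """1 at or below `best`, 0 at or above `worst`, linear in between."""
    if x <= best:
        return 1.0
    if x >= worst:
        return 0.0
    return (worst - x) / (worst - best)


def desirability_high_is_better(x, worst, best):
    """0 at or below `worst`, 1 at or above `best`, linear in between."""
    if x >= best:
        return 1.0
    if x <= worst:
        return 0.0
    return (x - worst) / (best - worst)


def consensus_fitness(dock_kcal, qed, sa_score, weights=(1.0, 1.0, 1.0)):
    """Hypothetical consensus fitness: weighted geometric mean of desirabilities.

    dock_kcal: docking score in kcal/mol, more negative is better (window -12..-6 assumed)
    qed: quantitative estimate of drug-likeness in [0, 1] (window 0.2..0.8 assumed)
    sa_score: synthetic accessibility, 1 (easy) to 10 (hard) (window 2..6 assumed)
    """
    d = [
        desirability_low_is_better(dock_kcal, best=-12.0, worst=-6.0),
        desirability_high_is_better(qed, worst=0.2, best=0.8),
        desirability_low_is_better(sa_score, best=2.0, worst=6.0),
    ]
    if min(d) == 0.0:
        return 0.0
    total_w = sum(weights)
    return math.exp(sum(w * math.log(di) for w, di in zip(weights, d)) / total_w)


print(consensus_fitness(-9.0, 0.65, 3.0))
print(consensus_fitness(-5.0, 0.65, 3.0))  # docking score outside the window -> 0.0
```

A genetic algorithm would call a function like this on every candidate produced by the structural modifications and keep the fittest molecules for the next generation.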
Enhancing molecular property prediction with quantized GNN models
IF 7.1 · Zone 2 (Chemistry)
Journal of Cheminformatics · Pub Date: 2025-05-26 · DOI: 10.1186/s13321-025-00989-3
Areen Rasool, Jamshaid Ul Rahman, Rongin Uwitije
Abstract: Efficient and reliable prediction of molecular properties, such as water solubility, hydration free energy, lipophilicity, and quantum mechanical properties, is essential for rational compound design in the chemical and pharmaceutical industries. While graph neural networks (GNNs) have significantly advanced molecular property prediction, their high memory footprint, computational demands, and inference latency are often overlooked. These challenges hinder the deployment of property prediction models on resource-constrained devices such as smartphones and IoT devices, so optimizing storage, reducing resource consumption, and improving inference speed are crucial. This paper presents a systematic approach that integrates GNN models with the DoReFa-Net quantization algorithm. The proposed method aims to improve computational efficiency while maintaining predictive performance, yielding lightweight yet effective models for molecular tasks. The study investigates the impact of different quantization bitwidths on model performance using metrics such as RMSE and MAE. Results show that, for physical chemistry datasets, the effectiveness of quantization depends strongly on the model architecture. Notably, the quantum mechanical dipole moment task maintains strong performance at 8-bit precision, achieving similar or slightly better results. However, extreme quantization, particularly at 2-bit precision, severely degrades performance, highlighting the limitations of aggressive compression.
Open-access PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00989-3
Citations: 0
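DoReFa-Net quantizes a value r in [0, 1] to k bits as quantize_k(r) = round((2^k - 1) · r) / (2^k - 1). Weights are first squashed with tanh and rescaled into [0, 1] before quantization and then mapped back to [-1, 1], while activations are clipped to [0, 1]. The sketch below shows only these forward-pass transforms on plain Python lists; the straight-through gradient estimator, gradient quantization and the GNN itself are omitted, and it is not the authors' code. Lowering `k` from 8 to 2 reduces the number of representable levels from 256 to 4.

```python
import math


def quantize_k(r, k):
    """Uniformly quantize r in [0, 1] to one of 2**k levels.

    Note: Python's round() rounds halves to even, a minor detail for ties.
    """
    levels = 2 ** k - 1
    return round(r * levels) / levels


def dorefa_weights(weights, k):
    """Map real-valued weights to k-bit values in [-1, 1] (forward pass only)."""
    t = [math.tanh(w) for w in weights]
    m = max(abs(v) for v in t) or 1.0  # guard against all-zero weights
    return [2 * quantize_k(v / (2 * m) + 0.5, k) - 1 for v in t]


def dorefa_activations(acts, k):
    """Clip activations to [0, 1], then quantize them to k bits."""
    return [quantize_k(min(1.0, max(0.0, a)), k) for a in acts]


w2 = dorefa_weights([-0.8, -0.1, 0.05, 0.6], k=2)
a2 = dorefa_activations([-0.2, 0.3, 0.7, 1.4], k=2)
print(w2)  # four levels only: -1, -1/3, 1/3, 1
print(a2)  # 0, 1/3, 2/3, 1
```

At k = 2 every weight collapses onto one of four values, which gives a concrete sense of how much information aggressive compression discards.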