Graph residual based method for molecular property prediction
Authors: Kanad Sen, Saksham Gupta, Abhishek Raj, Alankar Alankar
Journal: Chemometrics and Intelligent Laboratory Systems, volume 265, Article 105471 (impact factor 3.7; JCR Q2, Automation & Control Systems)
Publication date: 2025-07-01
DOI: 10.1016/j.chemolab.2025.105471
URL: https://www.sciencedirect.com/science/article/pii/S016974392500156X
Citation count: 0
Abstract
Machine learning-driven methods for chemical property prediction have attracted deep interest. However, much work remains to be done to improve generalization ability, accuracy, and inference time in critical applications. Traditional machine learning models predict properties from features extracted from the molecules, and these features are often not readily available. In this work, a novel deep learning method, the Edge Conditioned Residual Graph Neural Network (ECRGNN), is applied, which allows properties to be predicted directly from only the graph-based structures of the molecules. The SMILES (Simplified Molecular Input Line Entry System) representation of the molecules is used as the input data format and is then converted into a graph database, which constitutes the training data. This article gives a detailed description of ECRGNN, the novel GRU (Gated Recurrent Unit)-based methodology used to map the inputs. Emphasis is placed on both the regression and the classification performance of the method. A detailed description of the Variational Autoencoder (VAE) and the end-to-end learning method used for multi-class multi-label property prediction is also provided. The results have been compared on standard benchmark datasets and some newly developed datasets. All performance metrics used are clearly defined, together with the reasons for choosing them.
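The SMILES-to-graph conversion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which toolkit the authors use, so the parser below is an assumption made purely for illustration: the function name `smiles_to_graph` is hypothetical, and it handles only a restricted SMILES subset (no aromatic atoms, brackets, charges, or stereochemistry). A production pipeline would more likely rely on a cheminformatics toolkit.

```python
from typing import Dict, List, Optional, Tuple

BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TWO_LETTER = ("Cl", "Br")


def smiles_to_graph(smiles: str) -> Tuple[List[str], List[Tuple[int, int, int]]]:
    """Parse a restricted SMILES subset into (atoms, edges).

    Hypothetical helper, not the authors' pipeline.
    Supported: atoms B, C, N, O, P, S, F, I, Cl, Br; bonds '-', '=', '#';
    branches '(...)'; single-digit ring closures.
    Each edge is (atom_i, atom_j, bond_order); the bond order is the kind of
    edge feature an edge-conditioned network could consume.
    """
    atoms: List[str] = []
    edges: List[Tuple[int, int, int]] = []
    branch_stack: List[Optional[int]] = []       # atoms to return to after ')'
    ring_open: Dict[str, Tuple[int, int]] = {}   # digit -> (atom index, bond order)
    prev: Optional[int] = None                   # atom the next atom bonds to
    pending = 1                                  # bond order of the next bond
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if smiles[i:i + 2] in TWO_LETTER:
            symbol, step = smiles[i:i + 2], 2
        elif ch in "BCNOPSFI":
            symbol, step = ch, 1
        else:
            symbol, step = None, 1

        if symbol is not None:
            atoms.append(symbol)
            idx = len(atoms) - 1
            if prev is not None:
                edges.append((prev, idx, pending))
            prev, pending = idx, 1
        elif ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch == "(":
            branch_stack.append(prev)            # remember the branch point
        elif ch == ")":
            prev = branch_stack.pop()            # resume from the branch point
        elif ch.isdigit():
            if ch in ring_open:                  # second occurrence closes the ring
                start, order = ring_open.pop(ch)
                edges.append((start, prev, max(order, pending)))
            else:                                # first occurrence opens the ring
                ring_open[ch] = (prev, pending)
            pending = 1
        else:
            raise ValueError(f"unsupported SMILES token: {ch!r}")
        i += step
    return atoms, edges
```

For acetic acid, `smiles_to_graph("CC(=O)O")` yields the atoms `["C", "C", "O", "O"]` and the edges `[(0, 1, 1), (1, 2, 2), (1, 3, 1)]`, that is, three bonds of which one is a double bond. Keeping the bond order on each edge is a deliberate choice here, since an edge-conditioned model needs per-edge features in addition to per-node features.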
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration etc.) Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.