Revealing the Impact of Aggregations in the Graph‐Based Molecular Machine Learning: Electrostatic Interaction Versus Pooling Methods
Authors: Sanghoon Lee, Hyun Woo Kim
Journal: Advanced Theory and Simulations (IF 2.9, JCR Q1, Multidisciplinary Sciences)
Publication date: 2025-08-13
DOI: 10.1002/adts.202500133 (https://doi.org/10.1002/adts.202500133)
Citation count: 0
Abstract
Molecular structures are readily represented as graphs of constituent atoms (nodes) and chemical bonds (edges), which makes them natural input for well‐known machine learning (ML) models that process graph data, such as graph neural networks (GNNs). GNNs have shown reasonable performance in predicting the properties of chemical systems. In typical applications of GNNs to chemistry‐related fields, the main objective is to create an optimal molecular representation by aggregating atomic features and pooling them over the graph. This study investigates two approaches that could generate better molecular representations. In the first, intermolecular edges are created to predict the photochemical properties of chromophore molecules in solution. These edges are constructed from atomic partial charges, motivated by the fact that electrostatic interaction is the main component of the solute‐solvent interaction. In the second, the effect of the aggregation and pooling functions is investigated. The results show that intermolecular electrostatic edges based on ground‐state charges prevent the GNN model from generating more effective molecular representations. In contrast, the model performs better when averaging and summation are combined in a hybrid manner for the aggregation and pooling functions.
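To make the distinction between aggregation (combining neighbor features at each atom) and pooling (combining all atom features into one molecular vector) concrete, the following is a minimal pure-Python sketch. It is not the authors' model: the update rule (own feature plus aggregated neighbor features), the toy three-atom chain, and all function names are hypothetical, and the abstract does not state which operation is used at which stage, so the "hybrid" pairing shown here (mean aggregation with sum pooling) is an assumption chosen for illustration.

```python
from typing import Callable, Dict, List

Vector = List[float]
Reducer = Callable[[List[Vector]], Vector]


def vec_sum(vectors: List[Vector]) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(column) for column in zip(*vectors)]


def vec_mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [total / len(vectors) for total in vec_sum(vectors)]


def message_pass(features: List[Vector],
                 neighbors: Dict[int, List[int]],
                 aggregate: Reducer) -> List[Vector]:
    """One round: add the aggregated neighbor features to each atom's own.

    This additive update is an illustrative assumption, not the paper's layer.
    """
    updated = []
    for atom, own in enumerate(features):
        nbrs = neighbors.get(atom, [])
        if not nbrs:
            # Isolated atom: nothing to aggregate, keep its features.
            updated.append(list(own))
            continue
        message = aggregate([features[n] for n in nbrs])
        updated.append([a + b for a, b in zip(own, message)])
    return updated


def molecular_representation(features: List[Vector],
                             neighbors: Dict[int, List[int]],
                             aggregate: Reducer,
                             pool: Reducer,
                             rounds: int = 1) -> Vector:
    """Run message passing, then pool atom features into one graph vector."""
    for _ in range(rounds):
        features = message_pass(features, neighbors, aggregate)
    return pool(features)


# Toy 3-atom chain 0-1-2 with one scalar feature per atom (made-up values).
features = [[1.0], [2.0], [3.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

# Assumed hybrid: mean aggregation over neighbors, sum pooling over atoms.
hybrid = molecular_representation(features, neighbors, vec_mean, vec_sum)
# Baseline for comparison: sum used at both stages.
all_sum = molecular_representation(features, neighbors, vec_sum, vec_sum)

print(hybrid)   # [12.0]
print(all_sum)  # [14.0]
```

Passing the two reducers as separate arguments keeps the aggregation and pooling choices independent, so every combination of averaging and summation can be compared on the same graph.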
About the journal:
Advanced Theory and Simulations is an interdisciplinary, international, English-language journal that publishes high-quality scientific results focusing on the development and application of theoretical methods, modeling and simulation approaches in all natural science and medicine areas, including:
materials, chemistry, condensed matter physics
engineering, energy
life science, biology, medicine
atmospheric/environmental science, climate science
planetary science, astronomy, cosmology
method development, numerical methods, statistics