Accelerated prediction of molecular properties for per- and polyfluoroalkyl substances using graph neural networks with adjacency-free message passing
Hector Medina, Rachel Drake, Carson Farmer
Environmental Pollution, Vol. 382, Article 126705 (published 2025-06-30)
DOI: 10.1016/j.envpol.2025.126705
https://www.sciencedirect.com/science/article/pii/S0269749125010784
Citations: 0
Abstract
The chemical space of molecular contaminants is vast. This calls for methods and tools that accelerate the computation of molecular properties, support the study of molecular interactions, and ultimately aid the engineering of technological solutions for environmental remediation and exposome reduction. Graph neural networks (GNNs) are a promising approach because molecules map naturally onto graphs and GNNs can learn complex relationships directly from graph structure. However, training GNN-based models can be computationally expensive, especially on large molecular datasets. In this work, we evaluated the predictive performance of a novel Graph-Enhanced multilayer perceptron (GE-MLP) on molecular properties of per- and polyfluoroalkyl substances (PFAS) and compared it against two established GNN architectures: Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). The GE-MLP architecture, which incorporates structural information into a dense neural network, was trained and validated on a dataset of 15,000 PFAS generated with tight-binding methods and calibrated against experimental results. The target properties were electron affinity (EA), ionization potential (IP), and the HOMO–LUMO gap (HL). Unlike conventional graph-based architectures, GE-MLP processes molecular fingerprints and node-level descriptors in a purely feedforward manner, using them to embed structural information in place of adjacency-based message passing. Compared with traditional machine learning (ML) models, our findings reinforce the usefulness of graph-based architectures for predicting molecular properties of complex contaminants such as PFAS.
Furthermore, GE-MLP emerged as a strong contender among the graph-based models, achieving the highest predictive performance for IP. This suggests that integrating structural information into dense neural networks through atom-level and fingerprint-based molecular descriptors is a viable alternative to adjacency-based message passing. Finally, GE-MLP provides a computationally efficient alternative to other GNN-based methods, owing to savings in model training, and offers a scalable, message-passing-free approach to molecular property prediction that retains structural awareness. Future work includes expanding the dataset to 3.5 million fluorinated compounds to improve generalization, along with architectural improvements such as transfer learning, topological embeddings, and hybrid models to further improve predictive accuracy and applicability.
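The abstract contrasts GE-MLP's feedforward use of fingerprints and atom-level descriptors with the adjacency-based message passing of GCN and GAT. The sketch below illustrates that contrast under stated assumptions: the fingerprint is a bit vector, atom descriptors are mean-pooled into one fixed-length vector, and the network is a plain ReLU MLP with three regression outputs (EA, IP, HL). The paper's actual featurization, pooling, layer sizes, and training setup are not described in the abstract, so every name and value here is hypothetical, and the random weights give meaningless outputs until trained. The `gcn_layer` function follows the standard GCN propagation rule and is included only to show where an adjacency matrix enters a message-passing model.

```python
import math
import random

# Hypothetical sketch: function and class names are illustrative, not the paper's code.


def mean_pool(node_features):
    """Average per-atom descriptor vectors into one fixed-length vector (pooling choice is assumed)."""
    n = len(node_features)
    return [sum(row[j] for row in node_features) / n for j in range(len(node_features[0]))]


def build_input(fingerprint, node_features):
    """Concatenate fingerprint bits with pooled atom descriptors; no adjacency matrix is used."""
    return [float(bit) for bit in fingerprint] + mean_pool(node_features)


def relu(v):
    return [max(0.0, a) for a in v]


def dense(x, weights, bias):
    """Fully connected layer: y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class GraphEnhancedMLPSketch:
    """Plain feedforward stack mapping the combined vector to 3 outputs (EA, IP, HL)."""

    def __init__(self, n_in, hidden, n_out=3, seed=0):
        rng = random.Random(seed)
        sizes = [n_in] + list(hidden) + [n_out]
        self.layers = []
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
            scale = 1.0 / math.sqrt(fan_in)
            weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)] for _ in range(fan_out)]
            self.layers.append((weights, [0.0] * fan_out))

    def forward(self, x):
        h = list(x)
        for i, (weights, bias) in enumerate(self.layers):
            h = dense(h, weights, bias)
            if i < len(self.layers) - 1:  # no activation on the regression head
                h = relu(h)
        return h


def gcn_layer(adj, h, weights):
    """For contrast only: one GCN step relu(D^-1/2 (A + I) D^-1/2 H W), which needs the adjacency."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features, then apply the shared weight matrix (d_in x d_out).
    agg = [[sum(norm[i][k] * h[k][c] for k in range(n)) for c in range(len(h[0]))] for i in range(n)]
    return [relu([sum(row[c] * weights[c][o] for c in range(len(row))) for o in range(len(weights[0]))])
            for row in agg]


# Usage with toy values (not data from the paper).
fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]
atom_features = [[9.0, 1.0], [6.0, 4.0], [9.0, 1.0]]
x = build_input(fingerprint, atom_features)
model = GraphEnhancedMLPSketch(n_in=len(x), hidden=[16, 16])
ea, ip, hl = model.forward(x)

adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
node_out = gcn_layer(adjacency, atom_features, [[1.0, 0.0], [0.0, 1.0]])
```

Because `build_input` never receives an adjacency matrix, each molecule becomes a single fixed-length vector and a batch is simply a stack of such vectors, whereas the GCN step multiplies by a normalized adjacency at every layer. Reading this difference as the source of the training savings the abstract reports is an interpretation, not something the abstract states.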
About the journal:
Environmental Pollution is an international peer-reviewed journal that publishes high-quality research papers and review articles covering all aspects of environmental pollution and its impacts on ecosystems and human health.
Subject areas include, but are not limited to:
• Sources and occurrences of pollutants that are clearly defined and measured in environmental compartments, food and food-related items, and human bodies;
• Interlinks between contaminant exposure and biological, ecological, and human health effects, including those of climate change;
• Contaminants of emerging concern (including but not limited to antibiotic-resistant microorganisms or genes, microplastics/nanoplastics, electronic waste, light, and noise) and/or their biological, ecological, or human health effects;
• Laboratory and field studies on the remediation/mitigation of environmental pollution via new techniques and with clear links to biological, ecological, or human health effects;
• Modeling of pollution processes, patterns, or trends that is of clear environmental and/or human health interest;
• New techniques that measure and examine environmental occurrences, transport, behavior, and effects of pollutants within the environment or the laboratory, provided that they can be clearly used to address problems within regional or global environmental compartments.