Hybrid intelligence for environmental pollution: biodegradability assessment of organic compounds through multimodal integration of graph attention networks and QSAR models.
Abbas Salimi, Jin Yong Lee
Environmental Science: Processes & Impacts, published 2025-03-07
DOI: 10.1039/d4em00594e (https://doi.org/10.1039/d4em00594e)
Citations: 0
Abstract
Computational methods are crucial for assessing chemical biodegradability because chemicals can significantly affect both environmental and human health. Organic compounds that are not biodegradable can persist in the environment and contribute to pollution. Our novel approach uses graph attention networks (GATs) that incorporate node and edge attributes to predict biodegradability. Quantitative structure-activity relationship (QSAR) models based on two-dimensional descriptors were also built, and weighted-average and stacking approaches were used to generate ensemble models. On the validation set, the GAT models showed stable performance and generally higher specificity than a graph convolutional network (GCN), although overlapping standard deviations make definitive superiority difficult to establish. Their sensitivity, however, tended to be lower, and because the intervals intersect, the two model types may perform comparably on this metric. Ensemble learning improved several performance metrics compared with the individual and base models, and the combination of extreme gradient boosting (XGBoost) and GAT achieved the highest precision and specificity. Combining GAT with random forest and gradient boosting may be preferable for accurately predicting biodegradable molecules, whereas the stacking approach may be better suited when correctly classifying nonbiodegradable substances is the priority. Important descriptors, such as SpMax1_Bh(m) and SAscore, were identified in at least two QSAR models. Although the approach involves inherent complexities, its ease of implementation depends on factors such as data availability and domain knowledge.
Assessing the biodegradability of organic compounds is essential for reducing their environmental impact, assessing risks, ensuring regulatory compliance, promoting sustainable development, and supporting effective pollution remediation. It supports informed decisions about chemical use, waste management, and environmental protection.
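To make the weighted-average ensembling and the metrics compared above more concrete, here is a minimal sketch. It assumes each base model (for example a GAT and an XGBoost QSAR model) already outputs a probability that a molecule is biodegradable, and it treats "biodegradable" as the positive class. The function names, the weights, the 0.5 threshold, and the toy probabilities are hypothetical illustrations, not the paper's code or data. A stacking ensemble would instead train a meta-model on these base-model outputs.

```python
from __future__ import annotations

from typing import Sequence


def weighted_average(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Weighted mean of per-model probabilities for the positive class.

    Hypothetical helper: probas[m][i] is model m's probability that molecule i
    is biodegradable (assumed positive class); weights[m] is model m's weight.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_molecules = len(probas[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_molecules)
    ]


def classify(probs: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Label 1 (biodegradable) when the ensemble probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a metric's denominator is zero.
    return num / den if den else float("nan")


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity and precision with 1 = biodegradable, 0 = nonbiodegradable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of biodegradable molecules recovered
        "specificity": _ratio(tn, tn + fp),  # share of nonbiodegradable molecules recovered
        "precision": _ratio(tp, tp + fp),    # share of "biodegradable" calls that are correct
    }


if __name__ == "__main__":
    # Toy probabilities for four molecules from two hypothetical base models.
    gat_probs = [0.9, 0.4, 0.2, 0.7]
    xgb_probs = [0.8, 0.6, 0.1, 0.3]
    ensemble = weighted_average([gat_probs, xgb_probs], weights=[0.6, 0.4])
    preds = classify(ensemble)
    print(preds, confusion_metrics([1, 0, 0, 0], preds))
```

With these toy inputs the ensemble probabilities are about 0.86, 0.48, 0.16 and 0.54, giving predictions `[1, 0, 0, 1]`. Against the hypothetical labels `[1, 0, 0, 0]`, that yields a sensitivity of 1.0, a specificity of 2/3 and a precision of 0.5. In practice the weights would more plausibly be tuned on validation data than fixed by hand as they are here.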
About the journal:
Environmental Science: Processes & Impacts publishes high-quality papers in all areas of the environmental chemical sciences, including chemistry of the air, water, soil and sediment. We welcome studies on the environmental fate and effects of anthropogenic and naturally occurring contaminants, both chemical and microbiological, as well as related natural element cycling processes.