Predicting sorption of organic pollutants on soils with interpretable machine learning
Qian Wang, Jianmin Bian, Enze Ma, Jiangwei Zhang
Environmental Pollution, vol. 382, Article 126665 (published 2025-06-13)
DOI: 10.1016/j.envpol.2025.126665
https://www.sciencedirect.com/science/article/pii/S0269749125010383
Citations: 0
Abstract
The sorption of organic pollutants (OPs) on soils plays a critical role in determining the environmental fate and transport of these compounds and has been studied extensively. However, the complex nonlinear relationships between sorption capacity and its multiple influencing factors, and the relative contributions of these factors to sorption behavior, remain poorly understood. This study develops five machine learning (ML) models to predict OP sorption on soils from multiple factors, using a dataset of 352 data points compiled from previous studies: support vector machine (SVM), deep neural network (DNN), extreme gradient boosting (XGBT), random forest (RF), and gradient boosting decision tree (GBDT). SHapley Additive exPlanations (SHAP) is then applied to the best-performing model for interpretability analysis, and the resulting interpretable ML model is used to map the sorption capacities of 12 OPs across mainland China. The XGBT model performs best, achieving a coefficient of determination (R²) of 0.952 and a root mean square error (RMSE) of 0.103 on the testing dataset. Interpretability analysis shows that the electronic effects (E) of OPs and the soil organic matter (SOM) content are the most influential factors, indicating that π-π interactions and hydrophobic partitioning dominate the sorption mechanisms. The map shows that high sorption capacities occur predominantly in southern and southwestern China, corresponding to reduced environmental risks in those regions. This study presents a novel interpretable ML framework for predicting the sorption potential of OPs and offers insights into the mechanisms governing OP sorption on soils. The framework can also support environmental management applications, including risk assessment, planning of land remediation strategies, and soil protection policy.
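The abstract reports two evaluation metrics (R² and RMSE) and uses SHAP to attribute the best model's predictions to individual input features. The sketch below illustrates both ideas with the Python standard library only, under stated assumptions. The R² and RMSE functions use the standard textbook definitions; the abstract does not state the exact formulas or data scaling behind the reported 0.952 and 0.103. The `shapley_values` routine is the brute-force Shapley definition: it enumerates every feature coalition and fills absent features from a single baseline vector. It is not the TreeSHAP algorithm that the `shap` library applies to tree ensembles such as XGBoost, and the abstract does not describe the study's SHAP configuration. `toy_model`, its coefficients, and its third feature `x3` are hypothetical; the first two inputs are labeled E and SOM only for readability.

```python
from itertools import combinations
from math import factorial, sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return sqrt(mse)


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single sample x.

    Features outside a coalition are replaced by the matching entry of
    `baseline` (a single reference point, not a background dataset).
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(f):
    """Hypothetical stand-in for a trained model; f = [E, SOM, x3].

    Coefficients are invented for illustration and are not from the study.
    """
    return 0.5 * f[0] + 0.3 * f[1] + 0.2 * f[0] * f[1] - 0.1 * f[2]


if __name__ == "__main__":
    sample = [1.0, 2.0, 0.5]
    reference = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, sample, reference)
    print("Shapley values:", phi)  # approximately [0.7, 0.8, -0.05]
    # Efficiency: attributions sum to f(sample) - f(reference)
    print("sum:", sum(phi), "gap:", toy_model(sample) - toy_model(reference))
    print("R2:", r2_score([1, 2, 3], [1, 2, 4]))  # 0.5
    print("RMSE:", rmse([0, 0], [3, 4]))
```

In this toy case each linear term is credited entirely to its own feature, while the E×SOM interaction term (0.4) is split equally between E and SOM. The attributions therefore sum exactly to the gap between the sample's prediction and the baseline's. Because enumeration is exponential in the number of features, model-specific algorithms such as TreeSHAP are the practical choice for real feature sets.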
About the journal:
Environmental Pollution is an international peer-reviewed journal that publishes high-quality research papers and review articles covering all aspects of environmental pollution and its impacts on ecosystems and human health.
Subject areas include, but are not limited to:
• Sources and occurrences of pollutants that are clearly defined and measured in environmental compartments, food and food-related items, and human bodies;
• Interlinks between contaminant exposure and biological, ecological, and human health effects, including those of climate change;
• Contaminants of emerging concern (including but not limited to antibiotic-resistant microorganisms or genes, microplastics/nanoplastics, electronic waste, light, and noise) and/or their biological, ecological, or human health effects;
• Laboratory and field studies on the remediation/mitigation of environmental pollution via new techniques and with clear links to biological, ecological, or human health effects;
• Modeling of pollution processes, patterns, or trends that is of clear environmental and/or human health interest;
• New techniques that measure and examine environmental occurrences, transport, behavior, and effects of pollutants within the environment or the laboratory, provided that they can be clearly used to address problems within regional or global environmental compartments.