Machine learning-supported determination for site-specific natural background values of soil heavy metals
Jian Wu, Chengmin Huang
Journal of Hazardous Materials, published 2025-01-18. DOI: 10.1016/j.jhazmat.2025.137276
Natural background values of heavy metals are essential for distinguishing anthropogenic from natural sources, for assessing human impacts in polluted areas, and thus for formulating accurate environmental policies. However, because of human disturbance, methodological limitations, and regional constraints, determining heavy metal background values is often difficult, particularly at the site or profile scale, and new methods are urgently needed. To build a comprehensive dataset of heavy metal concentrations and soil properties, this study systematically collected and screened 82 soil profiles from areas minimally affected by human activities, yielding a total of 2,185 data records. Using soil depth, pH, organic matter, two weathering indices (SAF and BA), Fe₂O₃, MgO, Na₂O, CaO, and K₂O as input variables, the study compared four machine learning models, namely random forest (RF), extreme gradient boosting (XGBoost), artificial neural network (ANN), and support vector regression (SVR), in predicting site-specific background levels of Cd, Cr, Cu, Ni, Pb, and Zn. XGBoost was the optimal model for Cd, Cr, and Ni (MAE = 0.14–0.17; MSE = 0.04–0.06; R² = 0.82–0.87), whereas RF performed best for Cu, Pb, and Zn (MAE = 0.01–0.18; MSE = 0.02–0.06; R² = 0.89–0.95). Feature importance assessments using RF and SHAP showed that pH is a key controlling factor for Cd and Ni, Fe₂O₃ strongly influences the background levels of Cr, Cu, and Zn, and K₂O is the main controlling factor for Pb. The resulting models can effectively predict the background levels of all six heavy metals from major element composition and soil physicochemical properties; for Cu and Zn, accurate predictions were achieved with only two input variables. Because it relies only on these inputs, the prediction framework enables precise, cost-effective point-to-point environmental assessments and therefore has significant potential for practical application.
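The abstract names the weathering indices SAF and BA but does not define them. A minimal sketch, assuming the molar-ratio definitions commonly used in soil weathering studies (SAF = SiO₂/(Al₂O₃ + Fe₂O₃) and BA = (K₂O + Na₂O + CaO + MgO)/Al₂O₃), shows how such indices could be derived from major-oxide contents in wt%. Note that these assumed definitions also require SiO₂ and Al₂O₃, which are not listed among the model inputs. The function names, the molar-mass table, and the example composition are illustrative assumptions, not the authors' code or data.

```python
# Sketch only: molar weathering ratios from major-oxide contents (wt%).
# ASSUMED definitions (the abstract does not define SAF or BA):
#   SAF = SiO2 / (Al2O3 + Fe2O3)            (molar ratio)
#   BA  = (K2O + Na2O + CaO + MgO) / Al2O3  (molar ratio)

# Approximate molar masses of the oxides, in g/mol.
MOLAR_MASS = {
    "SiO2": 60.08,
    "Al2O3": 101.96,
    "Fe2O3": 159.69,
    "K2O": 94.20,
    "Na2O": 61.98,
    "CaO": 56.08,
    "MgO": 40.30,
}


def to_moles(oxides_wt_pct):
    """Convert oxide contents (wt%) to moles per 100 g of soil.

    Raises KeyError if any oxide listed in MOLAR_MASS is missing.
    """
    return {name: oxides_wt_pct[name] / mass for name, mass in MOLAR_MASS.items()}


def saf(oxides_wt_pct):
    """Silica to sesquioxide molar ratio (assumed definition of SAF)."""
    m = to_moles(oxides_wt_pct)
    return m["SiO2"] / (m["Al2O3"] + m["Fe2O3"])


def ba(oxides_wt_pct):
    """Base cations to alumina molar ratio (assumed definition of BA)."""
    m = to_moles(oxides_wt_pct)
    return (m["K2O"] + m["Na2O"] + m["CaO"] + m["MgO"]) / m["Al2O3"]


if __name__ == "__main__":
    # Hypothetical sample composition in wt%, for illustration only.
    sample = {"SiO2": 65.0, "Al2O3": 15.0, "Fe2O3": 5.0, "K2O": 2.5,
              "Na2O": 1.5, "CaO": 1.0, "MgO": 1.2}
    print(f"SAF = {saf(sample):.2f}, BA = {ba(sample):.2f}")
```

Under these assumed definitions, lower values indicate stronger loss of silica (SAF) or base cations (BA) relative to the less mobile sesquioxides, i.e., more intense weathering. Because both are ratios, they are unchanged if all oxide contents are scaled by the same factor.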
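The abstract reports MAE, MSE, and R² for the best model per metal without giving formulas or stating the scale (raw or transformed concentrations) on which they were computed. The sketch below uses the standard definitions of these metrics. `best_model` is a hypothetical helper that picks the model with the highest R²; this is only one possible selection rule and not necessarily the one the authors used.

```python
# Sketch only: standard regression metrics (MAE, MSE, R²) and a
# hypothetical rule for choosing the best model for one metal.


def _check(y_true, y_pred):
    """Require two non-empty sequences of equal length."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Raises ZeroDivisionError if all observed values are identical.
    """
    _check(y_true, y_pred)
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def best_model(predictions, y_true):
    """HYPOTHETICAL selection rule: return the model name with the highest R².

    `predictions` maps a model name (e.g. "RF", "XGBoost") to its predictions.
    """
    return max(predictions, key=lambda name: r2(y_true, predictions[name]))
```

For example, with observed values [1, 2, 3, 4] and predictions [1.1, 1.9, 3.2, 3.8], MAE = 0.15, MSE = 0.025, and R² = 0.98, while a model that always predicts the mean (2.5) scores R² = 0.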
About the journal:
The Journal of Hazardous Materials serves as a global platform for promoting cutting-edge research in the field of Environmental Science and Engineering. Our publication features a wide range of articles, including full-length research papers, review articles, and perspectives, with the aim of enhancing our understanding of the dangers and risks associated with various materials concerning public health and the environment. It is important to note that the term "environmental contaminants" refers specifically to substances that pose hazardous effects through contamination, while excluding those that do not have such impacts on the environment or human health. Moreover, we emphasize the distinction between wastes and hazardous materials in order to provide further clarity on the scope of the journal. We have a keen interest in exploring specific compounds and microbial agents that have adverse effects on the environment.