{"title":"Applications of geographically weighted machine learning models for predicting soil heavy metal concentrations across mining sites.","authors":"Hyemin Jeong, Younghun Lee, Byeongwon Lee, Euisoo Jung, Jai-Young Lee, Sangchul Lee","doi":"10.1016/j.scitotenv.2024.177667","DOIUrl":null,"url":null,"abstract":"<p><p>The accurate prediction of soil heavy metal contamination is crucial for the effective environmental management of abandoned mining areas. However, conventional machine learning models (CMLMs) often fail to account for the spatial heterogeneity of soil contamination, which limits their predictive accuracy. This study evaluated the performance of geographically weighted machine learning models (GWMLMs) in predicting soil Cd and Pb concentrations in abandoned mines in the Republic of Korea. We compared two GWMLMs (Geographically Weighted Random Forest and Geographically Weighted Extreme Gradient Boosting) with four CMLMs (Random Forest, Gradient Boosting, Light Gradient Boosting, and extreme Gradient Boosting). The data used in this study included soil samples from six abandoned mining sites with various geographical and soil input variables. The results showed that the GWMLMs consistently outperformed the CMLMs in predicting heavy metal contamination. For Cd predictions, GWMLMs exhibited on average 0.02 lower root mean square error and mean absolute error values, with a 0.26 increase in R<sup>2</sup> values compared to CMLMs. Similarly, for Pb predictions, the GWMLMs showed 0.18 and 0.13 lower root mean square error and mean absolute error values, respectively, and a 0.17 increase in R<sup>2</sup> relative to the CMLMs. The findings demonstrate the usefulness of GWMLMs for predicting the spatial distribution of soil heavy metals. SHapley Additive exPlanations analysis exhibited elevation and distance from abandoned mining sites as the most influential factors in predicting both Cd and Pb concentrations. 
This study highlights the value of GWMLMs that incorporate spatial heterogeneity into CMLMs for enhancing prediction accuracy and providing crucial insights for environmental management in mining-impacted regions.</p>","PeriodicalId":422,"journal":{"name":"Science of the Total Environment","volume":" ","pages":"177667"},"PeriodicalIF":8.2000,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science of the Total Environment","FirstCategoryId":"93","ListUrlMain":"https://doi.org/10.1016/j.scitotenv.2024.177667","RegionNum":1,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/27 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"ENVIRONMENTAL SCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
Accurate prediction of soil heavy metal contamination is crucial for the effective environmental management of abandoned mining areas. However, conventional machine learning models (CMLMs) often fail to account for the spatial heterogeneity of soil contamination, which limits their predictive accuracy. This study evaluated the performance of geographically weighted machine learning models (GWMLMs) in predicting soil Cd and Pb concentrations at abandoned mines in the Republic of Korea. We compared two GWMLMs (Geographically Weighted Random Forest and Geographically Weighted Extreme Gradient Boosting) with four CMLMs (Random Forest, Gradient Boosting, Light Gradient Boosting, and Extreme Gradient Boosting). The data comprised soil samples from six abandoned mining sites, together with various geographical and soil input variables. The GWMLMs consistently outperformed the CMLMs in predicting heavy metal contamination. For Cd predictions, the GWMLMs yielded root mean square error (RMSE) and mean absolute error (MAE) values that were on average 0.02 lower, and R² values that were 0.26 higher, than those of the CMLMs. Similarly, for Pb predictions, the GWMLMs yielded RMSE and MAE values that were 0.18 and 0.13 lower, respectively, and R² values that were 0.17 higher than those of the CMLMs. These findings demonstrate the usefulness of GWMLMs for predicting the spatial distribution of soil heavy metals. SHapley Additive exPlanations (SHAP) analysis identified elevation and distance from abandoned mining sites as the most influential factors in predicting both Cd and Pb concentrations. This study highlights the value of GWMLMs, which incorporate spatial heterogeneity into CMLMs, for enhancing prediction accuracy and providing crucial insights for environmental management in mining-impacted regions.
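The abstract does not describe how the geographic weighting was implemented, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a bisquare distance kernel and, as a deliberately simple stand-in for a local random forest or gradient boosting model, a kernel-weighted mean of nearby observations. The function names, the bandwidth, and all sample values are hypothetical. It also shows how the three reported metrics (root mean square error, mean absolute error, and R²) are computed.

```python
import math


def bisquare_weights(target, coords, bandwidth):
    """Bisquare kernel (an assumption): weight falls with distance, 0 at or beyond the bandwidth."""
    weights = []
    for x, y in coords:
        d = math.hypot(x - target[0], y - target[1])
        weights.append((1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0)
    return weights


def local_weighted_mean(target, coords, values, bandwidth):
    """Stand-in local learner: kernel-weighted mean of neighbouring observations.

    A geographically weighted tree model would fit a local model here instead.
    """
    w = bisquare_weights(target, coords, bandwidth)
    total = sum(w)
    if total == 0:
        # No neighbours inside the bandwidth: fall back to the global mean.
        return sum(values) / len(values)
    return sum(wi * vi for wi, vi in zip(w, values)) / total


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical sample locations (x, y) and concentrations; not data from the study.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0, 30.0]
prediction = local_weighted_mean((0.0, 0.0), coords, values, bandwidth=2.0)
print(prediction)
```

Nearby samples dominate the local estimate while samples at or beyond the bandwidth contribute nothing, which is how a geographically weighted model can adapt to spatial heterogeneity that a single global model averages away.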
About the journal:
The Science of the Total Environment is an international journal dedicated to scientific research on the environment and its interaction with humanity. It covers a wide range of disciplines and seeks to publish innovative, hypothesis-driven, and impactful research that explores the entire environment, including the atmosphere, lithosphere, hydrosphere, biosphere, and anthroposphere.
The journal's updated Aims & Scope emphasizes the importance of interdisciplinary environmental research with broad impact. Priority is given to studies that advance fundamental understanding and explore the interconnectedness of multiple environmental spheres. Field studies are preferred, while laboratory experiments must demonstrate significant methodological advancements or mechanistic insights with direct relevance to the environment.