Bin Yang, Anqi He, Zhong Ren, Kai Yu, Gang Zhao, Yanchun Fan, Qi Wang, Shenglian Luo
Journal of Hazardous Materials, Volume 495, Article 138926. Published 2025-06-13. DOI: 10.1016/j.jhazmat.2025.138926
Impact factor: 11.3 (JCR Q1, Engineering, Environmental)
https://www.sciencedirect.com/science/article/pii/S0304389425018424
A transfer learning–enhanced deep learning framework for efficient and interpretable soil heavy metal pollution prediction under data scarcity and spatial heterogeneity
Large-scale soil heavy metal pollution risk estimation remains challenging due to data scarcity and spatial heterogeneity. Although traditional machine learning (ML) methods offer notable predictive capabilities, they often struggle with high-dimensional, heterogeneous data, limited labeled samples, and insufficient interpretability. In this study, we propose a transfer learning (TL)-based deep learning (DL) framework that integrates convolutional neural networks (CNN), termed TL-CNN, with remote sensing-based (RSs), web-based (WBs), and field-sampled datasets (including spatial regionalization features, SRs) to efficiently predict soil heavy metal pollution. By coupling hierarchical feature extraction with a GradSHAP interpretability module, the approach provides both predictive accuracy and explanatory insights. Results from Shaoguan City (2018–2022) demonstrate that the TL-CNN model substantially outperforms conventional ML methods, with overall accuracy exceeding 84 %, particularly under multi-metal pollution scenarios. Leveraging TL, the model adaptively addresses data scarcity, reducing the need for costly field sampling and mitigating interpolation errors. The incorporation of RSs- and WBs-derived features captures critical environmental variability and anthropogenic emissions, while SRs refine local pollution patterns. GradSHAP analyses highlight the pivotal role of RSs features and spatial metrics in large-scale predictions. Overall, the proposed TL-CNN model underscores the potential of multi-source heterogeneous datasets and TL-based DL strategies to promote sustainable soil management.
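The abstract does not describe how transfer learning is wired into the model. One common pattern, assumed here purely for illustration, is to keep a feature extractor trained on a data-rich source task frozen and fit only a small classification head on the few labelled target samples. The sketch below shows that pattern in plain Python. It is not a CNN and not the authors' TL-CNN; the frozen weights, the sample values, and the names `extract_features`, `train_head`, and `predict` are all hypothetical.

```python
import math

# Hypothetical frozen layer: stands in for weights learned on a data-rich
# source task. It is never updated below.
FROZEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]


def extract_features(x, weights=FROZEN_WEIGHTS):
    """Apply the frozen layer: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def train_head(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head with plain SGD on the frozen features."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the log loss with respect to z
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b


def predict(x, w, b):
    """Return 1 (polluted) or 0 (clean) for one raw input vector."""
    f = extract_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


# Made-up target-region samples: only four labelled points.
target_x = [[2.0, 0.0], [0.0, 2.0], [3.0, 0.5], [0.5, 3.0]]
target_y = [1, 0, 1, 0]

head_w, head_b = train_head([extract_features(x) for x in target_x], target_y)
predictions = [predict(x, head_w, head_b) for x in target_x]
```

Only `head_w` and `head_b` are learned from the target data, so the number of parameters that must be estimated from scarce field samples stays small. A real pipeline would replace the frozen layer with pretrained convolutional layers and evaluate on held-out samples rather than on the training points.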
About the journal:
The Journal of Hazardous Materials serves as a global platform for promoting cutting-edge research in the field of Environmental Science and Engineering. Our publication features a wide range of articles, including full-length research papers, review articles, and perspectives, with the aim of enhancing our understanding of the dangers and risks associated with various materials concerning public health and the environment. It is important to note that the term "environmental contaminants" refers specifically to substances that pose hazardous effects through contamination, while excluding those that do not have such impacts on the environment or human health. Moreover, we emphasize the distinction between wastes and hazardous materials in order to provide further clarity on the scope of the journal. We have a keen interest in exploring specific compounds and microbial agents that have adverse effects on the environment.