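A transfer learning–enhanced deep learning framework for efficient and interpretable soil heavy metal pollution prediction under data scarcity and spatial heterogeneity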

IF 11.3 · Zone 1 (Environmental Science and Ecology) · JCR Q1 (ENGINEERING, ENVIRONMENTAL)

Journal of Hazardous Materials Pub Date : 2025-06-13 DOI:10.1016/j.jhazmat.2025.138926

Bin Yang, Anqi He, Zhong Ren, Kai Yu, Gang Zhao, Yanchun Fan, Qi Wang, Shenglian Luo

Volume 495, Article 138926. Full text: https://www.sciencedirect.com/science/article/pii/S0304389425018424

Citations: 0

Abstract



Large-scale soil heavy metal pollution risk estimation remains challenging due to data scarcity and spatial heterogeneity. Although traditional machine learning (ML) methods offer notable predictive capabilities, they often struggle with high-dimensional, heterogeneous data, limited labeled samples, and insufficient interpretability. In this study, we propose a transfer learning (TL)-based deep learning (DL) framework that integrates convolutional neural networks (CNN), termed TL-CNN, with remote sensing-based (RSs), web-based (WBs), and field-sampled datasets (including spatial regionalization features, SRs) to efficiently predict soil heavy metal pollution. By coupling hierarchical feature extraction with a GradSHAP interpretability module, the approach provides both predictive accuracy and explanatory insights. Results from Shaoguan City (2018–2022) demonstrate that the TL-CNN model substantially outperforms conventional ML methods, with overall accuracy exceeding 84 %, particularly under multi-metal pollution scenarios. Leveraging TL, the model adaptively addresses data scarcity, reducing the need for costly field sampling and mitigating interpolation errors. The incorporation of RSs- and WBs-derived features captures critical environmental variability and anthropogenic emissions, while SRs refine local pollution patterns. GradSHAP analyses highlight the pivotal role of RSs features and spatial metrics in large-scale predictions. Overall, the proposed TL-CNN model underscores the potential of multi-source heterogeneous datasets and TL-based DL strategies to promote sustainable soil management.
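The abstract does not give the TL-CNN architecture, so the following is only a minimal sketch of the general transfer learning idea it relies on: pre-train a feature extractor where data are plentiful, then freeze it and fine-tune a small prediction head on scarce target samples. Everything here is an assumption made for illustration: `TinyNet` is a hypothetical pure-Python stand-in (one tanh layer plus a logistic head) rather than a CNN, and the data are synthetic, not the Shaoguan datasets.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Hypothetical stand-in: tanh feature layer + logistic head (not a CNN)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.head = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.bias = 0.0

    def features(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.W]

    def predict(self, x):
        h = self.features(x)
        return sigmoid(sum(a * hi for a, hi in zip(self.head, h)) + self.bias)

    def train(self, data, epochs, lr, freeze_features=False):
        """Plain SGD on log loss; optionally keep the feature layer fixed."""
        for _ in range(epochs):
            for x, y in data:
                h = self.features(x)
                p = sigmoid(sum(a * hi for a, hi in zip(self.head, h)) + self.bias)
                err = p - y  # gradient of log loss w.r.t. the logit
                if not freeze_features:
                    # Backpropagate through tanh using the pre-update head.
                    for j, row in enumerate(self.W):
                        g = err * self.head[j] * (1.0 - h[j] ** 2)
                        for i in range(len(row)):
                            row[i] -= lr * g * x[i]
                for j in range(len(self.head)):
                    self.head[j] -= lr * err * h[j]
                self.bias -= lr * err


def make_data(n, threshold, rng):
    """Synthetic binary labels: 'polluted' if x0 + x1 exceeds a threshold."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        data.append((x, 1.0 if x[0] + x[1] > threshold else 0.0))
    return data


rng = random.Random(0)
source = make_data(500, 0.0, rng)   # data-rich source domain (assumed)
target = make_data(20, 0.3, rng)    # data-scarce target domain (assumed)

net = TinyNet(3, 4, rng)
net.train(source, epochs=20, lr=0.1)                 # pre-training

frozen_W = [row[:] for row in net.W]
head_before = net.head[:]
net.train(target, epochs=50, lr=0.1, freeze_features=True)  # fine-tuning
```

Freezing the feature layer means only the head's few parameters are fitted to the 20 target samples, which is the usual motivation for transfer learning when labeled field samples are expensive.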


Source journal

Journal of Hazardous Materials (Engineering and Technology: Environmental Engineering)

CiteScore: 25.40

Self-citation rate: 5.90%

Articles published: 3059

Review time: 58 days

About the journal: The Journal of Hazardous Materials serves as a global platform for promoting cutting-edge research in the field of Environmental Science and Engineering. Our publication features a wide range of articles, including full-length research papers, review articles, and perspectives, with the aim of enhancing our understanding of the dangers and risks associated with various materials concerning public health and the environment. It is important to note that the term "environmental contaminants" refers specifically to substances that pose hazardous effects through contamination, while excluding those that do not have such impacts on the environment or human health. Moreover, we emphasize the distinction between wastes and hazardous materials in order to provide further clarity on the scope of the journal. We have a keen interest in exploring specific compounds and microbial agents that have adverse effects on the environment.