[Soil Cadmium Prediction and Health Risk Assessment of an Oasis on the Eastern Edge of the Tarim Basin Based on Feature Optimization and Machine Learning].

Q2 Environmental Science
Jing-Yu Liu, Ruo-Yi Li, Yong-Chun Liang, Lei Liu, Fang Yin, Su Tang, Lin-Sen He, Yi Zhang
DOI: 10.13227/j.hjkx.202308010 (https://doi.org/10.13227/j.hjkx.202308010)
Journal: Huanjing Kexue/Environmental Science
Publication date: 2024-08-08
Publication type: Journal Article
Citations: 0

Abstract

Soil heavy metal pollution poses a serious threat to food security, human health, and soil ecosystems. Using 644 soil samples collected from a typical oasis on the eastern margin of the Tarim Basin, five models were built to predict soil heavy metal content: multiple linear regression (LR), back-propagation neural network (BP), random forest (RF), support vector machine (SVM), and radial basis function (RBF). The best prediction result was then used to analyze the spatial distribution of heavy metal contamination and the associated health risks. The results showed that: ① The average Cd content in the study area was 0.14 mg·kg⁻¹, which was 1.17 times the soil background value of Xinjiang, making Cd the primary factor of soil heavy metal contamination in the area. The carcinogenic risk coefficients of Cd for both adults and children were less than 10⁻⁴, indicating no significant long-term health risk for humans in the area. ② When the estimation accuracies of the five inversion models were compared, the RF model had the highest validation-set R² (0.7637) and the smallest RMSE, MAE, and MBE. The predicted values of the RF model were therefore the most consistent with the measured soil Cd content, and the soil Cd distribution map predicted by the RF model agreed best with the interpolation map. ③ The RF model outperformed the other four models in predicting the health risks associated with soil Cd for both adults and children. By comparison, the LR model's predictions on the validation set varied greatly, making its results unreliable. Overall, RF was the best model for predicting soil Cd content and evaluating health risks in the study area, owing to its superior generalization capability and resistance to overfitting.
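The abstract ranks the five models by R², RMSE, MAE, and MBE on a validation set but does not give the formulas. The sketch below shows the standard textbook definitions of these four metrics in plain Python. The function name `regression_metrics`, the sign convention for MBE (predicted minus measured), and the sample values are assumptions made for illustration; they are not taken from the paper.

```python
import math


def regression_metrics(measured, predicted):
    """Return R^2, RMSE, MAE and MBE for paired measured/predicted values.

    Standard definitions; MBE is taken as mean(predicted - measured),
    which is an assumed sign convention.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mean_measured = sum(measured) / n
    ss_res = sum(e * e for e in errors)                        # residual sum of squares
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mbe": sum(errors) / n,
    }


# Hypothetical Cd contents in mg/kg, invented for illustration only.
measured = [0.10, 0.12, 0.14, 0.16, 0.18]
predicted = [0.11, 0.12, 0.13, 0.17, 0.17]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

Under these definitions, a higher R² and lower RMSE and MAE indicate closer agreement with the measurements, while an MBE near zero indicates little systematic over- or under-prediction. MBE can be negative, so "smallest MBE" is most naturally read as smallest in magnitude; that reading is an interpretation, not something the abstract states.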

Source journal: Huanjing Kexue/Environmental Science (Environmental Science: all)
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 15329