Leveraging visible-near-infrared spectroscopy and machine learning to detect nickel contamination in soil: Addressing class imbalances for environmental management

IF 5.4 Q2 ENGINEERING, ENVIRONMENTAL
Chongchong Qi , Kechao Li , Min Zhou , Chunhui Zhang , Xiaoming Zheng , Qiusong Chen , Tao Hu
{"title":"Leveraging visible-near-infrared spectroscopy and machine learning to detect nickel contamination in soil: Addressing class imbalances for environmental management","authors":"Chongchong Qi ,&nbsp;Kechao Li ,&nbsp;Min Zhou ,&nbsp;Chunhui Zhang ,&nbsp;Xiaoming Zheng ,&nbsp;Qiusong Chen ,&nbsp;Tao Hu","doi":"10.1016/j.hazadv.2024.100489","DOIUrl":null,"url":null,"abstract":"<div><div>Excessive concentrations of Ni in soil have many severe effects, negatively affecting human health and leading to disease, while also posing a threat to animals and plants. Although the dangers of high Ni concentrations have been widely recognized, rapid and large-scale tools for the identification of Ni contamination are still lacking. Visible-near-infrared (Vis-NIR) spectroscopy has been employed to rapidly identify Ni contamination; however, previous studies suffer from issues inherent to small datasets and the tendency to negate data imbalances. To address these issues, a large dataset comprising 18,675 soil samples was used to predict soil Ni contamination by combining Vis-NIR data with machine learning (ML). The data imbalance inherent to previous studies was addressed using two data sampling methods. To build a robust classification model for Ni contamination, four spectral preprocessing methods and four ML algorithms were compared. The optimal extreme gradient boosting model achieved recall, accuracy, area under the curve, and geometric mean scores of 0.8203, 0.8806, 0.9268, and 0.8508, respectively. Model predictions across the United States identified specific regions with high possibility of Ni contamination. Overall, the model developed in this study offers an improved accuracy in predicting soil Ni contamination at the continental scale, and can be used to prioritize further testing and guide policymaking.</div></div>","PeriodicalId":73763,"journal":{"name":"Journal of hazardous materials advances","volume":"16 ","pages":"Article 100489"},"PeriodicalIF":5.4000,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of hazardous materials advances","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772416624000901","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ENVIRONMENTAL","Score":null,"Total":0}
引用次数: 0

Abstract

Excessive concentrations of Ni in soil have many severe effects, negatively affecting human health and leading to disease, while also posing a threat to animals and plants. Although the dangers of high Ni concentrations have been widely recognized, rapid and large-scale tools for the identification of Ni contamination are still lacking. Visible-near-infrared (Vis-NIR) spectroscopy has been employed to rapidly identify Ni contamination; however, previous studies suffer from issues inherent to small datasets and the tendency to negate data imbalances. To address these issues, a large dataset comprising 18,675 soil samples was used to predict soil Ni contamination by combining Vis-NIR data with machine learning (ML). The data imbalance inherent to previous studies was addressed using two data sampling methods. To build a robust classification model for Ni contamination, four spectral preprocessing methods and four ML algorithms were compared. The optimal extreme gradient boosting model achieved recall, accuracy, area under the curve, and geometric mean scores of 0.8203, 0.8806, 0.9268, and 0.8508, respectively. Model predictions across the United States identified specific regions with high possibility of Ni contamination. Overall, the model developed in this study offers an improved accuracy in predicting soil Ni contamination at the continental scale, and can be used to prioritize further testing and guide policymaking.

Abstract Image

利用可见光-近红外光谱和机器学习检测土壤中的镍污染:解决类别不平衡问题,促进环境管理
土壤中过高浓度的镍会产生许多严重影响,对人类健康造成负面影响并导致疾病,同时还会对动物和植物构成威胁。尽管人们已普遍认识到高浓度镍的危害,但仍缺乏快速、大规模的镍污染识别工具。可见光-近红外(Vis-NIR)光谱法已被用于快速识别镍污染;然而,以往的研究受到小数据集固有问题和数据不平衡倾向的影响。为了解决这些问题,我们使用了一个由 18,675 个土壤样本组成的大型数据集,通过将可见光-近红外数据与机器学习(ML)相结合来预测土壤镍污染。利用两种数据采样方法解决了以往研究中固有的数据不平衡问题。为了建立一个可靠的镍污染分类模型,比较了四种光谱预处理方法和四种 ML 算法。最佳极端梯度提升模型的召回率、准确率、曲线下面积和几何平均得分分别为 0.8203、0.8806、0.9268 和 0.8508。模型对美国各地的预测确定了镍污染可能性较高的特定区域。总体而言,本研究开发的模型提高了预测大陆范围土壤镍污染的准确性,可用于确定进一步测试的优先次序并指导政策制定。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of hazardous materials advances
Journal of hazardous materials advances Environmental Engineering
CiteScore
4.80
自引率
0.00%
发文量
0
审稿时长
50 days
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信