Chongchong Qi , Kechao Li , Min Zhou , Chunhui Zhang , Xiaoming Zheng , Qiusong Chen , Tao Hu
{"title":"利用可见光-近红外光谱和机器学习检测土壤中的镍污染:解决类别不平衡问题,促进环境管理","authors":"Chongchong Qi , Kechao Li , Min Zhou , Chunhui Zhang , Xiaoming Zheng , Qiusong Chen , Tao Hu","doi":"10.1016/j.hazadv.2024.100489","DOIUrl":null,"url":null,"abstract":"<div><div>Excessive concentrations of Ni in soil have many severe effects, negatively affecting human health and leading to disease, while also posing a threat to animals and plants. Although the dangers of high Ni concentrations have been widely recognized, rapid and large-scale tools for the identification of Ni contamination are still lacking. Visible-near-infrared (Vis-NIR) spectroscopy has been employed to rapidly identify Ni contamination; however, previous studies suffer from issues inherent to small datasets and the tendency to negate data imbalances. To address these issues, a large dataset comprising 18,675 soil samples was used to predict soil Ni contamination by combining Vis-NIR data with machine learning (ML). The data imbalance inherent to previous studies was addressed using two data sampling methods. To build a robust classification model for Ni contamination, four spectral preprocessing methods and four ML algorithms were compared. The optimal extreme gradient boosting model achieved recall, accuracy, area under the curve, and geometric mean scores of 0.8203, 0.8806, 0.9268, and 0.8508, respectively. Model predictions across the United States identified specific regions with high possibility of Ni contamination. Overall, the model developed in this study offers an improved accuracy in predicting soil Ni contamination at the continental scale, and can be used to prioritize further testing and guide policymaking.</div></div>","PeriodicalId":73763,"journal":{"name":"Journal of hazardous materials advances","volume":"16 ","pages":"Article 100489"},"PeriodicalIF":5.4000,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Leveraging visible-near-infrared spectroscopy and machine learning to detect nickel contamination in soil: Addressing class imbalances for environmental management\",\"authors\":\"Chongchong Qi , Kechao Li , Min Zhou , Chunhui Zhang , Xiaoming Zheng , Qiusong Chen , Tao Hu\",\"doi\":\"10.1016/j.hazadv.2024.100489\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Excessive concentrations of Ni in soil have many severe effects, negatively affecting human health and leading to disease, while also posing a threat to animals and plants. Although the dangers of high Ni concentrations have been widely recognized, rapid and large-scale tools for the identification of Ni contamination are still lacking. Visible-near-infrared (Vis-NIR) spectroscopy has been employed to rapidly identify Ni contamination; however, previous studies suffer from issues inherent to small datasets and the tendency to negate data imbalances. To address these issues, a large dataset comprising 18,675 soil samples was used to predict soil Ni contamination by combining Vis-NIR data with machine learning (ML). The data imbalance inherent to previous studies was addressed using two data sampling methods. To build a robust classification model for Ni contamination, four spectral preprocessing methods and four ML algorithms were compared. The optimal extreme gradient boosting model achieved recall, accuracy, area under the curve, and geometric mean scores of 0.8203, 0.8806, 0.9268, and 0.8508, respectively. Model predictions across the United States identified specific regions with high possibility of Ni contamination. Overall, the model developed in this study offers an improved accuracy in predicting soil Ni contamination at the continental scale, and can be used to prioritize further testing and guide policymaking.</div></div>\",\"PeriodicalId\":73763,\"journal\":{\"name\":\"Journal of hazardous materials advances\",\"volume\":\"16 \",\"pages\":\"Article 100489\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2024-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of hazardous materials advances\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772416624000901\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ENVIRONMENTAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of hazardous materials advances","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772416624000901","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ENVIRONMENTAL","Score":null,"Total":0}
引用次数: 0
摘要
土壤中过高浓度的镍会产生许多严重影响,对人类健康造成负面影响并导致疾病,同时还会对动物和植物构成威胁。尽管人们已普遍认识到高浓度镍的危害,但仍缺乏快速、大规模的镍污染识别工具。可见光-近红外(Vis-NIR)光谱法已被用于快速识别镍污染;然而,以往的研究受到小数据集固有问题和数据不平衡倾向的影响。为了解决这些问题,我们使用了一个由 18,675 个土壤样本组成的大型数据集,通过将可见光-近红外数据与机器学习(ML)相结合来预测土壤镍污染。利用两种数据采样方法解决了以往研究中固有的数据不平衡问题。为了建立一个可靠的镍污染分类模型,比较了四种光谱预处理方法和四种 ML 算法。最佳极端梯度提升模型的召回率、准确率、曲线下面积和几何平均得分分别为 0.8203、0.8806、0.9268 和 0.8508。模型对美国各地的预测确定了镍污染可能性较高的特定区域。总体而言,本研究开发的模型提高了预测大陆范围土壤镍污染的准确性,可用于确定进一步测试的优先次序并指导政策制定。
Leveraging visible-near-infrared spectroscopy and machine learning to detect nickel contamination in soil: Addressing class imbalances for environmental management
Excessive concentrations of Ni in soil have many severe effects, negatively affecting human health and leading to disease, while also posing a threat to animals and plants. Although the dangers of high Ni concentrations have been widely recognized, rapid and large-scale tools for the identification of Ni contamination are still lacking. Visible-near-infrared (Vis-NIR) spectroscopy has been employed to rapidly identify Ni contamination; however, previous studies suffer from issues inherent to small datasets and the tendency to negate data imbalances. To address these issues, a large dataset comprising 18,675 soil samples was used to predict soil Ni contamination by combining Vis-NIR data with machine learning (ML). The data imbalance inherent to previous studies was addressed using two data sampling methods. To build a robust classification model for Ni contamination, four spectral preprocessing methods and four ML algorithms were compared. The optimal extreme gradient boosting model achieved recall, accuracy, area under the curve, and geometric mean scores of 0.8203, 0.8806, 0.9268, and 0.8508, respectively. Model predictions across the United States identified specific regions with high possibility of Ni contamination. Overall, the model developed in this study offers an improved accuracy in predicting soil Ni contamination at the continental scale, and can be used to prioritize further testing and guide policymaking.