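Hyperspectral quantitative retrieving of soil iron oxide and Zn content combining feature selection and machine learning algorithms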

IF 4.6 · Zone 2 (Chemistry) · JCR Q1 (SPECTROSCOPY)
Jiankai Hu , Lin Hu , Shu Gan , Xiping Yuan , Jie Li , Hailong Zhao , Yingtao Qi , Chengzhuo Lu
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, Volume 343, Article 126612 (published 2025-06-24)
DOI: 10.1016/j.saa.2025.126612
https://www.sciencedirect.com/science/article/pii/S1386142525009199
Citations: 0

Abstract

Hyperspectral reflectance offers a way to estimate the content of soil iron oxide and of the heavy metal zinc (Zn). Retrieving soil physicochemical properties from reflectance spectra mainly consists of three steps: spectral preprocessing, feature wavelength selection, and machine learning modeling. To find the optimal combination, this study first applies conventional spectral transformations (Continuum Removal, CR; Standard Normal Variate, SNV; First Derivative, FD; and Second Derivative, SD) to the original soil spectra, then uses competitive adaptive reweighted sampling (CARS) and the Boruta algorithm to select sensitive bands, and finally builds four machine learning models (Partial Least Squares Regression, PLSR; Support Vector Machine, SVM; Back Propagation Neural Network, BPNN; and Extreme Gradient Boosting, XGBoost). The results show that the spectral transformations (CR, SNV, FD, and SD) reduce external environmental interference in the soil spectra and accentuate the absorption and reflection features of the spectral curve, which improves both the accuracy of feature band selection and the prediction accuracy of the models. Of the feature selection methods, CARS is better suited to soil iron oxide, while Boruta is better suited to Zn. Among the machine learning methods, both linear and nonlinear models explain the relationship between soil iron oxide and spectral reflectance well, whereas the relationship between soil Zn and spectral reflectance is nonlinear. The best retrieval combination for soil iron oxide is FD_CARS_SVM, with R²_C = 0.878, RMSE_C = 4.395, R²_V = 0.849, RMSE_V = 4.478, and RPD_V = 2.576. The best retrieval combination for Zn is FD_Boruta_XGBoost, with R²_C = 0.999, RMSE_C = 0.102, R²_V = 0.682, RMSE_V = 2.697, and RPD_V = 1.772.
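The preprocessing stage and the scoring metrics named in the abstract rest on a few simple operations. The block below is a minimal pure-Python sketch of the four spectral transformations (CR, SNV, FD, SD) and of the three reported metrics (R², RMSE, RPD), assuming each spectrum is a list of reflectance values sampled at strictly increasing wavelengths. All function names are hypothetical, the abstract does not state the paper's exact settings (any smoothing before differentiation, the standard-deviation convention, or whether R² is the coefficient of determination or a squared correlation), and band selection (CARS, Boruta) and the four regression models are deliberately left out.

```python
import math
import statistics


def snv(spectrum):
    """Standard Normal Variate: centre one spectrum on its own mean and scale
    it by its own standard deviation (sample SD here, an assumption)."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(r - mean) / sd for r in spectrum]


def first_derivative(wavelengths, spectrum):
    """First derivative by forward finite differences; returns n - 1 values."""
    return [
        (spectrum[i + 1] - spectrum[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(spectrum) - 1)
    ]


def second_derivative(spectrum, step):
    """Second derivative by central differences, assuming a uniform band
    spacing `step`; returns n - 2 values."""
    return [
        (spectrum[i + 1] - 2 * spectrum[i] + spectrum[i - 1]) / step ** 2
        for i in range(1, len(spectrum) - 1)
    ]


def continuum_removal(wavelengths, spectrum):
    """Divide the spectrum by its upper convex hull (the continuum), linearly
    interpolated at every band, so hull points map to 1.0 and absorption
    features fall below 1.0. Needs at least two bands."""
    hull = []  # indices of the upper-hull vertices, left to right
    for i in range(len(spectrum)):
        # Drop the last vertex while it lies on or below the segment hull[-2] -> i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((wavelengths[b] - wavelengths[a]) * (spectrum[i] - spectrum[a])
                     - (spectrum[b] - spectrum[a]) * (wavelengths[i] - wavelengths[a]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)

    continuum = []
    seg = 0
    for i, w in enumerate(wavelengths):
        # Advance to the hull segment whose right end is at or beyond band i.
        while hull[seg + 1] < i:
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        t = (w - wavelengths[a]) / (wavelengths[b] - wavelengths[a])
        continuum.append(spectrum[a] + t * (spectrum[b] - spectrum[a]))
    return [r / c for r, c in zip(spectrum, continuum)]


def r2(y_obs, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = statistics.mean(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1 - ss_res / ss_tot


def rmse(y_obs, y_pred):
    """Root mean squared error between observed and predicted contents."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))


def rpd(y_obs, y_pred):
    """Ratio of performance to deviation: SD of observed values divided by RMSE."""
    return statistics.stdev(y_obs) / rmse(y_obs, y_pred)


if __name__ == "__main__":
    # Toy observed vs. predicted contents, for illustration only (not paper data).
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(r2(observed, predicted), rmse(observed, predicted), round(rpd(observed, predicted), 3))
```

Forward and central differences shorten the spectrum by one band per derivative order, so band indices have to be realigned before feature selection; smoothing derivatives such as Savitzky-Golay filtering are a common alternative in practice.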
Source journal metrics: CiteScore 8.40; self-citation rate 11.40%; articles published 1364; review time 40 days.
About the journal: Spectrochimica Acta, Part A: Molecular and Biomolecular Spectroscopy (SAA) is an interdisciplinary journal which spans from basic to applied aspects of optical spectroscopy in chemistry, medicine, biology, and materials science. The journal publishes original scientific papers that feature high-quality spectroscopic data and analysis. From the broad range of optical spectroscopies, the emphasis is on electronic, vibrational or rotational spectra of molecules, rather than on spectroscopy based on magnetic moments. Criteria for publication in SAA are novelty, uniqueness, and outstanding quality. Routine applications of spectroscopic techniques and computational methods are not appropriate. Topics of particular interest of Spectrochimica Acta Part A include, but are not limited to: Spectroscopy and dynamics of bioanalytical, biomedical, environmental, and atmospheric sciences, Novel experimental techniques or instrumentation for molecular spectroscopy, Novel theoretical and computational methods, Novel applications in photochemistry and photobiology, Novel interpretational approaches as well as advances in data analysis based on electronic or vibrational spectroscopy.