Jiankai Hu, Lin Hu, Shu Gan, Xiping Yuan, Jie Li, Hailong Zhao, Yingtao Qi, Chengzhuo Lu
Hyperspectral quantitative retrieving of soil iron oxide and Zn content combining feature selection and machine learning algorithms
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, vol. 343, Article 126612. Published 2025-06-24.
DOI: 10.1016/j.saa.2025.126612
URL: https://www.sciencedirect.com/science/article/pii/S1386142525009199
Abstract
Hyperspectral reflectance provides a pathway for estimating soil iron oxide and heavy metal zinc (Zn) content. Retrieving soil physicochemical properties from soil reflectance spectra mainly involves three steps: spectral preprocessing, feature wavelength selection, and machine learning modeling. To find the optimal model combination, this study first applies conventional spectral transformations (Continuum Removal, CR; Standard Normal Variate, SNV; First Derivative, FD; and Second Derivative, SD) to the original soil spectra, then uses competitive adaptive reweighted sampling (CARS) and the Boruta algorithm to select sensitive bands, and finally constructs four machine learning models (Partial Least Squares Regression, PLSR; Support Vector Machine, SVM; Back Propagation Neural Network, BPNN; and Extreme Gradient Boosting, XGBoost). The results show that the spectral transformations (CR, SNV, FD, and SD) reduce the interference of the external environment on soil spectra and highlight absorption and reflection features in the spectral curve, thus improving both the accuracy of feature band selection and the prediction accuracy of the model. Among the feature selection methods, CARS is more suitable for soil iron oxide, while Boruta is more suitable for heavy metal Zn. Among the machine learning methods, both linear and nonlinear models explain the relationship between soil iron oxide and spectral reflectance well, whereas the relationship between soil heavy metal Zn and spectral reflectance is nonlinear. The best retrieval model combination for soil iron oxide is FD_CARS_SVM, with $R^2_C = 0.878$, $\mathrm{RMSE}_C = 4.395$, $R^2_V = 0.849$, $\mathrm{RMSE}_V = 4.478$, and $\mathrm{RPD}_V = 2.576$. The best retrieval model combination for heavy metal zinc is FD_Boruta_XGBoost, with $R^2_C = 0.999$, $\mathrm{RMSE}_C = 0.102$, $R^2_V = 0.682$, $\mathrm{RMSE}_V = 2.697$, and $\mathrm{RPD}_V = 1.772$.
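The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say how the first derivative is computed, so finite differences via `numpy.gradient` are an assumption here (a Savitzky-Golay filter is a common alternative). The RPD definition used, standard deviation of the reference values divided by RMSE, is the commonly used one and is likewise assumed, as is reading the subscript V as the validation set. The function names `snv`, `first_derivative`, and `validation_metrics` are hypothetical.

```python
import numpy as np


def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def first_derivative(spectra, wavelengths):
    """First derivative along the wavelength axis.

    Assumption: plain finite differences; the paper may use a smoothed derivative instead.
    """
    spectra = np.asarray(spectra, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    return np.gradient(spectra, wavelengths, axis=1)


def validation_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for measured vs. predicted contents.

    Assumption: RPD = population SD of the measured values / RMSE.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = np.std(y_true, ddof=0) / rmse
    return r2, rmse, rpd
```

Under these definitions, R² = 1 − 1/RPD² holds exactly. The reported validation figures are consistent with that relation: 1 − 1/2.576² ≈ 0.849 for iron oxide and 1 − 1/1.772² ≈ 0.682 for Zn.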
About the journal:
Spectrochimica Acta, Part A: Molecular and Biomolecular Spectroscopy (SAA) is an interdisciplinary journal which spans from basic to applied aspects of optical spectroscopy in chemistry, medicine, biology, and materials science.
The journal publishes original scientific papers that feature high-quality spectroscopic data and analysis. From the broad range of optical spectroscopies, the emphasis is on electronic, vibrational or rotational spectra of molecules, rather than on spectroscopy based on magnetic moments.
Criteria for publication in SAA are novelty, uniqueness, and outstanding quality. Routine applications of spectroscopic techniques and computational methods are not appropriate.
Topics of particular interest of Spectrochimica Acta Part A include, but are not limited to:
Spectroscopy and dynamics of bioanalytical, biomedical, environmental, and atmospheric sciences,
Novel experimental techniques or instrumentation for molecular spectroscopy,
Novel theoretical and computational methods,
Novel applications in photochemistry and photobiology,
Novel interpretational approaches as well as advances in data analysis based on electronic or vibrational spectroscopy.