Hyperspectral inversion of soil organic matter based on improved ensemble learning method
Junjie Liu, Yongsheng Hong, Bifeng Hu, Songchao Chen, Jia Deng, Keyang Yin, Jiao Lin, Defang Luo, Jie Peng, Zhou Shi
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, vol. 339, Article 126302. Published 2025-04-27. Impact factor 4.3 (JCR Q1, Spectroscopy).
DOI: 10.1016/j.saa.2025.126302
https://www.sciencedirect.com/science/article/pii/S1386142525006080
Abstract
Soil organic matter (SOM) is a vital component of soil, and its rapid and accurate detection is crucial for ensuring land health and stabilizing atmospheric carbon dioxide levels. Hyperspectral soil spectroscopy has proven to be an efficient and cost-effective method for detecting SOM. In soil spectroscopy, the Ensemble Model (EM) holds substantial promise because of its robustness and strong generalization capability. However, the performance of an EM depends largely on how many base learners it uses and how their weights are assigned. Traditional practice relies mainly on empirically chosen weights or on a single index of the base learners, R², and it remains unclear what the optimal number of base learners is for different ensemble techniques. To address this gap, our study uses Vis-NIR spectroscopy to quantitatively estimate SOM in 704 samples from the Tarim River Basin in Xinjiang, China. Our objective is to develop new methods for assigning base learner weights and to identify the optimal number of base learners for each ensemble method, thereby refining the ensemble approach and improving EM performance. We examined the impact of various weight coefficient assignment methods and base learner counts on EM performance within the Weighted Averaging (WA), Blending, and Stacking frameworks. Our findings show that a weight coefficient assignment method incorporating R², RMSE, and MAE significantly enhances EM performance. Compared with the traditional method that relies solely on base learner R², it increases the EM R² by 0.006–0.024 and reduces RMSE and MAE by 0.014–0.085 g kg⁻¹ and 0.03–0.085 g kg⁻¹, respectively. Although the number of base learners is important, its relationship with performance is not linear: adding base learners does not invariably improve performance. Notably, Blending and Stacking reach peak performance with 12 base learners, whereas the accuracy of WA is still increasing at 15 base learners. Among the ensemble methods, Stacking achieves the highest accuracy, with a validation R² of 0.889, an RMSE of 0.957 g kg⁻¹, and an MAE of 0.803 g kg⁻¹. In summary, using 12 base learners and a multi-index comprehensive evaluation for weight assignment within the Stacking method emerges as the optimal integration strategy for SOM hyperspectral inversion.
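The abstract does not give the formula behind the multi-index weight assignment, so the following is only a minimal sketch of one way weights based on R², RMSE, and MAE could be built for Weighted Averaging. The combination rule (an equal-weight mean of each learner's normalized share per metric), the function names, and the toy numbers are all assumptions made for illustration and are not taken from the paper.

```python
import math

def r2(y, p):
    """Coefficient of determination of predictions p against targets y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def rmse(y, p):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

def mae(y, p):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def _share(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

def r2_only_weights(y, preds):
    """Baseline: weights proportional to each learner's R2 (floored just above zero)."""
    return _share([max(r2(y, p), 1e-12) for p in preds])

def multi_index_weights(y, preds):
    """Assumed scheme, not the paper's formula: average each learner's
    share of R2 (higher is better) with its shares of inverse RMSE and
    inverse MAE (lower error gives a larger share)."""
    eps = 1e-12
    r2_share = _share([max(r2(y, p), eps) for p in preds])
    rmse_share = _share([1.0 / max(rmse(y, p), eps) for p in preds])
    mae_share = _share([1.0 / max(mae(y, p), eps) for p in preds])
    # Each share list sums to 1, so their mean also sums to 1.
    return [(a + b + c) / 3.0
            for a, b, c in zip(r2_share, rmse_share, mae_share)]

def weighted_average(preds, weights):
    """Combine base learner predictions sample by sample."""
    n = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(n)]

# Toy validation data (hypothetical SOM values in g/kg, not from the study).
y_val = [10.0, 12.0, 8.0, 15.0, 11.0]
val_preds = [
    [10.5, 11.5, 8.5, 14.0, 11.5],  # learner A: moderate errors
    [9.0, 13.5, 7.0, 16.5, 10.0],   # learner B: largest errors
    [10.2, 12.1, 8.3, 14.6, 11.2],  # learner C: smallest errors
]

baseline = r2_only_weights(y_val, val_preds)
weights = multi_index_weights(y_val, val_preds)
ensemble = weighted_average(val_preds, weights)
```

In this toy case the three learners have fairly similar R² values, so R²-only weights are close to uniform, while the error-based terms shift weight toward the learner with the smallest RMSE and MAE. In practice the weights would be estimated on a validation split and then applied to predictions for unseen samples.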
About the journal:
Spectrochimica Acta, Part A: Molecular and Biomolecular Spectroscopy (SAA) is an interdisciplinary journal which spans from basic to applied aspects of optical spectroscopy in chemistry, medicine, biology, and materials science.
The journal publishes original scientific papers that feature high-quality spectroscopic data and analysis. From the broad range of optical spectroscopies, the emphasis is on electronic, vibrational or rotational spectra of molecules, rather than on spectroscopy based on magnetic moments.
Criteria for publication in SAA are novelty, uniqueness, and outstanding quality. Routine applications of spectroscopic techniques and computational methods are not appropriate.
Topics of particular interest to Spectrochimica Acta Part A include, but are not limited to:
Spectroscopy and dynamics of bioanalytical, biomedical, environmental, and atmospheric sciences,
Novel experimental techniques or instrumentation for molecular spectroscopy,
Novel theoretical and computational methods,
Novel applications in photochemistry and photobiology,
Novel interpretational approaches as well as advances in data analysis based on electronic or vibrational spectroscopy.