Recursive feature elimination for summer wheat leaf area index using ensemble algorithm-based modeling: The case of central Highland of Ethiopia
Authors: Dereje Biru, Berhan Gessesse, Gebeyehu Abebe
Journal: Environmental Challenges, volume 19, Article 101113 (JCR Q2, Environmental Science)
Publication date: 2025-03-04
DOI: 10.1016/j.envc.2025.101113
URL: https://www.sciencedirect.com/science/article/pii/S2667010025000332
Citations: 0
Abstract
Accurate, nondestructive monitoring of the wheat leaf area index (LAI) is important for effective agricultural management and production forecasting. However, building a high-performance predictive model requires selecting suitable machine learning algorithms and identifying the important variables, and both steps are challenging. This study explored ensemble algorithm-based recursive feature elimination (RFE) for summer wheat LAI estimation on the Google Earth Engine (GEE) cloud computing platform. The remote sensing datasets comprised Sentinel-1/2 imagery and digital elevation models, and the derived variables covered spectral bands, vegetation indices, texture metrics, and topographic variables. Preprocessing produced 136 independent variables in GEE, and the LAI data were collected from 84 systematically selected samples using the ACCUPAR LP-80 Ceptometer. Further processing included feature combination, min–max normalization, extraction of the 136 independent variable values for the LAI samples, and partitioning of the data for training and testing. RFE was applied with the random forest (RF) and gradient tree boost (GTB) algorithms within GEE to predict the summer wheat LAI at the Lole State Farm. Model performance was evaluated using R-squared (R²), root mean squared error (RMSE), mean squared error (MSE), and mean absolute error (MAE). The results indicated that 49 significant variables were selected for the RFE-RF model and 29 for the RFE-GTB model. The GTB model outperformed the RF model, with R² values of 0.968 for training and 0.88 for validation, compared with 0.961 for training and 0.856 for validation for the RF model. The GTB model also had lower RMSE, MSE, and MAE values. Additionally, predicted LAI maps for summer wheat were generated, with values ranging from 0.22 to 2.12 for the RF model and from 0.24 to 2.23 for the GTB model. Overall, this study demonstrated that the learning algorithms can be improved by identifying important variables, evaluated their performance in predicting wheat LAI, and generated a map of the predicted LAI. The results offer valuable insights for the nondestructive and rapid acquisition of summer wheat LAI by employing an ensemble algorithm-based RFE and utilizing Earth observation data in the GEE.
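
The processing chain summarized in the abstract (min–max normalization, recursive feature elimination, and evaluation with R², RMSE, MSE, and MAE) can be illustrated with a minimal pure-Python sketch. It is not the authors' GEE implementation. The function names, the toy feature names (`ndvi`, `vh`, `slope`), and the toy values are all hypothetical, and the importance score is a simple stand-in (absolute Pearson correlation with the target) rather than the RF or GTB variable importance used in the study. R² is computed here as the coefficient of determination, which is an assumption about the definition the paper uses.

```python
import math
from typing import Callable, Dict, List, Sequence

Columns = Dict[str, List[float]]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2 (coefficient of determination), RMSE, MSE and MAE.

    Assumes y_true is not constant, otherwise R^2 is undefined.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": math.sqrt(mse), "mse": mse, "mae": mae}


def abs_correlation_importance(columns: Columns, target: Sequence[float]) -> Dict[str, float]:
    """Stand-in importance: |Pearson r| of each feature with the target.

    The study ranks variables with RF/GTB importance instead.
    """
    n = len(target)
    mean_y = sum(target) / n
    var_y = sum((y - mean_y) ** 2 for y in target)
    scores = {}
    for name, values in columns.items():
        mean_x = sum(values) / n
        var_x = sum((x - mean_x) ** 2 for x in values)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(values, target))
        denom = math.sqrt(var_x * var_y)
        scores[name] = abs(cov / denom) if denom > 0 else 0.0
    return scores


def recursive_feature_elimination(
    columns: Columns,
    target: Sequence[float],
    n_keep: int,
    importance_fn: Callable[[Columns, Sequence[float]], Dict[str, float]],
) -> List[str]:
    """Re-score the remaining features each round and drop the weakest one."""
    remaining = dict(columns)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining, target)
        weakest = min(scores, key=scores.get)
        del remaining[weakest]
    return list(remaining)


# Toy data (hypothetical values, not from the study).
lai = [0.5, 1.0, 1.5, 2.0]
raw = {
    "ndvi": [0.2, 0.4, 0.6, 0.8],
    "slope": [3.0, 1.0, 4.0, 2.0],
    "vh": [-20.0, -18.0, -17.0, -16.0],
}
normalized = {name: min_max_normalize(vals) for name, vals in raw.items()}

# slope is essentially uncorrelated with LAI in this toy data, so it is dropped.
selected = recursive_feature_elimination(normalized, lai, 2, abs_correlation_importance)
print(selected)
print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Because the importance function is passed in as a parameter, the same loop could in principle be driven by importances from a tree ensemble refitted on the remaining variables at each round, which is closer to what RFE-RF and RFE-GTB describe.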