Effectiveness of Integrating Ensemble-Based Feature Selection and Novel Gradient Boosted Trees in Runoff Prediction: A Case Study in Vu Gia Thu Bon River Basin, Vietnam
Authors: Oluwatobi Aiyelokun, Quoc Bao Pham, Oluwafunbi Aiyelokun, Nguyen Thi Thuy Linh, Tirthankar Roy, Duong Tran Anh, Ewa Łupikasza
Journal: Pure and Applied Geophysics, published 2024-04-28
DOI: 10.1007/s00024-024-03486-0
URL: https://link.springer.com/article/10.1007/s00024-024-03486-0
Citation count: 0
Abstract
Traditional rainfall-runoff modeling techniques require large datasets and often an exhaustive calibration process, which is challenging, especially in poorly gauged basins and resource-limited settings. It is therefore necessary to examine new ways of constructing predictive runoff models that achieve satisfactory results while minimizing data requirements and model construction time. This study examined the effectiveness of integrating Random Forest (RF), used to identify the most important input features, with novel gradient boosted trees for two adjacent catchments in Vietnam. Antecedent daily runoff, combined with daily rainfall and one-day antecedent rainfall, was found to significantly influence runoff at the catchment outlets. Categorical Boosting (CatBoost) and Extreme Gradient Boosting (XGBoost) were both effective at predicting day-ahead runoff. CatBoost achieved a Nash-Sutcliffe efficiency (NSE) of 0.92, an index of agreement (d) of 0.98, a correlation coefficient (r) of 0.96, and a coefficient of determination (R²) of 0.92, while XGBoost achieved NSE, d, r, and R² values of 0.91, 0.98, 0.96, and 0.92, respectively; both are therefore well suited to runoff prediction. Comparison with previous studies showed that the models were highly effective, reducing generalization errors more than earlier approaches across the calibration and validation phases. This study presents the integration of RF and gradient boosted trees as a simplified alternative to computationally expensive, data-intensive, physically based rainfall-runoff models. Practitioners can build on the experiments presented here to reduce computational time, model construction complexity, and data requirements, which are often serious constraints in physically based rainfall-runoff modeling.
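The abstract quotes four goodness-of-fit scores without defining them. The sketch below is a minimal, dependency-free illustration using the common textbook definitions: Nash-Sutcliffe efficiency, Willmott's original index of agreement d, Pearson's r, and R². It assumes R² is the squared Pearson correlation, which is consistent with the reported r = 0.96 and R² = 0.92 (0.96² ≈ 0.92), but the paper's exact formulas are not stated here. The function names (`nse`, `willmott_d`, `pearson_r`, `r_squared`, `evaluate`) are hypothetical helpers, not the authors' code.

```python
import math
from typing import Dict, Sequence


def _check(obs: Sequence[float], sim: Sequence[float]) -> None:
    # Observed and simulated series must be non-empty and aligned day by day.
    if len(obs) == 0 or len(obs) != len(sim):
        raise ValueError("obs and sim must be non-empty and of equal length")


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((O-P)^2) / sum((O-mean(O))^2).

    Undefined (division by zero) when the observations are constant.
    """
    _check(obs, sim)
    mo = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def willmott_d(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Willmott's index of agreement (original form):
    1 - sum((P-O)^2) / sum((|P-mean(O)| + |O-mean(O)|)^2)."""
    _check(obs, sim)
    mo = _mean(obs)
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    potential = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / potential


def pearson_r(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and simulated runoff."""
    _check(obs, sim)
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    # ASSUMPTION: R^2 is taken as the squared Pearson correlation,
    # not as 1 - SSE/SST (which would be identical to NSE).
    return pearson_r(obs, sim) ** 2


def evaluate(obs: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper returning the four scores quoted in the abstract."""
    return {
        "NSE": nse(obs, sim),
        "d": willmott_d(obs, sim),
        "r": pearson_r(obs, sim),
        "R2": r_squared(obs, sim),
    }


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    biased = [3.0, 5.0, 7.0]  # 2 * observed + 1: perfectly correlated but biased
    print(evaluate(observed, biased))
```

For the biased series in the example, r and R² both equal 1.0 while NSE is -13.5. Correlation-based scores ignore systematic bias and scaling, whereas NSE and d penalize them, which is why runoff studies usually report several of these scores together.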
About the journal:
pure and applied geophysics (pageoph), a continuation of the journal "Geofisica pura e applicata", publishes original scientific contributions in the fields of solid Earth, atmospheric and oceanic sciences. Regular and special issues feature thought-provoking reports on active areas of current research and state-of-the-art surveys.
Long-running journal, founded in 1939 as Geofisica pura e applicata
Publishes peer-reviewed original scientific contributions and state-of-the-art surveys in solid Earth and atmospheric sciences
Features thought-provoking reports on active areas of current research and is a major source for publications on tsunami research
Coverage extends to research topics in oceanic sciences