Title: Estimating daily reference evapotranspiration with reduced data input using ensemble learning models in arid and humid regions of China
Authors: Qi Wei, Qi Wei, Junzeng Xu, Peng Chen, Shengyu Chen, Zihao Liu, Wenhao Qian, Zhiheng Huang, Jingyi Ren, Haoxuan Wang, Yimin Ding, Chao Lei, Zhiming Qi
Journal: Computers and Electronics in Agriculture, volume 237, Article 110548 (Impact Factor 7.7; JCR Q1, Agriculture, Multidisciplinary)
Published: 2025-05-28
DOI: 10.1016/j.compag.2025.110548
URL: https://www.sciencedirect.com/science/article/pii/S0168169925006544
Citations: 0
Abstract
Accurate estimation of reference evapotranspiration (ETo) is key to irrigation system design and agricultural water management. Using meteorological data (1960–2019) from 20 stations in China’s humid and arid regions, reference ETo values were calculated with the FAO56-Penman-Monteith (PM) method. The accuracy of six ensemble learning models [Adaptive Boosting (AdaBoost), Gradient Boosting Decision Tree (GBDT), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), Extra Trees, and Light Gradient Boosting Machine (LightGBM)] in estimating daily ETo from all available inputs was investigated. The three best-performing models (CatBoost, GBDT and XGBoost) were then evaluated under seven input combinations [i.e., complete and incomplete combinations of maximum and minimum temperature (Tmax and Tmin), relative humidity (RH), wind speed (U₂), and total and extraterrestrial solar radiation (Rs and Ra)] and four dataset sizes (20, 30, 40 and 60 years). CatBoost showed the highest estimation accuracy (average R² = 0.93), stability, and robustness. Incomplete input combinations based on temperature plus other indicators also estimated daily ETo satisfactorily (R² > 0.91), and the key indicators behind the difference in ETo prediction accuracy between humid and arid regions were RH and Ra. The models’ accuracy in estimating daily ETo was essentially unaffected by dataset size (RMSE differed by less than 0.025), but their stability improved as the dataset grew. By evaluating model performance under different data constraints and regional applications, this study provides a methodological reference for ETo simulation across multiple climatic zones worldwide that balances accuracy and practicality.
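The FAO56 Penman-Monteith reference value named in the abstract can be illustrated with a minimal sketch of the standard daily FAO-56 equations. This is not the authors' code: the function and parameter names are hypothetical, and several choices are assumptions the abstract does not confirm (actual vapour pressure derived from mean RH, soil heat flux set to zero for daily steps, measured Rs supplied directly, latitudes without polar day or night).

```python
import math

SOLAR_CONSTANT = 0.0820      # MJ m-2 min-1
STEFAN_BOLTZMANN = 4.903e-9  # MJ K-4 m-2 day-1
ALBEDO = 0.23                # hypothetical grass reference surface


def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure (kPa) at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1)."""
    phi = math.radians(lat_deg)
    angle = 2.0 * math.pi * day_of_year / 365.0
    d_r = 1.0 + 0.033 * math.cos(angle)      # inverse relative Earth-Sun distance
    decl = 0.409 * math.sin(angle - 1.39)    # solar declination (rad)
    omega_s = math.acos(-math.tan(phi) * math.tan(decl))  # sunset hour angle (rad)
    return (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * d_r * (
        omega_s * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(omega_s)
    )


def fao56_pm_eto(t_max, t_min, rh_mean, u2, rs, lat_deg, day_of_year, elevation_m):
    """Daily reference evapotranspiration ETo (mm/day).

    t_max, t_min in deg C; rh_mean in %; u2 in m/s at 2 m height;
    rs in MJ m-2 day-1; elevation_m in metres above sea level.
    """
    t_mean = (t_max + t_min) / 2.0
    e_s = (saturation_vapour_pressure(t_max) + saturation_vapour_pressure(t_min)) / 2.0
    e_a = rh_mean / 100.0 * e_s  # assumption: actual vapour pressure from mean RH
    slope = 4098.0 * saturation_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure  # psychrometric constant (kPa per deg C)

    # Net radiation = net shortwave minus net longwave.
    ra = extraterrestrial_radiation(lat_deg, day_of_year)
    rso = (0.75 + 2e-5 * elevation_m) * ra  # clear-sky radiation
    rns = (1.0 - ALBEDO) * rs
    rel_sw = min(rs / rso, 1.0)
    tk4 = ((t_max + 273.16) ** 4 + (t_min + 273.16) ** 4) / 2.0
    rnl = STEFAN_BOLTZMANN * tk4 * (0.34 - 0.14 * math.sqrt(e_a)) * (1.35 * rel_sw - 0.35)
    rn = rns - rnl
    g = 0.0  # assumption: soil heat flux is negligible at the daily scale

    numerator = 0.408 * slope * (rn - g) + gamma * 900.0 / (t_mean + 273.0) * u2 * (e_s - e_a)
    denominator = slope + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator


# Illustrative inputs only (a mid-latitude summer day), not data from the study.
example = fao56_pm_eto(t_max=21.5, t_min=12.3, rh_mean=73.5, u2=2.078,
                       rs=22.07, lat_deg=50.8, day_of_year=187, elevation_m=100.0)
print(f"ETo = {example:.2f} mm/day")
```

The arguments correspond to the inputs listed in the abstract (Tmax, Tmin, RH, U₂, Rs, with Ra computed from latitude and day of year). That is why Ra is available even in reduced-input settings: it needs no measurement. Values computed this way would serve as the target that the ensemble models are trained to reproduce from a subset of the inputs.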
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.