Mina Rahimi, Masoud Karbasi, Mehdi Jamei, Vahid Rezaverdinejad, Anurag Malik, Aitazaz A. Farooque, Zaher Mundher Yaseen
Computers and Electronics in Agriculture, Volume 237, Article 110599. Published 2025-06-03. DOI: 10.1016/j.compag.2025.110599. Journal impact factor: 8.9 (JCR Q1, Agriculture, Multidisciplinary). Source: https://www.sciencedirect.com/science/article/pii/S0168169925007057
Meticulous estimation of maize actual evapotranspiration: A comprehensive explainable CatBoost algorithm reinforced with Jackknife uncertainty paradigm
Accurate estimation of daily actual evapotranspiration (AET) is essential for managing water resources in irrigated regions. This study used a new machine learning technique, CatBoost, to predict maize AET from meteorological and soil-related data. Four benchmark machine learning techniques (Random Forest, Extra Tree, multi-layer perceptron neural network, and K-nearest neighbor) were used for comparison. Lysimeter measurements of maize AET from Bushland, Texas (US), which include a range of soil and meteorological parameters, were used to evaluate the models. Four input scenarios were tested: comb1, all of the data; comb2, features selected by Lasso regression; comb3, features selected by the Boruta algorithm; and comb4, common meteorological data. Model performance was assessed with several statistical metrics, including the coefficient of determination (R²) and root mean square error (RMSE). Comparing the scenarios showed that Boruta feature selection improves precision and reduces computation time by lowering the dimensionality of the input data. CatBoost was the most accurate model in every scenario, and it performed best under comb3 (R² = 0.9625, RMSE = 0.5594 mm/d). Under the same comb3 scenario, Extra Tree (R² = 0.9514, RMSE = 0.6716 mm/d) and Random Forest (R² = 0.9444, RMSE = 0.7084 mm/d) ranked second and third. SHAP analysis, performed to interpret the black-box model outputs, showed that net radiation and air temperature are the most important input parameters for AET prediction.
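The two metrics quoted in the abstract can be illustrated with a minimal Python sketch. It assumes the common definition R² = 1 − SS_res/SS_tot; the abstract does not say which definition the authors used (a squared Pearson correlation is also common in hydrology), so this is an illustration and not the authors' code. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. mm/d)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up daily AET values (mm/d), for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(f"RMSE = {rmse(observed, predicted):.4f} mm/d")
print(f"R2   = {r_squared(observed, predicted):.4f}")
```

Read this way, an RMSE of 0.5594 mm/d is a typical daily error in the same units as AET, while an R² of 0.9625 means the model leaves under 4% of the variance in observed AET unexplained.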
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.