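Meticulous estimation of maize actual evapotranspiration: A comprehensive explainable CatBoost algorithm reinforced with Jackknife uncertainty paradigm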

IF 8.9, Zone 1 (Agricultural and Forestry Sciences), JCR Q1 (AGRICULTURE, MULTIDISCIPLINARY)
Mina Rahimi, Masoud Karbasi, Mehdi Jamei, Vahid Rezaverdinejad, Anurag Malik, Aitazaz A. Farooque, Zaher Mundher Yaseen
DOI: 10.1016/j.compag.2025.110599
Journal: Computers and Electronics in Agriculture, volume 237, Article 110599
Publication date: 2025-06-03
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S0168169925007057
Citations: 0

Abstract

Accurately estimating daily actual evapotranspiration (AET) is essential for managing water resources in irrigated regions. This study used CatBoost, a recent machine learning technique, to predict maize AET from meteorological and soil-related data. Four benchmark machine learning techniques (Random Forest, Extra Tree, multi-layer perceptron neural network, and K-nearest neighbor) were used for comparison. Lysimeter measurements of maize AET from Bushland, Texas (US), covering a range of soil and meteorological parameters, were used to evaluate the models. Four input scenarios were tested: comb1 (all of the data), comb2 (features selected by Lasso regression), comb3 (features selected by the Boruta algorithm), and comb4 (common meteorological data). Model performance was assessed with several statistical metrics, including the coefficient of determination (R²) and root mean square error (RMSE). Comparing the scenarios showed that Boruta feature selection improves precision and reduces computation time by lowering the dimension of the input data. CatBoost had the best accuracy in all scenarios, and CatBoost with the comb3 scenario predicted AET most accurately (R² = 0.9625, RMSE = 0.5594 mm/d). Under the same comb3 scenario, Extra Tree (R² = 0.9514, RMSE = 0.6716 mm/d) and Random Forest (R² = 0.9444, RMSE = 0.7084 mm/d) ranked second and third. SHAP analysis, performed to interpret the black-box model outputs, showed that net radiation and air temperature are the most important input parameters for AET prediction.
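
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definition R² = 1 − SS_res / SS_tot; the abstract does not state which formula the authors used, and some studies report the squared Pearson correlation instead. The function names and the sample values are hypothetical and are not taken from the Bushland lysimeter data.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (mm/d for AET)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily AET values in mm/d, made up for illustration only.
observed_aet = [2.0, 4.0, 6.0, 8.0]
predicted_aet = [2.5, 3.5, 6.5, 7.5]

print(f"RMSE = {rmse(observed_aet, predicted_aet):.4f} mm/d")
print(f"R2   = {r_squared(observed_aet, predicted_aet):.4f}")
```

RMSE is reported in the same units as AET (mm/d), so a lower value means smaller typical daily errors, while R² is dimensionless and approaches 1 as predictions track the observations more closely.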
Source journal
Computers and Electronics in Agriculture (Engineering and Technology, Computer Science: Interdisciplinary Applications)
CiteScore: 15.30
Self-citation rate: 14.50%
Articles published: 800
Review time: 62 days
About the journal: Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.