A feature engineering technique for enhancing the generalization of machine learning models in estimating crop evapotranspiration

IF 6.5 · Zone 1, Agricultural and Forestry Sciences · Q1 AGRONOMY
Gaku Yokoyama, Sohta Harigai, Shigehiro Kubota, Koichi Nomura, Gregory R. Goldsmith, Daisuke Yasutake, Tomoyoshi Hirota, Masaharu Kitano
Agricultural Water Management, Volume 320, Article 109854. Published 2025-09-30. Journal Article.
DOI: 10.1016/j.agwat.2025.109854
https://www.sciencedirect.com/science/article/pii/S0378377425005682
Citations: 0

Abstract

Accurate and precise estimation of evapotranspiration (ET) is crucial for understanding the terrestrial carbon, water, and energy cycles. While process-based models of ET, such as the Penman–Monteith model, offer robust generalization capabilities, they are limited by the need for detailed parameters (e.g., stomatal conductance) that are challenging to measure continuously. On the other hand, machine learning models can estimate ET by capturing relationships between ET and environmental variables without experimentally measuring model parameters. However, machine learning models face the challenge of limited generalizability. This issue is particularly significant given the uncertainty introduced by changing climatic conditions, which can restrict a model's predictive performance when it is applied to different environmental contexts. Therefore, we propose a hybrid modeling approach that combines feature engineering using process-based models with machine learning to improve generalizability while maintaining practicality. Our model first converts environmental variables into leaf-scale ET using mechanistic process-based models and then uses these features, along with the leaf area index, to estimate the canopy-scale ET using an artificial neural network (ANN). We evaluated the generalization of the hybrid model against a pure ANN model using FLUXNET2015 data. Results show that the hybrid model significantly outperformed the pure ANN model, especially when tested on data beyond the range of the training dataset. Furthermore, the estimation accuracy of the hybrid model was stable even when the values of the model parameters in the process-based models used for feature engineering were varied by ±50%. This indicates that incorporating a mechanistic understanding of plant environmental responses enhances the generalizability and robustness of ET predictions. These findings underscore the potential of hybrid models to combine the strengths of process-based and machine learning approaches.
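
The feature-engineering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which leaf-scale formulation, stomatal conductance model, or parameter values were used. The sketch assumes a Penman–Monteith latent heat flux with the Tetens saturation vapor pressure formula, and a simple hyperbolic VPD response for stomatal conductance. The function names (`leaf_scale_et`, `hybrid_features`) and the parameters `g_a`, `gs_max`, and `d0` are hypothetical and chosen only for illustration.

```python
import math

# Approximate physical constants (assumed values for illustration).
GAMMA = 0.066          # psychrometric constant, kPa K^-1
RHO_CP = 1.2 * 1013.0  # air density * specific heat of air, J m^-3 K^-1


def sat_vapor_pressure(t_air_c):
    """Saturation vapor pressure (kPa) from the Tetens formula, T in deg C."""
    return 0.6108 * math.exp(17.27 * t_air_c / (t_air_c + 237.3))


def leaf_scale_et(rn, t_air_c, rh, g_a=0.02, gs_max=0.01, d0=1.0):
    """Penman-Monteith latent heat flux (W m^-2) used as an engineered feature.

    rn: net radiation (W m^-2); rh: relative humidity as a fraction (0-1).
    g_a, gs_max (m s^-1) and d0 (kPa) are hypothetical parameters.
    """
    es = sat_vapor_pressure(t_air_c)
    vpd = es * (1.0 - rh)                          # vapor pressure deficit, kPa
    delta = 4098.0 * es / (t_air_c + 237.3) ** 2   # slope of es curve, kPa K^-1
    g_s = gs_max / (1.0 + vpd / d0)                # assumed stomatal VPD response
    return (delta * rn + RHO_CP * vpd * g_a) / (delta + GAMMA * (1.0 + g_a / g_s))


def hybrid_features(rn, t_air_c, rh, lai, **params):
    """Feature vector for the second-stage ANN: [leaf-scale ET, leaf area index]."""
    return [leaf_scale_et(rn, t_air_c, rh, **params), lai]


# One example observation: 400 W m^-2, 25 deg C, 50 % humidity, LAI of 3.
base = hybrid_features(400.0, 25.0, 0.5, 3.0)

# Vary one process-model parameter by -50 % and +50 %, mirroring the
# robustness check described in the abstract.
low = hybrid_features(400.0, 25.0, 0.5, 3.0, gs_max=0.005)
high = hybrid_features(400.0, 25.0, 0.5, 3.0, gs_max=0.015)

print(base, low, high)
```

In a full pipeline, feature vectors like these would replace the raw environmental variables as inputs to an ANN regressor trained on canopy-scale ET. Comparing `low`, `base`, and `high` shows how far the engineered feature moves when a process-model parameter is perturbed; the abstract reports that the hybrid model's accuracy stayed stable under such changes.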
Source journal
Agricultural Water Management (Agricultural and Forestry Sciences: Agronomy)
CiteScore: 12.10
Self-citation rate: 14.90%
Articles published: 648
Review time: 4.9 months
Journal description: Agricultural Water Management publishes papers of international significance relating to the science, economics, and policy of agricultural water management. In all cases, manuscripts must address implications and provide insight regarding agricultural water management.