A feature engineering technique for enhancing the generalization of machine learning models in estimating crop evapotranspiration
Authors: Gaku Yokoyama, Sohta Harigai, Shigehiro Kubota, Koichi Nomura, Gregory R. Goldsmith, Daisuke Yasutake, Tomoyoshi Hirota, Masaharu Kitano
Journal: Agricultural Water Management, volume 320, Article 109854 (impact factor 6.5; JCR Q1, Agronomy)
Publication date: 2025-09-30
Publication type: Journal Article
DOI: 10.1016/j.agwat.2025.109854
URL: https://www.sciencedirect.com/science/article/pii/S0378377425005682
Citation count: 0
Abstract
Accurate and precise estimation of evapotranspiration (ET) is crucial for understanding the terrestrial carbon, water, and energy cycles. While process-based models of ET, such as the Penman–Monteith model, offer robust generalization capabilities, they are limited by the need for detailed parameters (e.g., stomatal conductance) that are challenging to measure continuously. Machine learning models, by contrast, can estimate ET by capturing relationships between ET and environmental variables without experimentally measuring model parameters, but they face the challenge of limited generalizability. This issue is particularly significant given the uncertainty introduced by changing climatic conditions, which can restrict a model's predictive performance when it is applied to different environmental contexts. Therefore, we propose a hybrid modeling approach that combines feature engineering using process-based models with machine learning to improve generalizability while maintaining practicality. Our model first converts environmental variables into leaf-scale ET using mechanistic process-based models and then uses these features, along with the leaf area index, to estimate canopy-scale ET using an artificial neural network (ANN). We evaluated the generalization of the hybrid model against a pure ANN model using FLUXNET2015 data. Results show that the hybrid model significantly outperformed the pure ANN model, especially when tested on data beyond the range of the training dataset. Furthermore, the estimation accuracy of the hybrid model was stable even when the values of the model parameters in the process-based models used for feature engineering were varied by ±50 %. This indicates that incorporating a mechanistic understanding of plant environmental responses enhances the generalizability and robustness of ET predictions. These findings underscore the potential of hybrid models to combine the strengths of process-based and machine learning approaches.
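The feature engineering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the leaf-scale formulation, so the code assumes the standard Penman–Monteith form with the Tetens saturation vapor pressure formula. The function names (`leaf_scale_le`, `build_features`, `perturbed_features`), the default conductances `gs` and `ga`, and the constants are all hypothetical choices made for illustration.

```python
import math

# Textbook constants, assumed here for illustration only.
RHO_AIR = 1.2     # air density, kg m^-3
CP_AIR = 1013.0   # specific heat of air, J kg^-1 K^-1
GAMMA = 0.066     # psychrometric constant, kPa K^-1


def sat_vapor_pressure(t_air_c):
    """Saturation vapor pressure (kPa) from the Tetens formula."""
    return 0.6108 * math.exp(17.27 * t_air_c / (t_air_c + 237.3))


def leaf_scale_le(rn, t_air_c, vpd, gs=0.01, ga=0.05):
    """Penman-Monteith latent heat flux (W m^-2) for a leaf.

    rn: net radiation (W m^-2); t_air_c: air temperature (deg C);
    vpd: vapor pressure deficit (kPa); gs, ga: stomatal and aerodynamic
    conductance (m s^-1). The defaults are hypothetical placeholders.
    """
    es = sat_vapor_pressure(t_air_c)
    # Slope of the saturation vapor pressure curve, kPa K^-1.
    delta = 4098.0 * es / (t_air_c + 237.3) ** 2
    numerator = delta * rn + RHO_AIR * CP_AIR * vpd * ga
    denominator = delta + GAMMA * (1.0 + ga / gs)
    return numerator / denominator


def build_features(rn, t_air_c, vpd, lai, gs=0.01, ga=0.05):
    """Feature vector for a downstream ANN: [leaf-scale LE, leaf area index]."""
    return [leaf_scale_le(rn, t_air_c, vpd, gs, ga), lai]


def perturbed_features(rn, t_air_c, vpd, lai, gs=0.01, ga=0.05, factor=0.5):
    """Feature vectors with gs scaled by (1 - factor), 1, and (1 + factor)."""
    return [build_features(rn, t_air_c, vpd, lai, gs * scale, ga)
            for scale in (1.0 - factor, 1.0, 1.0 + factor)]
```

In a hybrid setup of this kind, the vectors returned by `build_features` would replace the raw environmental variables as ANN inputs, and `perturbed_features` gives one way to probe how sensitive those inputs are to a ±50 % change in an assumed parameter.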
About the journal:
Agricultural Water Management publishes papers of international significance relating to the science, economics, and policy of agricultural water management. In all cases, manuscripts must address implications and provide insight regarding agricultural water management.