Qingchen Xu , Lu Li , Zhongwang Wei , Xingjie Lu , Nan Wei , Xuhui Lee , Yongjiu Dai
{"title":"A multimodal machine learning fused global 0.1° daily evapotranspiration dataset from 1950-2022","authors":"Qingchen Xu , Lu Li , Zhongwang Wei , Xingjie Lu , Nan Wei , Xuhui Lee , Yongjiu Dai","doi":"10.1016/j.agrformet.2025.110645","DOIUrl":null,"url":null,"abstract":"<div><div>Evapotranspiration (ET) is the second largest hydrological flux over the land surface and connects water, energy, and carbon cycles. However, large uncertainties exist among current ET products due to their coarse spatial resolutions, short temporal coverages, and reliance on assumptions. This study introduces a multimodal machine learning framework to generate a high-resolution (0.1°, daily), long-term (1950–2022) global ET dataset by fusing 13 state-of-the-art ET products encompassing remote sensing, machine learning, land surface models, and reanalysis data relying on extensive flux tower observations (462 sites). The framework reconstructs the individual ET products to consistent spatiotemporal resolutions and time ranges using Light Gradient Boosting Machine (LightGBM) models, and the Automated Machine Learning (AutoML) technique was used to fuse ET using 13 reconstructed ET products, ERA5-land atmospheric forcings and ancillary data as predictors. In-situ observations are utilized for model training and validation. Results demonstrate significant improvements over existing datasets, with our product achieving the highest accuracy (KGE = 0.857, RMSE = 0.726 mm/day) against in situ measurements across ecosystems and regions. The fused ET dataset realistically captures spatiotemporal variability and corrects the systematic underestimation bias prevalent in other datasets, particularly in wet regions. This novel high spatial-temporal ET dataset enables more robust assessments for water, energy, and carbon cycle applications on regional hydrology and ecology. The introduced data integration methodology also provides a valuable framework for fusing multiple geoscience datasets with disparate properties.</div></div>","PeriodicalId":50839,"journal":{"name":"Agricultural and Forest Meteorology","volume":"372 ","pages":"Article 110645"},"PeriodicalIF":5.7000,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Agricultural and Forest Meteorology","FirstCategoryId":"97","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0168192325002655","RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AGRONOMY","Score":null,"Total":0}
引用次数: 0
Abstract
Evapotranspiration (ET) is the second largest hydrological flux over the land surface and connects water, energy, and carbon cycles. However, large uncertainties exist among current ET products due to their coarse spatial resolutions, short temporal coverages, and reliance on assumptions. This study introduces a multimodal machine learning framework to generate a high-resolution (0.1°, daily), long-term (1950–2022) global ET dataset by fusing 13 state-of-the-art ET products encompassing remote sensing, machine learning, land surface models, and reanalysis data relying on extensive flux tower observations (462 sites). The framework reconstructs the individual ET products to consistent spatiotemporal resolutions and time ranges using Light Gradient Boosting Machine (LightGBM) models, and the Automated Machine Learning (AutoML) technique was used to fuse ET using 13 reconstructed ET products, ERA5-land atmospheric forcings and ancillary data as predictors. In-situ observations are utilized for model training and validation. Results demonstrate significant improvements over existing datasets, with our product achieving the highest accuracy (KGE = 0.857, RMSE = 0.726 mm/day) against in situ measurements across ecosystems and regions. The fused ET dataset realistically captures spatiotemporal variability and corrects the systematic underestimation bias prevalent in other datasets, particularly in wet regions. This novel high spatial-temporal ET dataset enables more robust assessments for water, energy, and carbon cycle applications on regional hydrology and ecology. The introduced data integration methodology also provides a valuable framework for fusing multiple geoscience datasets with disparate properties.
期刊介绍:
Agricultural and Forest Meteorology is an international journal for the publication of original articles and reviews on the inter-relationship between meteorology, agriculture, forestry, and natural ecosystems. Emphasis is on basic and applied scientific research relevant to practical problems in the field of plant and soil sciences, ecology and biogeochemistry as affected by weather as well as climate variability and change. Theoretical models should be tested against experimental data. Articles must appeal to an international audience. Special issues devoted to single topics are also published.
Typical topics include canopy micrometeorology (e.g. canopy radiation transfer, turbulence near the ground, evapotranspiration, energy balance, fluxes of trace gases), micrometeorological instrumentation (e.g., sensors for trace gases, flux measurement instruments, radiation measurement techniques), aerobiology (e.g. the dispersion of pollen, spores, insects and pesticides), biometeorology (e.g. the effect of weather and climate on plant distribution, crop yield, water-use efficiency, and plant phenology), forest-fire/weather interactions, and feedbacks from vegetation to weather and the climate system.