Variability and uncertainty in net ecosystem carbon exchange modeling: Systematic estimates at global flux sites via ensemble machine learning

IF 5.7 · CAS Zone 1 (Agricultural and Forestry Sciences) · JCR Q1 (Agronomy)

Agricultural and Forest Meteorology · Pub Date: 2025-08-21 · DOI: 10.1016/j.agrformet.2025.110784

Nannan Wang, Zijian Yue, Yaolin Liu, Zhaomin Tong, Yanfang Liu, Yanchi Lu, Yongge Shi

Volume 374, Article 110784 · https://www.sciencedirect.com/science/article/pii/S0168192325004034

Citations: 0

Abstract

Predicting net ecosystem carbon exchange (NEE) is crucial for understanding carbon dynamics. Machine learning (ML) has become pivotal for site-level modeling and spatial upscaling of NEE, yet spatiotemporal variability and uncertainty challenge its reliability and universality. A systematic quantification of the sources of variability and uncertainty in NEE modeling is still lacking, owing to the scale-dependent nature of carbon flux variations. This study therefore established a systematic framework to evaluate how model construction choices and environmental predictors affect ML-based NEE modeling across timescales, using multifaceted evaluation criteria. Using observations from FLUXNET 2015, AmeriFlux, and ICOS, alongside multi-source data, the study built a separate model for each combination of four timescales (daily, weekly, monthly, and yearly), four tree-based ensemble algorithms, and three data-splitting rules. The multifaceted assessment covered overall, across-site, seasonal, and anomaly perspectives. Key findings include: (1) Model construction. Boosting (LightGBM, XGBoost, and CatBoost) excelled at capturing temporal variability and anomalies, whereas bagging (Random Forest) was effective for spatial variability. Completely random data splitting increased the risk of overfitting and should be avoided. (2) Predictors. Environmental controls on accuracy varied with timescales, data situations, and ambient conditions. Predictors for NEE modeling should be selected based on their causal importance (e.g., evapotranspiration, vapor pressure deficit, and air temperature) and statistical relationships (e.g., leaf area index, elevation, and precipitation) with NEE, tailored to specific ambient conditions. Excessive predictors may degrade NEE prediction accuracy, particularly at large scales or in regions with high environment like arid areas. (3) Evaluation criteria. Rigorous multi-metric accuracy assessments proved essential, as reliance on a single metric or on overall accuracy alone could yield contradictory results. For instance, daily models achieved a higher anomaly Nash-Sutcliffe efficiency (NSE; 0.33 vs. 0.25) but a lower overall NSE (0.54 vs. 0.59) than monthly models. NEE predictions had more difficulty accounting for spatial than temporal variability, resulting in lower accuracy for inter-annual than intra-annual predictions. This study advances ML-driven carbon flux modeling with actionable insights.
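The contrast between overall and anomaly NSE, and the warning against completely random splitting, can be illustrated with a small sketch. This is not the authors' code. It assumes that "anomaly" means the deviation of each value from its own site mean (the abstract does not define the term), and it uses a site-wise hold-out as one possible alternative to random splitting (the abstract does not name the three rules). The function names and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors / sum of squares around the observed mean)."""
    obs_mean = mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def remove_site_mean(values, sites):
    """Assumed anomaly definition: subtract each site's own mean from its values."""
    by_site = defaultdict(list)
    for value, site in zip(values, sites):
        by_site[site].append(value)
    site_mean = {site: mean(vals) for site, vals in by_site.items()}
    return [value - site_mean[site] for value, site in zip(values, sites)]


def split_by_site(sites, test_sites):
    """Site-wise hold-out: return (train_idx, test_idx) with no site in both sets."""
    train_idx = [i for i, s in enumerate(sites) if s not in test_sites]
    test_idx = [i for i, s in enumerate(sites) if s in test_sites]
    return train_idx, test_idx


# Toy data (hypothetical): two sites with very different mean NEE.
sites = ["A", "A", "A", "B", "B", "B"]
observed = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
# A "model" that predicts each site's mean but none of its temporal variation.
predicted = [2.0, 2.0, 2.0, 12.0, 12.0, 12.0]

overall_nse = nse(observed, predicted)
anomaly_nse = nse(remove_site_mean(observed, sites),
                  remove_site_mean(predicted, sites))
train_idx, test_idx = split_by_site(sites, {"B"})

print(f"overall NSE = {overall_nse:.3f}")
print(f"anomaly NSE = {anomaly_nse:.3f}")
print("train indices:", train_idx, "test indices:", test_idx)
```

In this toy case the overall NSE is close to 1 because the between-site difference dominates the variance, while the anomaly NSE is 0 because the within-site variation is missed entirely. This is the kind of disagreement between metrics that the abstract describes. The site-wise split keeps all records from a held-out site out of training, which a completely random split does not guarantee.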


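Source journal

Agricultural and Forest Meteorology (category: Agricultural and Forestry Sciences, Forestry)

CiteScore: 10.30

Self-citation rate: 9.70%

Publication output: 415

Review time: 69 days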

About the journal: Agricultural and Forest Meteorology is an international journal for the publication of original articles and reviews on the inter-relationship between meteorology, agriculture, forestry, and natural ecosystems. Emphasis is on basic and applied scientific research relevant to practical problems in the field of plant and soil sciences, ecology and biogeochemistry as affected by weather as well as climate variability and change. Theoretical models should be tested against experimental data. Articles must appeal to an international audience. Special issues devoted to single topics are also published. Typical topics include canopy micrometeorology (e.g., canopy radiation transfer, turbulence near the ground, evapotranspiration, energy balance, fluxes of trace gases), micrometeorological instrumentation (e.g., sensors for trace gases, flux measurement instruments, radiation measurement techniques), aerobiology (e.g., the dispersion of pollen, spores, insects and pesticides), biometeorology (e.g., the effect of weather and climate on plant distribution, crop yield, water-use efficiency, and plant phenology), forest-fire/weather interactions, and feedbacks from vegetation to weather and the climate system.