Robust filling of extra-long gaps in eddy covariance CO₂ flux measurements from a temperate deciduous forest using eXtreme Gradient Boosting
Yujie Liu, Benjamin Lucas, Darby D. Bergl, Andrew D. Richardson
Agricultural and Forest Meteorology, volume 364, Article 110438. Published 2025-02-16. DOI: 10.1016/j.agrformet.2025.110438
https://www.sciencedirect.com/science/article/pii/S0168192325000589
Abstract
Eddy covariance measurements are often subject to missing values, or gaps, in the data record. Methods for filling short gaps are well established, but robustly filling gaps longer than a few weeks remains a challenge. Marginal Distribution Sampling (MDS) is a standard gap-filling method, but its effectiveness for long gaps (> 30 days) is limited. We compared the performance of a machine learning algorithm, eXtreme Gradient Boosting (XGB), against MDS using artificial gap scenarios of varying length and location. We gap-filled half-hourly CO₂ flux from a temperate deciduous forest, Bartlett Experimental Forest, from 2010 to 2022. Whereas the standard implementation of MDS uses a narrowly prescribed set of predictor variables, XGB allowed us to include additional variables. The Green Chromatic Coordinate (GCC), derived from PhenoCam imagery, and diffuse photosynthetic photon flux density emerged as two of the three most important predictor variables. In a randomized 10-fold cross-validation test, the root mean square error (RMSE) of XGB was 9.5% lower than that of MDS, and the R² was 2.7% higher. XGB outperformed MDS during both daytime and nighttime and across seasons. However, annual integrals of net ecosystem exchange (NEE) differed between methods: XGB gave weaker annual net carbon uptake, by -110 ± 74 g C m⁻² yr⁻¹, compared to MDS (214 ± 11 g C m⁻² yr⁻¹). In artificial gap experiments, when trained on the 13-year data record, XGB reliably filled gaps, showing little change in RMSE for gaps of up to 240 days. In contrast, the performance of MDS declined steadily as gap length increased, and MDS was unable to fill gaps longer than 2 months. In summary, XGB performs very well as an alternative to MDS, providing reliable predictions of temperate deciduous forest carbon fluxes across different gap lengths and locations. Implementation of XGB is facilitated by easy-to-use packages.
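The evaluation design described in the abstract (randomized 10-fold cross-validation, plus artificial gaps scored by RMSE and R²) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the synthetic half-hourly flux series, all function names, and the `fit_slot_means` placeholder model are hypothetical. The placeholder is neither XGB nor MDS; it simply averages flux by half-hour slot, and in practice a gradient-boosting regressor trained on meteorological and phenological predictors would take its place.

```python
import math
import random


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def artificial_gap(n, start, length):
    """Indices of one contiguous artificial gap, clipped to the record."""
    return list(range(start, min(start + length, n)))


def fit_slot_means(train_idx, slots, flux):
    """Placeholder gap-filler (assumption, NOT XGB or MDS):
    predict the mean training flux of each half-hour slot."""
    sums, counts = {}, {}
    for i in train_idx:
        sums[slots[i]] = sums.get(slots[i], 0.0) + flux[i]
        counts[slots[i]] = counts.get(slots[i], 0) + 1
    overall = sum(sums.values()) / sum(counts.values())
    means = {s: sums[s] / counts[s] for s in sums}
    return lambda s: means.get(s, overall)


# Synthetic 60-day half-hourly record (assumption): daytime uptake,
# constant respiration, and Gaussian noise.
rng = random.Random(42)
n = 48 * 60
slots = [i % 48 for i in range(n)]
flux = []
for s in slots:
    hour = s / 2.0
    uptake = -8.0 * max(0.0, math.sin(math.pi * (hour - 6.0) / 12.0))
    flux.append(uptake + 2.0 + rng.gauss(0.0, 1.0))

# Randomized 10-fold cross-validation: every point is predicted once
# by a model that never saw it.
folds = kfold_indices(n, k=10, seed=1)
cv_pred = [0.0] * n
for fold in folds:
    held_out = set(fold)
    train = [i for i in range(n) if i not in held_out]
    predict = fit_slot_means(train, slots, flux)
    for i in fold:
        cv_pred[i] = predict(slots[i])
cv_rmse = rmse(flux, cv_pred)
cv_r2 = r_squared(flux, cv_pred)

# Artificial gap experiment: withhold 10 contiguous days, train on the
# rest, and score predictions inside the gap only.
gap = artificial_gap(n, start=48 * 20, length=48 * 10)
gap_set = set(gap)
train = [i for i in range(n) if i not in gap_set]
predict = fit_slot_means(train, slots, flux)
gap_rmse = rmse([flux[i] for i in gap], [predict(slots[i]) for i in gap])

print(f"10-fold CV: RMSE={cv_rmse:.2f}, R2={cv_r2:.2f}")
print(f"10-day artificial gap: RMSE={gap_rmse:.2f}")
```

Repeating the gap experiment over a range of gap lengths and start dates, and comparing the resulting RMSE values between two gap-filling methods, gives the kind of comparison the abstract reports.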
About the journal:
Agricultural and Forest Meteorology is an international journal for the publication of original articles and reviews on the inter-relationship between meteorology, agriculture, forestry, and natural ecosystems. Emphasis is on basic and applied scientific research relevant to practical problems in the field of plant and soil sciences, ecology and biogeochemistry as affected by weather as well as climate variability and change. Theoretical models should be tested against experimental data. Articles must appeal to an international audience. Special issues devoted to single topics are also published.
Typical topics include canopy micrometeorology (e.g., canopy radiation transfer, turbulence near the ground, evapotranspiration, energy balance, fluxes of trace gases), micrometeorological instrumentation (e.g., sensors for trace gases, flux measurement instruments, radiation measurement techniques), aerobiology (e.g., the dispersion of pollen, spores, insects and pesticides), biometeorology (e.g., the effect of weather and climate on plant distribution, crop yield, water-use efficiency, and plant phenology), forest-fire/weather interactions, and feedbacks from vegetation to weather and the climate system.