Inferring Parameters in a Complex Land Surface Model by Combining Data Assimilation and Machine Learning
L. T. Keetz, K. Aalstad, R. A. Fisher, C. Poppe Terán, B. Naz, N. Pirk, Y. A. Yilmaz, O. Skarpaas
Journal of Advances in Modeling Earth Systems, volume 17, issue 6. Published 2025-06-21. DOI: 10.1029/2024MS004542 (https://onlinelibrary.wiley.com/doi/10.1029/2024MS004542)
Abstract
Complex Land Surface Models (LSMs) rely on a plethora of parameters. These parameters and the associated process formulations are often poorly constrained, which hampers reliable predictions of ecosystem dynamics and climate feedbacks. Robust and uncertainty-aware parameter estimation with observations is complicated by, for example, the high dimensionality of the model parameter space and the computational cost of LSM simulations. Herein, we adapt a novel Bayesian data assimilation (DA) and machine learning framework termed “calibrate, emulate, sample” (CES) to infer parameters in a widely used LSM coupled with a demographic vegetation model (CLM-FATES). First, an iterative ensemble Kalman smoother provides an initial estimate of the posterior distribution (“calibrate”). Subsequently, a machine-learning-based emulator is trained on the resulting model-observation mismatches to predict outcomes for unseen parameter combinations (“emulate”). Finally, this emulator replaces CLM-FATES simulations in an adaptive Markov Chain Monte Carlo approach, enabling computationally feasible posterior sampling with enhanced uncertainty quantification (“sample”). We test our implementation with synthetic and real observations representing a boreal forest site in southern Finland. We estimate a total of six plant-functional-type-specific photosynthetic parameters by assimilating evapotranspiration (ET) and gross primary production (GPP) flux data. CES provided the best estimates of the synthetic truth parameters when compared to data-blind emulator sampling designs, while all approaches reduced model-observation errors compared to a default parameter simulation (GPP: −10% to −30%, ET: −4% to −6%). Although errors were also consistently reduced with real data, comparing the emulator designs was less conclusive, which we mainly attribute to equifinality, structural uncertainty within CLM-FATES, and/or unknown errors in the data that are not accounted for.
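The three CES stages described in the abstract can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a cheap three-output function `forward` stands in for CLM-FATES, the "calibrate" step uses an ensemble smoother with multiple data assimilation (ES-MDA) as one possible iterative ensemble Kalman smoother, the "emulate" step uses quadratic least-squares regression in place of a machine-learning emulator (and emulates model outputs, from which the mismatch is then computed), and the "sample" step uses plain random-walk Metropolis rather than an adaptive scheme. All names, priors, and tuning values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(theta):
    """Toy stand-in for an expensive simulator (hypothetical, NOT CLM-FATES)."""
    t0, t1 = theta[..., 0], theta[..., 1]
    return np.stack([t0 + t1, t0 * t1, t0**2], axis=-1)

# Synthetic-truth experiment: noisy observations of known parameters.
theta_true = np.array([1.0, 2.0])
obs_sd = 0.1
y = forward(theta_true) + obs_sd * rng.standard_normal(3)
prior_mean, prior_sd = np.array([1.5, 1.5]), 0.5  # assumed Gaussian prior

# 1) Calibrate: ES-MDA with a constant inflation factor alpha = n_iter.
n_ens, n_iter = 100, 4
alpha = float(n_iter)
ens = prior_mean + prior_sd * rng.standard_normal((n_ens, 2))
train_x, train_g = [], []
for _ in range(n_iter):
    g = forward(ens)                      # the only "expensive" model runs
    train_x.append(ens.copy())
    train_g.append(g)
    d_theta = ens - ens.mean(axis=0)
    d_g = g - g.mean(axis=0)
    c_tg = d_theta.T @ d_g / (n_ens - 1)  # parameter-output covariance
    c_gg = d_g.T @ d_g / (n_ens - 1)      # output covariance
    gain = c_tg @ np.linalg.inv(c_gg + alpha * obs_sd**2 * np.eye(3))
    y_pert = y + np.sqrt(alpha) * obs_sd * rng.standard_normal((n_ens, 3))
    ens = ens + (y_pert - g) @ gain.T     # Kalman-type ensemble update

# 2) Emulate: fit a quadratic surrogate to all (parameter, output) pairs.
def features(theta):
    t0, t1 = theta[..., 0], theta[..., 1]
    return np.stack([np.ones_like(t0), t0, t1, t0**2, t0 * t1, t1**2], axis=-1)

coef, *_ = np.linalg.lstsq(
    features(np.concatenate(train_x)), np.concatenate(train_g), rcond=None
)

def emulator(theta):
    return features(theta) @ coef

# 3) Sample: random-walk Metropolis, using the emulator in the likelihood.
def log_post(theta):
    misfit = (y - emulator(theta)) / obs_sd
    prior = (theta - prior_mean) / prior_sd
    return -0.5 * (misfit @ misfit + prior @ prior)

n_steps, step = 20000, 0.05
theta = ens.mean(axis=0)                  # start from the calibrated ensemble
lp = log_post(theta)
chain = np.empty((n_steps, 2))
for i in range(n_steps):
    prop = theta + step * rng.standard_normal(2)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain[i] = theta

samples = chain[n_steps // 2:]            # discard first half as burn-in
post_mean, post_sd = samples.mean(axis=0), samples.std(axis=0)
print("posterior mean:", post_mean, "posterior sd:", post_sd)
```

In this sketch the simulator is called only `n_ens * n_iter` times during calibration, while the 20,000 sampling steps query the cheap surrogate. The training points also cluster where the ensemble converges, which is one way to read the abstract's contrast with data-blind emulator sampling designs.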
About the journal:
The Journal of Advances in Modeling Earth Systems (JAMES) is committed to advancing the science of Earth systems modeling by offering high-quality scientific research through online availability and open access licensing. JAMES invites authors and readers from the international Earth systems modeling community.
Open access. Articles are available free of charge for everyone with Internet access to view and download.
Formal peer review.
Supplemental material, such as code samples, images, and visualizations, is published at no additional charge.
No additional charge for color figures.
Modest page charges to cover production costs.
Articles published in high-quality full text PDF, HTML, and XML.
Internal and external reference linking, DOI registration, and forward linking via CrossRef.