Inferring Parameters in a Complex Land Surface Model by Combining Data Assimilation and Machine Learning

IF 4.4; Zone 2, Earth Sciences; Q1, Meteorology & Atmospheric Sciences
L. T. Keetz, K. Aalstad, R. A. Fisher, C. Poppe Terán, B. Naz, N. Pirk, Y. A. Yilmaz, O. Skarpaas
Journal of Advances in Modeling Earth Systems, vol. 17, no. 6. Published 2025-06-21.
DOI: 10.1029/2024MS004542
https://onlinelibrary.wiley.com/doi/10.1029/2024MS004542
Citations: 0

Abstract

Inferring Parameters in a Complex Land Surface Model by Combining Data Assimilation and Machine Learning

Complex Land Surface Models (LSMs) rely on a plethora of parameters. These parameters and the associated process formulations are often poorly constrained, which hampers reliable predictions of ecosystem dynamics and climate feedbacks. Robust and uncertainty-aware parameter estimation with observations is complicated by, for example, the high dimensionality of the model parameter space and the computational cost of LSM simulations. Herein, we adapt a novel Bayesian data assimilation (DA) and machine learning framework termed “calibrate, emulate, sample” (CES) to infer parameters in a widely-used LSM coupled with a demographic vegetation model (CLM-FATES). First, an iterative ensemble Kalman smoother provides an initial estimate of the posterior distribution (“calibrate”). Subsequently, a machine-learning-based emulator is trained on the resulting model-observation mismatches to predict outcomes for unseen parameter combinations (“emulate”). Finally, this emulator replaces CLM-FATES simulations in an adaptive Markov Chain Monte Carlo approach enabling computationally feasible posterior sampling with enhanced uncertainty quantification (“sample”). We test our implementation with synthetic and real observations representing a boreal forest site in southern Finland. We estimate a total of six plant-functional-type-specific photosynthetic parameters by assimilating evapotranspiration (ET) and gross primary production (GPP) flux data. CES provided the best estimates of the synthetic truth parameters when compared to data-blind emulator sampling designs while all approaches reduced model-observation errors compared to a default parameter simulation (GPP: −10% to −30%, ET: −4% to −6%). Although errors were also consistently reduced with real data, comparing the emulator designs was less conclusive, which we mainly attribute to equifinality, structural uncertainty within CLM-FATES, and/or unknown errors in the data that are not accounted for.
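
The three CES stages described in the abstract can be illustrated with a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and not the authors' implementation: `forward` is a hypothetical two-parameter stand-in for CLM-FATES, the "calibrate" step uses an ES-MDA-style ensemble smoother update (one common iterative ensemble Kalman scheme, which may differ from the one used in the paper), the "emulate" step swaps the machine-learning emulator for a quadratic least-squares fit trained on model outputs rather than on mismatches, and the "sample" step uses plain random-walk Metropolis instead of an adaptive MCMC.

```python
import numpy as np

def forward(theta):
    """Hypothetical cheap stand-in for an expensive simulator (not CLM-FATES)."""
    a, b = theta[..., 0], theta[..., 1]
    return np.stack([a, b, a * b], axis=-1)

def calibrate(y, R, prior_mean, prior_std, rng, n_ens=100, n_iter=4):
    """ES-MDA-style iterative ensemble smoother; also returns all model runs."""
    theta = prior_mean + prior_std * rng.standard_normal((n_ens, len(prior_mean)))
    thetas, outputs = [], []
    alpha = n_iter  # constant inflation, so the sum of 1/alpha over iterations is 1
    for _ in range(n_iter):
        g = forward(theta)
        thetas.append(theta.copy())
        outputs.append(g)
        d_theta = theta - theta.mean(axis=0)
        d_g = g - g.mean(axis=0)
        c_tg = d_theta.T @ d_g / (n_ens - 1)   # parameter-output covariance
        c_gg = d_g.T @ d_g / (n_ens - 1)       # output covariance
        gain = c_tg @ np.linalg.inv(c_gg + alpha * R)
        noise = rng.multivariate_normal(np.zeros(len(y)), R, size=n_ens)
        theta = theta + (y + np.sqrt(alpha) * noise - g) @ gain.T
    return theta, np.vstack(thetas), np.vstack(outputs)

def features(theta):
    """Quadratic polynomial features of the two parameters."""
    a, b = theta[..., 0], theta[..., 1]
    return np.stack([np.ones_like(a), a, b, a * a, a * b, b * b], axis=-1)

def fit_emulator(thetas, outputs):
    """Least-squares surrogate mapping parameters to model outputs."""
    coef, *_ = np.linalg.lstsq(features(thetas), outputs, rcond=None)
    return lambda theta: features(theta) @ coef

def sample(emulator, y, R, prior_mean, prior_std, start, rng, n_steps=5000, step=0.05):
    """Random-walk Metropolis on the emulated posterior (Gaussian prior and noise)."""
    r_inv = np.linalg.inv(R)

    def log_post(theta):
        resid = y - emulator(theta)
        z = (theta - prior_mean) / prior_std
        return -0.5 * resid @ r_inv @ resid - 0.5 * z @ z

    chain = np.empty((n_steps, len(start)))
    cur, cur_lp = start.copy(), log_post(start)
    for i in range(n_steps):
        prop = cur + step * rng.standard_normal(len(cur))
        prop_lp = log_post(prop)
        if np.log(rng.uniform()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
        chain[i] = cur
    return chain

# Synthetic-truth experiment: generate noisy observations from known parameters.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, 2.0])
R = 0.05**2 * np.eye(3)
y = forward(theta_true) + rng.multivariate_normal(np.zeros(3), R)
prior_mean, prior_std = np.array([0.5, 1.5]), np.array([1.0, 1.0])

ensemble, train_theta, train_g = calibrate(y, R, prior_mean, prior_std, rng)
emulator = fit_emulator(train_theta, train_g)
chain = sample(emulator, y, R, prior_mean, prior_std, ensemble.mean(axis=0), rng)
posterior_mean = chain[1000:].mean(axis=0)  # discard the first 1000 steps as burn-in
print("posterior mean:", posterior_mean, "truth:", theta_true)
```

The point of the design is that the expensive model is only run during the calibrate stage (here 400 evaluations); the sampling stage evaluates only the surrogate, which is what makes thousands of MCMC steps affordable for a costly simulator.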

Source journal
Journal of Advances in Modeling Earth Systems (Meteorology & Atmospheric Sciences)
CiteScore: 11.40
Self-citation rate: 11.80%
Articles published: 241
Review time: >12 weeks
About the journal: The Journal of Advances in Modeling Earth Systems (JAMES) is committed to advancing the science of Earth systems modeling by offering high-quality scientific research through online availability and open access licensing. JAMES invites authors and readers from the international Earth systems modeling community. Open access. Articles are available free of charge for everyone with Internet access to view and download. Formal peer review. Supplemental material, such as code samples, images, and visualizations, is published at no additional charge. No additional charge for color figures. Modest page charges to cover production costs. Articles published in high-quality full text PDF, HTML, and XML. Internal and external reference linking, DOI registration, and forward linking via CrossRef.