Navigating the Noise: Bringing Clarity to ML Parameterization Design With $\mathcal{O}(100)$ Ensembles

Impact Factor 4.4 · CAS Zone 2 (Earth Sciences) · JCR Q1 (Meteorology & Atmospheric Sciences)
Jerry Lin, Sungduk Yu, Liran Peng, Tom Beucler, Eliot Wong-Toi, Zeyuan Hu, Pierre Gentine, Margarita Geleta, Mike Pritchard
Journal of Advances in Modeling Earth Systems, volume 17, issue 4. Published 2025-04-12. DOI: 10.1029/2024MS004551
https://onlinelibrary.wiley.com/doi/10.1029/2024MS004551
Citations: 0

Abstract

Machine-learning (ML) parameterizations of subgrid processes (here of turbulence, convection, and radiation) may one day replace conventional parameterizations by emulating high-resolution physics without the cost of explicit simulation. However, uncertainty about the relationship between offline and online performance (i.e., when integrated with a large-scale general circulation model) hinders their development. Much of this uncertainty stems from limited sampling of the noisy, emergent effects of upstream ML design decisions on downstream online hybrid simulation. Our work rectifies the sampling issue via the construction of a semi-automated, end-to-end pipeline for $\mathcal{O}(100)$-size ensembles of hybrid simulations, revealing important nuances in how systematic reductions in offline error manifest in changes to online error and online stability. For example, removing dropout and switching from a Mean Squared Error to a Mean Absolute Error loss both reduce offline error, but they have opposite effects on online error and online stability. Other design decisions, like incorporating memory, converting moisture input from specific humidity to relative humidity, using batch normalization, and training on multiple climates, do not come with any such compromises. Finally, we show that ensemble sizes of $\mathcal{O}(100)$ may be necessary to reliably detect causally relevant differences online. By enabling rapid online experimentation at scale, we can empirically settle debates regarding subgrid ML parameterization design that would have otherwise remained unresolved in the noise.
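
The claim that ensembles of $\mathcal{O}(100)$ members may be needed to detect differences online can be illustrated with a toy calculation. The sketch below is not the authors' pipeline and uses no data from the paper: it draws purely synthetic "online error" values for two hypothetical configurations from Gaussians whose means differ by an assumed gap, and counts how often a rough Welch's t criterion ($|t| > 2$) flags the difference. The function names, the gap of half a standard deviation, and the cutoff are all assumptions made for illustration.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def detection_rate(n_members, true_gap, noise_sd, n_trials=500, seed=0):
    """Fraction of synthetic trials in which |t| > 2 flags the two configurations as different.

    All values are synthetic: each ensemble member's error is an independent
    Gaussian draw, which is an assumption and not a property of real hybrid runs.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls give the same result
    hits = 0
    for _ in range(n_trials):
        config_a = [rng.gauss(0.0, noise_sd) for _ in range(n_members)]
        config_b = [rng.gauss(true_gap, noise_sd) for _ in range(n_members)]
        # |t| > 2 is a rough stand-in for a 5% two-sided test; it is lenient for tiny ensembles.
        if abs(welch_t(config_a, config_b)) > 2.0:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # Compare how often the same underlying gap is detected at different ensemble sizes.
    for n in (5, 20, 100):
        print(n, detection_rate(n, true_gap=0.5, noise_sd=1.0))
```

Under these assumptions, a gap that is small relative to member-to-member noise is usually missed by a handful of runs and usually caught by about a hundred, which is the qualitative point the abstract makes. Real online errors need not be Gaussian or independent, so the numbers this toy produces should not be read as estimates for actual hybrid simulations.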


Source journal
Journal of Advances in Modeling Earth Systems (Meteorology & Atmospheric Sciences)
CiteScore: 11.40
Self-citation rate: 11.80%
Articles published: 241
Review time: >12 weeks
Journal description: The Journal of Advances in Modeling Earth Systems (JAMES) is committed to advancing the science of Earth systems modeling by offering high-quality scientific research through online availability and open access licensing. JAMES invites authors and readers from the international Earth systems modeling community. Open access. Articles are available free of charge for everyone with Internet access to view and download. Formal peer review. Supplemental material, such as code samples, images, and visualizations, is published at no additional charge. No additional charge for color figures. Modest page charges to cover production costs. Articles published in high-quality full text PDF, HTML, and XML. Internal and external reference linking, DOI registration, and forward linking via CrossRef.