Data mining-based machine learning methods for improving hydrological data: a case study of salinity field in the Western Arctic Ocean

IF 11.2 | Zone 1, Earth Sciences | Q1 GEOSCIENCES, MULTIDISCIPLINARY
Shuhao Tao, Ling Du, Jiahao Li
DOI: 10.5194/essd-2024-138 (https://doi.org/10.5194/essd-2024-138)
Earth System Science Data, published 2024-05-03

Abstract

The Beaufort Gyre, in the western Arctic Ocean, is the largest freshwater reservoir in the Arctic Ocean. Long-term changes in freshwater reservoirs are critical for understanding the Arctic Ocean, so data from various sources, particularly measured or reanalyzed data, must be used to the greatest extent possible. Over the past two decades, intensive field observations and ship surveys in the western Arctic Ocean have yielded a large amount of CTD data. Multiple machine learning methods were evaluated and merged to reconstruct an annual salinity product for the western Arctic Ocean over the period 2003–2022. The data mining-based machine learning methods make use of variables determined by physical processes, such as sea level pressure, sea ice concentration, and drift. Our objective is to control the mean root mean square error (RMSE) of sea surface salinity. Because sea surface salinity is more susceptible to atmospheric, sea ice, and oceanic changes, and therefore more variable, we ensured that the average RMSE of the CTD and EN4 sea surface salinity fields during machine learning training stayed within 0.25 psu. The machine learning process shows that the uncertainty in predicting sea surface salinity is 0.24 % when constrained by CTD data and falls to 0.02 % when constrained by EN4 data. During data merging and post-calibration, the weight coefficients are constrained by limiting the uncertainty value. Compared with the EN4 and ORAS5 salinity products commonly used in the Arctic Ocean, our salinity product provides more accurate descriptions of the freshwater content in the Beaufort Gyre and of depth variations at the base of its halocline. This approach of evaluating and merging the results of multiple machine learning methods has potential applications beyond the salinity field, including hydrometeorology, sea ice thickness, polar biogeochemistry, and other related fields. The datasets are available at https://zenodo.org/records/10990138 (Tao and Du, 2024).
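The abstract describes evaluating several machine learning models by RMSE and merging them with weights constrained by their uncertainty, but it does not give the weighting scheme. The Python sketch below is therefore only an illustration under stated assumptions: it swaps in inverse-squared-RMSE weighting, reuses the 0.25 psu figure as a hypothetical cutoff for excluding a model, and runs on synthetic values. The names `rmse` and `merge_predictions` are hypothetical and do not come from the paper or its dataset.

```python
import numpy as np


def rmse(pred, obs):
    """Root mean square error between predictions and reference observations."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def merge_predictions(preds, obs, max_rmse=0.25):
    """Merge model predictions using inverse-squared-RMSE weights.

    Assumption (not from the paper): models whose RMSE against the
    reference exceeds max_rmse receive zero weight; the remaining
    weights are normalised to sum to one.
    """
    preds = np.asarray(preds, dtype=float)  # shape (n_models, n_samples)
    errors = np.array([rmse(p, obs) for p in preds])
    keep = errors <= max_rmse
    if not keep.any():
        raise ValueError("no model satisfies the RMSE threshold")
    # Guard against division by zero for a model that matches exactly.
    raw = 1.0 / np.maximum(errors, 1e-12) ** 2
    weights = np.where(keep, raw, 0.0)
    weights /= weights.sum()
    merged = weights @ preds  # weighted average across models
    return merged, weights, errors


# Synthetic sea surface salinity values (psu), for illustration only.
obs = np.array([28.0, 29.5, 30.2, 31.0])
preds = np.array([
    obs + 0.1,  # model A: small positive bias
    obs - 0.2,  # model B: moderate negative bias
    obs + 0.5,  # model C: exceeds the 0.25 psu cutoff
])

merged, weights, errors = merge_predictions(preds, obs)
print("RMSE per model:", np.round(errors, 3))
print("weights:", np.round(weights, 3))
print("merged RMSE:", round(rmse(merged, obs), 3))
```

In this toy case the third model is dropped and the two remaining models are combined so that the merged field has a lower RMSE than either one alone. Real salinity fields would also need spatial and depth dimensions, which this sketch omits.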
Source journal
Earth System Science Data
Categories: GEOSCIENCES, MULTIDISCIPLINARY; METEOROLOGY & ATMOSPHERIC SCIENCES
CiteScore: 18.00
Self-citation rate: 5.30%
Articles published: 231
Review time: 35 weeks
Journal description: Earth System Science Data (ESSD) is an international, interdisciplinary journal that publishes articles on original research data in order to promote the reuse of high-quality data in the field of Earth system sciences. The journal welcomes submissions of original data or data collections that meet the required quality standards and have the potential to contribute to the goals of the journal. It includes sections dedicated to regular-length articles, brief communications (such as updates to existing data sets), commentaries, review articles, and special issues. ESSD is abstracted and indexed in several databases, including Science Citation Index Expanded, Current Contents/PCE, Scopus, ADS, CLOCKSS, CNKI, DOAJ, EBSCO, Gale/Cengage, GoOA (CAS), and Google Scholar, among others.