Uncertainty estimation of machine learning spatial precipitation predictions from satellite data

Georgia Papacharalampous, Hristos Tyralis, N. Doulamis, Anastasios Doulamis
Machine Learning: Science and Technology. Published 2024-07-16. DOI: 10.1088/2632-2153/ad63f3 (https://doi.org/10.1088/2632-2153/ad63f3)

Abstract

Merging satellite and gauge data with machine learning produces high-resolution precipitation datasets, but uncertainty estimates are often missing. We addressed the question of how best to provide such estimates by benchmarking six algorithms, most of which are new even to the more general task of quantifying predictive uncertainty in spatial prediction settings. Using 15 years of monthly data over the contiguous United States (CONUS), we compared quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). Their ability to issue predictive precipitation quantiles at nine quantile levels (0.025, 0.050, 0.100, 0.250, 0.500, 0.750, 0.900, 0.950, 0.975), which together approximate the full probability distribution, was evaluated using quantile scoring functions and the quantile scoring rule. The predictors at a site were nearby values from two satellite precipitation retrievals, namely PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM), together with the site’s elevation. The dependent variable was the monthly mean gauge precipitation. Relative to QR, LightGBM improved the quantile scoring rule by 11.10%, ahead of QRF (7.96%), GRF (7.44%), GBM (4.64%) and QRNN (1.73%). Notably, LightGBM outperformed all random forest variants, the current standard in spatial prediction with machine learning. To conclude, we propose a suite of machine learning algorithms for estimating uncertainty in spatial data prediction, supported by a formal evaluation framework based on scoring functions and scoring rules.
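
The evaluation described in the abstract can be illustrated with a minimal Python sketch. It assumes that the quantile scoring function is the standard quantile (pinball) loss, that the quantile scoring rule is the equal-weight average of that loss over the nine quantile levels, and that improvement is reported as a percentage reduction in score relative to a reference method such as QR. The abstract does not state the weighting or the exact formula, so these are assumptions, and the function names and toy numbers below are hypothetical rather than taken from the paper.

```python
from statistics import fmean

# The nine quantile levels listed in the abstract.
QUANTILE_LEVELS = (0.025, 0.050, 0.100, 0.250, 0.500, 0.750, 0.900, 0.950, 0.975)


def quantile_score(y_true, y_pred, level):
    """Mean quantile (pinball) loss of predictions at one quantile level."""
    losses = []
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by `level`, over-prediction by `1 - level`.
        losses.append(max(level * diff, (level - 1.0) * diff))
    return fmean(losses)


def mean_quantile_score(y_true, preds_by_level):
    """Average the quantile score over all levels (equal weights are an assumption)."""
    return fmean(quantile_score(y_true, preds, level)
                 for level, preds in preds_by_level.items())


def relative_improvement(score_model, score_reference):
    """Percentage improvement over a reference method; lower scores are better."""
    return 100.0 * (1.0 - score_model / score_reference)


# Toy illustration with made-up numbers (not data from the paper).
observed = [10.0, 20.0, 30.0]
# A hypothetical model whose predicted quantile is the observation scaled by 2 * level.
toy_preds = {level: [2.0 * level * y for y in observed] for level in QUANTILE_LEVELS}
toy_score = mean_quantile_score(observed, toy_preds)
```

Under these assumptions, a lower mean quantile score indicates better predictive quantiles, and a figure such as the 11.10% reported for LightGBM would correspond to `relative_improvement(score_lightgbm, score_qr)`.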