Uncertainty estimation of machine learning spatial precipitation predictions from satellite data

Machine Learning: Science and Technology. Pub Date: 2024-07-16. DOI: 10.1088/2632-2153/ad63f3

Georgia Papacharalampous, Hristos Tyralis, N. Doulamis, Anastasios Doulamis


Citations: 0

Abstract

Merging satellite and gauge data with machine learning produces high-resolution precipitation datasets, but uncertainty estimates are often missing. We addressed the question of how best to provide such estimates by benchmarking six algorithms, most of them novel even for the more general task of quantifying predictive uncertainty in spatial prediction settings. On 15 years of monthly data from the contiguous United States (CONUS), we compared quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). Their ability to issue predictive precipitation quantiles at nine quantile levels (0.025, 0.050, 0.100, 0.250, 0.500, 0.750, 0.900, 0.950, 0.975), which together approximate the full probability distribution, was evaluated using quantile scoring functions and the quantile scoring rule. The predictors at a site were nearby values from two satellite precipitation retrievals, namely PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals), together with the site’s elevation. The dependent variable was the monthly mean gauge precipitation. With respect to QR, LightGBM improved performance in terms of the quantile scoring rule by 11.10%, also surpassing QRF (7.96%), GRF (7.44%), GBM (4.64%) and QRNN (1.73%). Notably, LightGBM outperformed all random forest variants, the current standard in spatial prediction with machine learning. To conclude, we propose a suite of machine learning algorithms for estimating uncertainty in spatial data prediction, supported by a formal evaluation framework based on scoring functions and scoring rules.
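
The evaluation described in the abstract can be illustrated with a minimal sketch. It assumes the standard quantile (pinball) scoring function, and it assumes that the quantile scoring rule is the simple average of that score over the nine quantile levels; the abstract does not spell out either formula. All function names are hypothetical and are not taken from the paper's code.

```python
from statistics import mean

# The nine quantile levels listed in the abstract.
QUANTILE_LEVELS = (0.025, 0.050, 0.100, 0.250, 0.500, 0.750, 0.900, 0.950, 0.975)


def quantile_score(observed, predicted_quantile, level):
    """Quantile (pinball) score for one observation; lower is better."""
    diff = observed - predicted_quantile
    # Under-prediction is weighted by `level`, over-prediction by `1 - level`.
    return level * diff if diff >= 0 else (level - 1.0) * diff


def mean_quantile_score(observations, predictions, level):
    """Average quantile score of one model at one quantile level."""
    return mean(
        quantile_score(y, q, level) for y, q in zip(observations, predictions)
    )


def quantile_scoring_rule(observations, predictions_by_level):
    """Assumed form of the scoring rule: mean of the per-level scores.

    `predictions_by_level` maps each quantile level to a list of predicted
    quantiles, one per observation.
    """
    return mean(
        mean_quantile_score(observations, preds, level)
        for level, preds in predictions_by_level.items()
    )


def relative_improvement(model_score, reference_score):
    """Percentage improvement of a model over a reference (e.g. QR)."""
    return 100.0 * (1.0 - model_score / reference_score)
```

Under these assumptions, a model whose averaged score is 0.889 against a reference score of 1.0 would show a relative improvement of about 11.1%, which is how a figure such as the one reported for LightGBM with respect to QR could be computed.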


