Fehmi Özbayrak, John T. Foster, Michael J. Pyrcz
Title: Spatial bagging for predictive machine learning uncertainty quantification
Journal: Computers & Geosciences, volume 203, Article 105947
Journal impact factor: 4.4 (JCR Q1, Computer Science, Interdisciplinary Applications)
Published: 2025-05-14
DOI: 10.1016/j.cageo.2025.105947
URL: https://www.sciencedirect.com/science/article/pii/S0098300425000974
Citations: 0
Abstract
Uncertainty quantification is a critical component in the interpretation of spatial phenomena, particularly within the geosciences, where incomplete subsurface data admit many possible scenarios and uncertainty estimates are essential for risk assessment and decision-making. Traditional geostatistical methods have served as the cornerstone for uncertainty analysis; however, the incorporation of machine learning, particularly ensemble methods, offers a compelling augmentation, especially in handling complex and noisy datasets. Building on our previous work, which introduced a spatial bagging technique for enhancing prediction accuracy, this study extends the method to uncertainty quantification by applying a widely used uncertainty quantification (UQ) metric from geostatistics.
Our approach employs a bootstrap method adjusted for effective sample size derived from spatial statistics, addressing the common issue of overfitting when dealing with dependent data. We demonstrate, through a series of synthetic datasets with varied noise levels and spatial structures, that our spatial bagging method not only outperforms standard bagging techniques in prediction accuracy but also provides superior uncertainty quantification. The robustness of the method against noise and its computational efficiency, particularly in spatially correlated data, positions it as a promising tool for geoscientists and others who require reliable uncertainty measures in spatial analysis.
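The idea of a bootstrap sized by an effective sample size can be illustrated with a minimal sketch. This is not the authors' implementation: the effective-sample-size formula (n^2 divided by the sum of pairwise correlations, under an exponential correlation model), the k-nearest-neighbor base learner, and all function and parameter names are assumptions chosen for illustration only.

```python
import numpy as np


def effective_sample_size(coords, corr_range):
    """Effective number of independent samples, n^2 / sum(rho_ij).

    ASSUMPTION: exponential correlation rho(h) = exp(-3 h / corr_range);
    the paper's abstract does not state which formula is used.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    rho = np.exp(-3.0 * d / corr_range)
    n = len(coords)
    return n * n / rho.sum()


def spatial_bagging_predict(coords, values, targets, corr_range,
                            n_models=200, k=3, seed=0):
    """Bagged k-NN predictions where each bootstrap draws n_eff samples.

    Standard bagging would draw n samples per resample; here the draw size
    is reduced to the (assumed) effective sample size.
    Returns the ensemble mean, ensemble standard deviation, and n_eff.
    """
    rng = np.random.default_rng(seed)
    n = len(coords)
    # Keep the draw size between k (needed by k-NN) and n.
    n_eff = int(np.clip(round(effective_sample_size(coords, corr_range)), k, n))
    preds = np.empty((n_models, len(targets)))
    for m in range(n_models):
        idx = rng.choice(n, size=n_eff, replace=True)  # bootstrap resample
        d = np.linalg.norm(targets[:, None, :] - coords[idx][None, :, :], axis=-1)
        nearest = np.argsort(d, axis=1)[:, :k]         # k closest resampled points
        preds[m] = values[idx][nearest].mean(axis=1)   # hypothetical base learner
    return preds.mean(axis=0), preds.std(axis=0), n_eff


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    coords = rng.uniform(0, 100, size=(80, 2))
    values = np.sin(coords[:, 0] / 20.0) + 0.2 * rng.normal(size=80)
    targets = rng.uniform(0, 100, size=(5, 2))
    mean, std, n_eff = spatial_bagging_predict(coords, values, targets, corr_range=30.0)
    print("effective sample size:", n_eff)
    print("ensemble mean:", mean)
    print("ensemble std:", std)
```

The spread of the ensemble predictions serves here as a simple uncertainty measure. Drawing fewer samples per resample when the data are strongly correlated widens that spread, which is the intuition behind adjusting the bootstrap for spatial dependence.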
About the journal:
Computers & Geosciences publishes high-impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.