A data-driven supervised machine learning approach to estimating global ambient air pollution concentrations with associated prediction intervals.

IF 2.9 · Zone 3 (multidisciplinary journals) · JCR Q1, MULTIDISCIPLINARY SCIENCES
Royal Society Open Science Pub Date : 2025-07-23 eCollection Date: 2025-07-01 DOI:10.1098/rsos.241288
Liam Jordan Berrisford, Hugo Barbosa, Ronaldo Menezes
Royal Society Open Science, vol. 12, no. 7, article 241288 (journal article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289206/pdf/
Citations: 0

Abstract

Global ambient air pollution, a transboundary challenge, is typically addressed through interventions relying on data from spatially sparse and heterogeneously placed monitoring stations. These stations often encounter temporal data gaps due to issues such as power outages. In response, we have developed a scalable, data-driven, supervised machine learning framework. The models produced by the framework are designed to impute missing temporal and spatial measurements, thereby generating a comprehensive dataset for air pollutants including NO2, O3, PM10, PM2.5 and SO2. In this work, we produce models providing concentration estimations at 261 377 locations across the globe. The dataset, with a fine granularity of 0.25° spatial resolution at hourly time intervals and accompanied by prediction intervals for each estimate, caters to a wide range of stakeholders relying on outdoor air pollution data for downstream assessments. This enables more detailed studies. Additionally, the model's performance across various geographical locations is examined, providing insights and recommendations for strategic placement of future monitoring stations to further enhance the model's accuracy.
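The abstract does not say how the prediction intervals are computed. As a rough illustration of what it means to attach an interval to each concentration estimate, the sketch below uses a split-conformal-style recipe: take the absolute errors of a fitted model on held-out calibration data and use a high quantile of them as a symmetric half-width. The function names, the 90% coverage level, and the clipping at zero are all assumptions made for this example, not the authors' method.

```python
import math


def conformal_halfwidth(residuals, coverage=0.9):
    """Half-width of a symmetric interval from held-out calibration residuals.

    Split-conformal-style recipe (an assumption for illustration, not the
    paper's procedure): sort the absolute residuals and take the value at
    rank ceil((n + 1) * coverage), capped at n.
    """
    if not residuals:
        raise ValueError("need at least one calibration residual")
    ranked = sorted(abs(r) for r in residuals)
    n = len(ranked)
    k = min(n, math.ceil((n + 1) * coverage))
    return ranked[k - 1]


def prediction_interval(point_estimate, halfwidth):
    """Interval around a point estimate, clipped at zero because a
    concentration cannot be negative."""
    lower = max(0.0, point_estimate - halfwidth)
    upper = point_estimate + halfwidth
    return lower, upper


# Hypothetical calibration errors (observed minus predicted), in µg/m³.
calibration_residuals = [1, -2, 3, -4, 5, -6, 7, -8, 9, -10]
halfwidth = conformal_halfwidth(calibration_residuals, coverage=0.8)
interval = prediction_interval(5.0, halfwidth)
```

A single global half-width like this ignores the fact that uncertainty is likely to be larger far from monitoring stations; a location-aware method would let the width vary across the grid.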

Source journal
Royal Society Open Science (subject area: Multidisciplinary)
CiteScore: 6.00
Self-citation rate: 0.00%
Articles published: 508
Review time: 14 weeks
About the journal: Royal Society Open Science is a new open journal publishing high-quality original research across the entire range of science and mathematics on the basis of objective peer review. It allows the Society to publish all the high-quality work it receives without the usual restrictions on scope, length or impact.