Nationwide Machine Learning-Ensemble PM2.5 Mapping Prediction and Forecasting Models in South Korea with High Spatiotemporal Resolution and Health Risk Estimation-Based Evaluations
Seoyeong Ahn, Ayoung Kim, Yeonseung Chung, Cinoo Kang, Sooyoung Kim, Dohoon Kwon, Jiwoo Park, Jieun Oh, Jinah Park, Jeongmin Moon, Insung Song, Jieun Min, Hyung Joo Lee, Ho Kim and Whanhee Lee*
Environment & Health, vol. 3, no. 8, pp. 878–887. Published 2025-04-23. DOI: 10.1021/envhealth.4c00201 (https://pubs.acs.org/doi/10.1021/envhealth.4c00201)
Abstract
Several studies have developed machine learning-based PM2.5 prediction models; however, nationwide models addressing both mapping prediction and forecasting have been limited. Further, although prediction accuracy is distinct from PM2.5-related health risk estimation, previous studies examined only prediction accuracy. This study suggests a method to assess the statistical properties of PM2.5-health risk estimation, which can also be used for model selection. We used three machine learning algorithms and an ensemble method to construct PM2.5 mapping prediction (1 km²) and two-day forecasting models, mainly using satellite-driven data, in South Korea (2015–2022). We performed a simulation study to examine the statistical properties of short-term PM2.5 risk estimation using prediction models. Our ensemble spatial prediction model performed better than the single algorithms (test R² = 0.956). The R² values ranged from 0.78 to 0.98 across the monitoring sites. The average percent bias ranged from 1.403% to 1.787% when our mapping models were used for PM2.5-mortality risk estimation, compared with the estimates from monitored PM2.5. The best R² of our forecasting models was 0.904. This study developed machine learning models for spatial PM2.5 prediction and forecasting in Korea. It also suggests a method to address risk estimation and model selection concurrently when multiple prediction models are used.
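The abstract reports two kinds of metrics, R² for prediction accuracy and percent bias for risk estimation, without giving their formulas. The sketch below shows one common way such quantities are computed; it is an assumption for illustration, not the authors' implementation. The function names `r_squared` and `percent_bias` and the sample numbers are hypothetical.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumed definition; the abstract does not state which R^2 variant is used.
    """
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def percent_bias(estimate, reference):
    """Relative difference of an estimate from a reference value, in percent.

    Assumed definition: here the reference would be the risk estimate
    obtained from monitored PM2.5, and the estimate would be the one
    obtained from model-predicted PM2.5.
    """
    return 100.0 * (estimate - reference) / reference


# Illustrative values only; these are not data from the study.
monitored = [12.0, 25.0, 31.0, 18.0]
predicted = [13.0, 24.0, 29.5, 19.0]
print(r_squared(monitored, predicted))
print(percent_bias(estimate=1.015, reference=1.0))
```

Keeping the two metrics separate reflects the point made in the abstract: a model with high R² against monitor readings does not necessarily yield a low-bias health risk estimate, so both can be checked when selecting among prediction models.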
About the journal:
Environment & Health, a peer-reviewed open access journal, is committed to exploring the relationship between the environment and human health. As a premier journal for multidisciplinary research, Environment & Health reports the health consequences for individuals and communities of changing and hazardous environmental factors. In supporting the UN Sustainable Development Goals, the journal aims to help formulate policies to create a healthier world.
Topics of interest include but are not limited to:
Air, water, and soil pollution
Exposomics
Environmental epidemiology
Innovative analytical methodology and instrumentation (multi-omics, non-target analysis, effect-directed analysis, high-throughput screening, etc.)
Environmental toxicology (endocrine disrupting effect, neurotoxicity, alternative toxicology, computational toxicology, epigenetic toxicology, etc.)
Environmental microbiology, pathogen and environmental transmission mechanisms of diseases
Environmental modeling, bioinformatics, and artificial intelligence
Emerging contaminants (including plastics, engineered nanomaterials, etc.)
Climate change and related health effect
Health impacts of energy evolution and carbon neutralization
Food and drinking water safety
Occupational exposure and medicine
Innovations in environmental technologies for better health
Policies and international relations concerned with environmental health