Shuai Wang, Mengyuan Zhang, Hui Zhao, Peng Wang, Sri Harsha Kota, Qingyan Fu, Hongliang Zhang
Title: Extracting regional and temporal features to improve machine learning for hourly air pollutants in urban India
Journal: Atmospheric Environment, Volume 338, Article 120834 (published 2024-09-18)
DOI: 10.1016/j.atmosenv.2024.120834
URL: https://www.sciencedirect.com/science/article/pii/S1352231024005090
Journal impact factor: 4.2; JCR quartile: Q2 (Environmental Sciences)
Citations: 0
Abstract
India suffers from severe particulate matter (PM, including PM2.5 and PM10) pollution, yet its limited ground observations are insufficient to support a comprehensive understanding of the associated health risks. Machine learning (ML) has the potential to efficiently improve the estimation of PM distribution and exposure. Regional transport and the accumulation and dispersion of PM and its components significantly affect PM concentrations, so they are crucial when building ML models, especially for sparsely observed regions such as India. Here, geographic and temporal-rolling weighting methods were used to separately extract regional and temporal features for improving ML model performance. Incorporating these temporal and regional features significantly improved model performance: root mean square error (RMSE) fell by 21% and 19% for PM2.5 and PM10 estimation, respectively, and the model's underestimation in heavy-pollution scenarios was reduced. The spatial-temporal model shows out-of-sample test CV coefficients of determination (R²) of 0.87 and 0.88 for hourly PM2.5 and PM10, respectively. The ML model predicts an annual nationwide PM2.5 concentration of 68.3 μg/m³ with a north (high, especially in the Indo-Gangetic Plain) to south (low) distribution, which is consistent with high satellite aerosol optical depth (AOD) values. Boundary layer height is identified as the main meteorological factor influencing PM2.5 concentrations in winter. Characterizing the regional transport and the accumulation and dispersion of pollutants through feature extraction can aid ML training, and the method can be further improved and applied to other studies.
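The abstract names geographic and temporal-rolling weighting but does not give the weighting formulas. As a rough illustration only, the sketch below assumes inverse-distance weights for the regional feature and exponentially decaying weights over a trailing window for the temporal feature. The function names, the `power`, `window`, and `decay` parameters, and both weighting forms are hypothetical choices made for this example, not the authors' method.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def regional_feature(target, neighbours, power=2.0):
    """Inverse-distance-weighted mean of neighbouring values.

    ASSUMPTION: inverse-distance weighting stands in for the paper's
    unspecified geographic weighting.
    target: (lat, lon); neighbours: list of (lat, lon, value).
    """
    num = den = 0.0
    for lat, lon, value in neighbours:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0:
            return value  # co-located station: use its value directly
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def temporal_feature(series, window=24, decay=0.9):
    """Rolling weighted mean over the previous `window` hours.

    ASSUMPTION: exponential decay stands in for the paper's unspecified
    temporal-rolling weighting. The value at hour t uses only hours
    before t; entries with no history are None.
    """
    out = []
    for t in range(len(series)):
        past = series[max(0, t - window):t]
        if not past:
            out.append(None)
            continue
        # The most recent hour gets weight 1; older hours decay geometrically.
        weights = [decay ** (len(past) - 1 - i) for i in range(len(past))]
        out.append(sum(w * x for w, x in zip(weights, past)) / sum(weights))
    return out
```

Features built this way would be appended to each sample's predictors before training. The temporal feature uses only past hours so that the target value itself never leaks into its own predictor.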
About the journal:
Atmospheric Environment has an open access mirror journal Atmospheric Environment: X, sharing the same aims and scope, editorial team, submission system and rigorous peer review.
Atmospheric Environment is the international journal for scientists in different disciplines related to atmospheric composition and its impacts. The journal publishes scientific articles with atmospheric relevance of emissions and depositions of gaseous and particulate compounds, chemical processes and physical effects in the atmosphere, as well as impacts of the changing atmospheric composition on human health, air quality, climate change, and ecosystems.