Extracting regional and temporal features to improve machine learning for hourly air pollutants in urban India

Impact factor 4.2; Zone 2 (Environmental Science and Ecology); JCR Q2, ENVIRONMENTAL SCIENCES
Shuai Wang , Mengyuan Zhang , Hui Zhao , Peng Wang , Sri Harsha Kota , Qingyan Fu , Hongliang Zhang
Atmospheric Environment, Volume 338, Article 120834 (journal article, published 2024-09-18)
DOI: 10.1016/j.atmosenv.2024.120834
https://www.sciencedirect.com/science/article/pii/S1352231024005090
Citations: 0

Abstract

India suffers from severe particulate matter (PM, including PM2.5 and PM10) pollution, yet its limited ground observations are insufficient to support a comprehensive understanding of the associated health risks. Machine learning (ML) has the potential to efficiently improve estimates of PM distribution and exposure. Regional transport, together with the accumulation and dispersion of PM and its components, strongly affects PM concentrations, so these processes are crucial to represent when building ML models, especially for sparsely observed regions such as India. Here, geographic and temporal-rolling weighting methods were used to extract regional and temporal features, respectively, to improve the performance of the ML model. Incorporating these temporal and regional features significantly improved the ML model: the root mean square error (RMSE) fell by 21% for PM2.5 and 19% for PM10, and the model's underestimation in heavy-pollution scenarios was reduced. The spatiotemporal model achieves out-of-sample cross-validation (CV) coefficients of determination (R²) of 0.87 for hourly PM2.5 and 0.88 for hourly PM10. The ML model predicts an annual nationwide PM2.5 concentration of 68.3 μg/m³, with a north-to-south gradient (high in the north, especially in the Indo-Gangetic Plain, and low in the south) that is consistent with high satellite aerosol optical depth (AOD) values. Boundary layer height is identified as the main meteorological factor influencing PM2.5 concentrations in winter. Characterizing the regional transport and cumulative dispersion of pollutants through feature extraction can aid ML training, and this method can be further refined and applied in other studies.
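The abstract does not state which weighting kernels were used. As a minimal sketch, assuming inverse-distance weighting for the geographic (regional) features and an exponentially decaying look-back window for the temporal-rolling features, the two feature types could be computed from hourly station data as below. The function names, the 300 km radius, the distance exponent, the 24-hour window and the decay factor are all hypothetical choices for illustration, not the authors' settings.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # Clamp guards against a slightly exceeding 1 through rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def regional_feature(
    target: Tuple[float, float],
    neighbours: Sequence[Tuple[float, float, float]],
    power: float = 2.0,     # hypothetical distance-decay exponent
    max_km: float = 300.0,  # hypothetical search radius in km
) -> float:
    """Hypothetical regional feature: inverse-distance-weighted mean of PM
    observed at other stations in the same hour.

    `neighbours` holds (lat, lon, value) tuples. A station at distance 0
    (the target itself) and stations beyond `max_km` are ignored.
    Returns NaN if no station qualifies.
    """
    num = den = 0.0
    for lat, lon, value in neighbours:
        d = haversine_km(target[0], target[1], lat, lon)
        if d == 0.0 or d > max_km:
            continue
        w = 1.0 / d ** power  # closer stations get larger weights
        num += w * value
        den += w
    return num / den if den > 0.0 else float("nan")


def temporal_feature(
    series: Sequence[float],
    window: int = 24,    # hypothetical look-back length in hours
    decay: float = 0.9,  # hypothetical per-hour weight decay
) -> List[float]:
    """Hypothetical temporal-rolling feature: decaying weighted mean of the
    previous `window` hours for each hour t.

    Only lags 1..window are used, so the value at hour t never feeds its own
    feature. The weight for lag k is decay ** (k - 1). Hours with no history
    get NaN.
    """
    if window < 1 or not 0.0 < decay <= 1.0:
        raise ValueError("window must be >= 1 and decay must be in (0, 1]")
    out: List[float] = []
    for t in range(len(series)):
        num = den = 0.0
        for lag in range(1, min(window, t) + 1):
            w = decay ** (lag - 1)
            num += w * series[t - lag]
            den += w
        out.append(num / den if den > 0.0 else float("nan"))
    return out
```

In a training table, each station-hour row could carry `regional_feature` (from other stations' observations in that hour) and `temporal_feature` (from the station's own hourly history) alongside meteorological predictors such as boundary layer height. For an out-of-sample test, the neighbouring values would need to come only from training stations so that held-out observations do not leak into the features.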

Source journal: Atmospheric Environment (subject area: Environmental Sciences)
CiteScore: 9.40
Self-citation rate: 8.00%
Articles published: 458
Review time: 53 days
About the journal: Atmospheric Environment has an open-access mirror journal, Atmospheric Environment: X, which shares the same aims and scope, editorial team, submission system and rigorous peer review. Atmospheric Environment is the international journal for scientists in different disciplines related to atmospheric composition and its impacts. The journal publishes scientific articles with atmospheric relevance of emissions and depositions of gaseous and particulate compounds, chemical processes and physical effects in the atmosphere, as well as impacts of the changing atmospheric composition on human health, air quality, climate change, and ecosystems.