Phuong D. M. Nguyen, An H. Phan, Truong X. Ngo, Bang Q. Ho, Tran Vu Pham, Thanh T. N. Nguyen
Mapping of high-resolution daily particulate matter (PM2.5) concentration at the city level through a machine learning-based downscaling approach
Environmental Monitoring and Assessment, vol. 197 (1). Published 2024-12-23. DOI: 10.1007/s10661-024-13562-6
https://link.springer.com/article/10.1007/s10661-024-13562-6
Citations: 0
Abstract
PM2.5 pollution is a major global concern, especially in Vietnam, because of its harmful effects on health and the environment. Monitoring local PM2.5 levels is crucial for assessing air quality. However, Vietnam’s state-of-the-art (SOTA) dataset, with a 3 km resolution, needs to be refined before it can accurately depict spatial variation within smaller regions. In this research, we investigated machine learning-based downscaling methods to improve the spatial resolution and quality of Vietnam’s existing 3 km PM2.5 products using two families of approaches: traditional machine learning models (random forest, XGBoost, CatBoost, support vector regression (SVR), mixed effect model (MEM)) and deep learning models (long short-term memory (LSTM), convolutional neural network (CNN), convolutional LSTM (ConvLSTM)). Overall, the CatBoost 2-day lag model performed best. In terms of modeling, integrating temporal factors into tree-based models can enhance predictive accuracy. Furthermore, on small datasets, traditional machine learning models outperform complex deep learning approaches. Machine and deep learning models should also be validated on the PM2.5 maps they generate, because a model can score very well in model evaluation yet produce maps that are unrealistic in application. In this study, compared to the SOTA PM2.5 maps in Vietnam and the SOTA global maps, the proposed CatBoost 2-day lag model’s maps showed a 57% increase in the correlation coefficient (Pearson R), as well as 42–73%, 28–75%, and 39–75% reductions in root mean squared error (RMSE), mean relative error (MRE), and mean absolute error (MAE), respectively. Additionally, the daily, monthly, and annual-average maps generated by the CatBoost 2-day lag model effectively capture the spatial distribution and seasonal variations of PM2.5 in Ho Chi Minh City. These findings indicate a substantial enhancement in the accuracy and reliability of downscaled PM2.5 maps.
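As a rough illustration of two ideas the abstract mentions, the sketch below builds 2-day lag features for a tabular (tree-based) model and computes the four reported metrics. It is not the authors' pipeline. The abstract does not say which variables are lagged, so lagging the coarse PM2.5 value is an assumption, as are the record keys (`cell`, `day`, `pm25`), the function names, and the MRE definition used here (mean of |prediction - observation| / observation). Model fitting is omitted; the lagged rows would simply be passed to a regressor such as CatBoost as extra columns.

```python
import math
from collections import defaultdict


def add_lag_features(records, n_lags=2, key="pm25"):
    """Attach the previous n_lags daily values of `key` for the same grid cell.

    `records` is a list of dicts with hypothetical keys 'cell', 'day'
    (an integer day index) and `key`. Rows lacking a full lag history
    are dropped rather than imputed.
    """
    history = defaultdict(dict)
    for r in records:
        history[r["cell"]][r["day"]] = r[key]

    rows = []
    for r in records:
        cell_history = history[r["cell"]]
        lags = [cell_history.get(r["day"] - k) for k in range(1, n_lags + 1)]
        if any(v is None for v in lags):
            continue  # not enough history for this cell/day
        row = dict(r)
        for k, v in enumerate(lags, start=1):
            row[f"{key}_lag{k}"] = v
        rows.append(row)
    return rows


def evaluate(obs, pred):
    """Return Pearson R, RMSE, MAE and MRE for paired observations/predictions.

    MRE here is the mean of |pred - obs| / obs (an assumed definition),
    so every observation must be non-zero.
    """
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return {
        "R": cov / math.sqrt(var_o * var_p),
        "RMSE": math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / n),
        "MAE": sum(abs(p - o) for o, p in zip(obs, pred)) / n,
        "MRE": sum(abs(p - o) / o for o, p in zip(obs, pred)) / n,
    }


if __name__ == "__main__":
    # Made-up values for a single grid cell over four days.
    demo = [{"cell": "a", "day": d, "pm25": v}
            for d, v in enumerate([10.0, 12.0, 15.0, 11.0])]
    for row in add_lag_features(demo):
        print(row)
    print(evaluate([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```

Dropping rows without a complete lag history keeps the example simple; a real workflow might instead impute missing days, which matters when ground stations report irregularly.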
About the journal:
Environmental Monitoring and Assessment emphasizes technical developments and data arising from environmental monitoring and assessment, the use of scientific principles in the design of monitoring systems at the local, regional and global scales, and the use of monitoring data in assessing the consequences of natural resource management actions and pollution risks to man and the environment.