Application of machine learning algorithms in predicting indoor residential PM2.5 concentrations
Renato Camilleri, Roy M. Harrison, Noel J. Aquilina
Atmospheric Pollution Research, Volume 16, Issue 10, Article 102609. Published: 2025-06-09
DOI: 10.1016/j.apr.2025.102609
https://www.sciencedirect.com/science/article/pii/S1309104225002119
Abstract
Machine learning (ML) has recently been widely used for prediction in environmental research, but few studies have applied it to predicting indoor residential concentrations of fine particulate matter (PM2.5). PM2.5 can penetrate deep into the lungs and has been linked to respiratory and cardiovascular problems, with long-term exposure associated with increased morbidity and mortality. ML can provide better estimates of residential PM2.5 concentrations, which are usually a significant contributor to personal exposure. This applies especially to the elderly and to people with pre-existing health conditions, who tend to spend most of their time at home. This study used ML algorithms, namely a General Linear Model (GLM) with Lasso regularisation and the tree-based algorithms Random Forest (RF) and XGBoost. These were used to predict six-hourly average indoor PM2.5 concentrations in the Maltese Islands from outdoor residential PM concentrations and several meteorological parameters. Continuous PM sampling with aerosol spectrometers was carried out at six non-smoking residences in Malta and Gozo. Repeated 10-fold cross-validation was performed on the training dataset. Hyperparameters were tuned by grid search, using the Root Mean Square Error (RMSE) as the evaluation metric. Five sampling sites showed low indoor PM contributions. For these sites the GLM showed good performance indicators on the testing data, but lag-1 serial correlation was detected. For the same sites, RF and XGBoost showed very good performance indicators, with an Index of Agreement (IOA) of 0.92 and 0.93, respectively. The most important predictor variable was the outdoor PM1 fraction.
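The model-selection procedure described above (repeated 10-fold cross-validation with a grid search scored by RMSE) can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' code. The abstract does not name the software used. The coordinate-descent Lasso, the hypothetical helper names (`fit_lasso`, `repeated_kfold_rmse`, `grid_search`, `lag1_autocorrelation`), the number of repeats and any candidate penalty values are all assumptions made for illustration. A lag-1 autocorrelation helper is included to show how serial correlation of the kind reported for the GLM can be checked on time-ordered residuals.

```python
# Minimal sketch (assumed implementation, not the study's code): choose a Lasso
# penalty by repeated k-fold cross-validation, scoring candidates by mean RMSE.
# All function names here are hypothetical.
import math
import random


def soft_threshold(z, gamma):
    """Lasso shrinkage: move z toward zero by gamma, clipping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)*||y - b0 - Zb||^2 + lam*||b||_1,
    where Z holds the predictors standardised with training-data statistics."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
           for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n          # intercept equals the mean because Z is centred
    r = [v - y_mean for v in y]  # current residuals (all coefficients start at zero)
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Unit-variance columns reduce the coordinate update to a soft-threshold
            rho = sum(Z[i][j] * r[i] for i in range(n)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= delta * Z[i][j]
                beta[j] = new_bj

    def predict(X_new):
        return [y_mean + sum(beta[j] * (row[j] - means[j]) / sds[j] for j in range(p))
                for row in X_new]

    return predict


def rmse(obs, pred):
    return math.sqrt(sum((o - q) ** 2 for o, q in zip(obs, pred)) / len(obs))


def repeated_kfold_rmse(X, y, lam, k=10, repeats=3, seed=0):
    """Mean test-fold RMSE over `repeats` random k-fold splits."""
    rng = random.Random(seed)  # same seed gives the same folds for every candidate
    idx = list(range(len(y)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            model = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
            scores.append(rmse([y[i] for i in test], model([X[i] for i in test])))
    return sum(scores) / len(scores)


def grid_search(X, y, lambdas, **cv_args):
    """Return the penalty with the lowest cross-validated RMSE, plus all scores."""
    scores = {lam: repeated_kfold_rmse(X, y, lam, **cv_args) for lam in lambdas}
    return min(scores, key=scores.get), scores


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of time-ordered residuals (near 0 = little serial correlation)."""
    m = sum(residuals) / len(residuals)
    num = sum((residuals[t] - m) * (residuals[t - 1] - m) for t in range(1, len(residuals)))
    den = sum((e - m) ** 2 for e in residuals)
    return num / den
```

The same `seed` is passed for every candidate penalty, so the folds are identical across the grid. Differences in mean RMSE therefore reflect the penalty rather than the split. In practice, scikit-learn's `GridSearchCV` combined with `RepeatedKFold` implements the same procedure. Random folds treat consecutive six-hourly records as independent, and the lag-1 check is one way to see whether that matters.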
When the models were tested with data from all sampling sites, the RF regression model gave the lowest RMSE (30.65 μg m−3) and the highest IOA (0.66). The full dataset included a site with a PM2.5 indoor/outdoor (I/O) ratio of 5.2. At this site the high indoor PM generation was associated primarily with cooking emissions, and indoor relative humidity was suggested as a good predictor variable for such a scenario. This study showed the significant impact of outdoor PM1 on indoor PM2.5 levels at sites with limited indoor sources of fine PM. At sites with significant indoor generation from cooking, indoor PM2.5 was 3.6 times the WHO short-term (24-h) air quality guideline (AQG). This indicates that regulations on extraction systems for domestic kitchens would minimise very high exposures of home occupants to indoor fine PM.
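The agreement and exposure figures quoted above can be made concrete with a small sketch. The Index of Agreement is assumed here to be Willmott's (1981) d = 1 − Σ(Pᵢ − Oᵢ)² / Σ(|Pᵢ − Ō| + |Oᵢ − Ō|)², where P are predictions, O observations and Ō the observed mean. The paper may instead use a different variant, such as Willmott's later refined index. The I/O ratio is assumed to be a ratio of mean concentrations. The guideline value assumes the WHO 2021 24-h PM2.5 AQG of 15 μg m−3. The function names are hypothetical.

```python
# Minimal sketch (assumed definitions, not the study's code) of the agreement
# and exposure ratios quoted in the abstract. Function names are hypothetical.

# Assumption: the 24-h AQG referred to is the WHO 2021 PM2.5 value (μg m-3).
WHO_PM25_24H_AQG = 15.0


def index_of_agreement(obs, pred):
    """Willmott's (1981) index of agreement d: 1 = perfect, 0 = no agreement."""
    o_bar = sum(obs) / len(obs)
    sse = sum((q - o) ** 2 for o, q in zip(obs, pred))
    potential = sum((abs(q - o_bar) + abs(o - o_bar)) ** 2 for o, q in zip(obs, pred))
    if potential == 0.0:  # only when every prediction and observation equals o_bar
        return 1.0
    return 1.0 - sse / potential


def io_ratio(indoor, outdoor):
    """Indoor/outdoor (I/O) ratio, assumed here to be a ratio of period means."""
    return (sum(indoor) / len(indoor)) / (sum(outdoor) / len(outdoor))


def aqg_multiple(concentration, guideline=WHO_PM25_24H_AQG):
    """How many times a concentration exceeds the guideline value."""
    return concentration / guideline
```

Under the 15 μg m−3 assumption, `aqg_multiple` returns 3.6 for a concentration of 54 μg m−3. The guideline value actually used by the study should be checked against the paper.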
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.