Application of machine learning algorithms in predicting indoor residential PM2.5 concentrations

Impact factor 3.9 | Zone 3 (Environmental Science and Ecology) | JCR Q2 (Environmental Sciences)
Renato Camilleri, Roy M. Harrison, Noel J. Aquilina
Atmospheric Pollution Research, Volume 16, Issue 10, Article 102609. Published 2025-06-09. DOI: 10.1016/j.apr.2025.102609. Full text: https://www.sciencedirect.com/science/article/pii/S1309104225002119
Citations: 0

Abstract


Application of machine learning algorithms in predicting indoor residential PM2.5 concentrations

Machine learning (ML) has recently been widely used for prediction in environmental research, but only a limited number of studies have applied it to indoor residential concentrations of fine particulate matter (PM2.5). PM2.5 can penetrate deep into the lungs and has been linked to respiratory and cardiovascular problems, with long-term exposure associated with increased morbidity and mortality. ML can provide a better estimate of residential PM2.5 concentrations, which are usually a significant contributor to personal exposure, especially for the elderly and those with pre-existing health conditions, who tend to spend most of their time at home. This study used ML algorithms (a General Linear Model (GLM) with Lasso regularisation and two tree-based algorithms, Random Forest (RF) and XGBoost) to predict six-hourly average indoor PM2.5 concentrations in the Maltese Islands from outdoor residential PM concentrations and several meteorological parameters. Continuous PM sampling with aerosol spectrometers was carried out at six non-smoking residences in Malta and Gozo. Repeated 10-fold cross-validation was carried out on the training dataset, with hyperparameters tuned by grid search using the Root Mean Square Error (RMSE) as the evaluation metric. Five sampling sites showed low indoor PM contributions; the GLM for these sites showed good performance indicators on the testing data, but serial correlation at lag 1 was recorded. For these sites, RF and XGBoost showed very good performance indicators, with an Index of Agreement (IOA) of 0.92 and 0.93, respectively, and the most important predictor variable was the outdoor PM1 fraction.
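The tuning procedure named above (repeated 10-fold cross-validation with a grid search scored by RMSE) can be illustrated with a minimal, standard-library sketch. Everything specific here is an assumption made for illustration: the data are synthetic, the model is a single-predictor Lasso fit standing in for the study's GLM, RF and XGBoost models, the number of repeats (3) and the penalty grid are arbitrary, and the helper names `fit_lasso_1d`, `repeated_kfold` and `grid_search` are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def fit_lasso_1d(x, y, lam):
    """Single-predictor Lasso (soft-thresholded least squares); the intercept is not penalised."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    shrunk = max(abs(sxy) - lam, 0.0)          # soft-thresholding step
    slope = math.copysign(shrunk, sxy) / sxx
    return mean_y - slope * mean_x, slope


def repeated_kfold(n, k, repeats, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independently shuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
            yield train, folds[i]


def grid_search(x, y, grid, k=10, repeats=3, seed=0):
    """Return (best penalty, {penalty: mean CV RMSE}); every candidate sees the same folds."""
    scores = {}
    for lam in grid:
        fold_scores = []
        for train, test in repeated_kfold(len(x), k, repeats, seed):
            intercept, slope = fit_lasso_1d([x[j] for j in train], [y[j] for j in train], lam)
            predicted = [intercept + slope * x[j] for j in test]
            fold_scores.append(rmse([y[j] for j in test], predicted))
        scores[lam] = sum(fold_scores) / len(fold_scores)
    return min(scores, key=scores.get), scores


# Synthetic stand-in data: indoor PM2.5 driven mostly by an outdoor PM value (not study data).
rng = random.Random(42)
outdoor = [rng.uniform(2.0, 30.0) for _ in range(200)]
indoor = [1.5 + 0.6 * v + rng.gauss(0.0, 1.0) for v in outdoor]

lambda_grid = [0.0, 1.0, 10.0, 100.0, 1000.0]   # assumed grid, for illustration only
best_lam, cv_scores = grid_search(outdoor, indoor, lambda_grid)
print("best penalty:", best_lam)
```

Reusing the same seed for every candidate keeps the fold assignments identical across the grid, so differences in mean RMSE reflect the hyperparameter rather than the split.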
When the models were tested with data from all sampling sites, the RF regression model gave the lowest RMSE (30.65 μg m⁻³) and the highest IOA (0.66). This dataset included a site with a PM2.5 indoor/outdoor (I/O) ratio of 5.2, where the high indoor PM generation was primarily associated with cooking emissions; indoor relative humidity was suggested as a good predictor variable for such a scenario. The study showed the significant impact of outdoor PM1 on indoor PM2.5 levels at sites with limited indoor fine PM sources. At sites with significant indoor generation from cooking, indoor PM2.5 was 3.6 times the WHO short-term (24-h) air quality guideline (AQG), indicating that regulations on extraction systems for domestic kitchens would minimise very high exposures of home dwellers to indoor fine PM.
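The two performance indicators reported above, RMSE and the Index of Agreement, can be computed as in the following sketch. The abstract does not say which IOA variant was used; this example assumes Willmott's original index, d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², where O are observations, P are predictions and Ō is the observed mean. The function names and the five sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error, in the same units as the inputs."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed variant): 1 is perfect, 0 is no agreement."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(observed, predicted)
    )
    if potential_error == 0:
        return float("nan")  # undefined when every value equals the observed mean
    return 1.0 - squared_error / potential_error


# Made-up six-hourly indoor PM2.5 values (μg m⁻³), for illustration only.
observed = [8.0, 12.0, 15.0, 30.0, 9.0]
predicted = [9.0, 11.0, 17.0, 24.0, 10.0]

print("RMSE:", round(rmse(observed, predicted), 2))
print("IOA:", round(index_of_agreement(observed, predicted), 3))
```

Because the IOA denominator grows with the spread of the observations, a dataset with a few very high values (such as a site dominated by cooking emissions) can show a large RMSE while still yielding a moderate IOA, so the two indicators are best read together.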
Source journal: Atmospheric Pollution Research (Environmental Sciences)
CiteScore: 8.30
Self-citation rate: 6.70%
Articles published: 256
Review time: 36 days
About the journal: Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.