Prediction of ambient PM2.5 chemical components in Southern California using machine learning

Impact factor: 3.4 (JCR Q2, Environmental Sciences)
Jiani Yang, Sina Hasheminassab, Meredith Franklin, Antong Zhang, David J. Diner, Joseph Pinto, Yuk L. Yung
Journal: Atmospheric Environment: X, Volume 28, Article 100372
Published: 2025-09-19
DOI: 10.1016/j.aeaoa.2025.100372
URL: https://www.sciencedirect.com/science/article/pii/S2590162125000620
Citations: 0

Abstract

Fine particulate matter (PM2.5, particulate matter with an aerodynamic diameter ≤2.5 μm) poses major public health and environmental risks, yet the toxicity of its chemical components remains poorly understood because chemical speciation data are limited. In this study, we apply an extreme gradient boosting (XGBoost) machine learning framework to predict key PM2.5 components, including organic carbon, elemental carbon, nitrate, sulfate, ammonium, and metals, using readily available predictors: total PM2.5 mass concentrations, meteorological variables, trace gas measurements, and indicators of exceptional events (e.g., wildfires, fireworks). Leveraging a decade of data from two monitoring sites in Southern California (Los Angeles and Rubidoux), the models achieved strong predictive performance, particularly for nitrate, ammonium, and elemental carbon. Among the most influential predictors across components were total PM2.5 mass, relative humidity, and boundary layer height. This approach shows promise for enhancing satellite remote sensing applications, improving chemical transport model inputs, and generating cost-effective estimates of PM2.5 components during sampling gaps and in regions lacking frequent monitoring. Further research is needed to assess the generalizability of this framework across diverse geographic and climatic settings.
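
To illustrate the general idea of predicting one PM2.5 component from readily available predictors with boosted trees, the following is a minimal sketch only. It is not the authors' pipeline and it does not use the XGBoost library: it implements plain gradient boosting with one-split regression trees (stumps) in standard-library Python, whereas XGBoost adds regularization, second-order gradient information, and deeper trees. The data are synthetic, and the predictor names, value ranges, and the formula generating the "nitrate" target are assumptions made up for this example, not results from the study.

```python
import random

def make_synthetic(n, seed):
    """Synthetic rows of [pm25_total, relative_humidity, boundary_layer_height].
    The target formula is invented for illustration, not taken from the paper."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        pm25 = rng.uniform(2, 60)      # hypothetical range, ug/m3
        rh = rng.uniform(10, 95)       # hypothetical range, percent
        blh = rng.uniform(200, 2000)   # hypothetical range, metres
        nitrate = 0.25 * pm25 + 0.03 * rh - 0.001 * blh + rng.gauss(0, 0.5)
        X.append([pm25, rh, blh])
        y.append(nitrate)
    return X, y

def fit_stump(X, residuals, n_candidates=16):
    """Pick the (feature, threshold) split that minimizes squared error,
    searching a coarse grid of candidate thresholds per feature."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        step = max(1, len(values) // n_candidates)
        for i in range(step, len(values), step):
            thr = values[i]
            left = [r for row, r in zip(X, residuals) if row[j] < thr]
            right = [r for row, r in zip(X, residuals) if row[j] >= thr]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]  # (feature index, threshold, left mean, right mean)

def fit_boosting(X, y, n_rounds=40, lr=0.2):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        preds = [p + lr * (lm if row[j] < thr else rm)
                 for p, row in zip(preds, X)]
    return base, lr, stumps

def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] < thr else rm)
                      for j, thr, lm, rm in stumps)

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

X_train, y_train = make_synthetic(150, seed=0)
X_test, y_test = make_synthetic(80, seed=1)
model = fit_boosting(X_train, y_train)
r2 = r_squared(y_test, [predict(model, row) for row in X_test])
print(f"held-out R^2 on synthetic data: {r2:.2f}")
```

Counting how often each feature index appears in the fitted stumps gives a crude view of predictor influence, loosely analogous to the importance ranking mentioned in the abstract. With real monitoring data, a library implementation would replace these hand-written functions.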
Source journal: Atmospheric Environment: X (Environmental Science: Environmental Science (all))
CiteScore: 8.00
Self-citation rate: 0.00%
Articles published: 47
Review time: 12 weeks