Jiani Yang, Sina Hasheminassab, Meredith Franklin, Antong Zhang, David J. Diner, Joseph Pinto, Yuk L. Yung
Atmospheric Environment: X, Volume 28, Article 100372. Published 2025-09-19. DOI: 10.1016/j.aeaoa.2025.100372
https://www.sciencedirect.com/science/article/pii/S2590162125000620
Prediction of ambient PM2.5 chemical components in Southern California using machine learning
Fine particulate matter (PM2.5, particulate matter with an aerodynamic diameter ≤2.5 μm) poses major public health and environmental risks, yet the toxicity of its chemical components remains poorly understood due to limited chemical speciation data. In this study, we apply an extreme gradient boosting (XGBoost) machine learning framework to predict key PM2.5 components, including organic carbon, elemental carbon, nitrate, sulfate, ammonium, and metals, using readily available predictors: total PM2.5 mass concentrations, meteorological variables, trace gas measurements, and indicators of exceptional events (e.g., wildfires, fireworks). Leveraging a decade of data from two monitoring sites in Southern California (Los Angeles and Rubidoux), the models achieved strong predictive performance, particularly for nitrate, ammonium, and elemental carbon. Among the most influential predictors across components were total PM2.5 mass, relative humidity, and boundary layer height. This approach has promise for enhancing satellite remote sensing applications, improving chemical transport model inputs, and generating cost-effective estimates of PM2.5 components during sampling gaps and in regions lacking frequent monitoring. Further research is needed to assess the generalizability of this framework across diverse geographic and climatic settings.
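
The abstract does not give implementation details, so the following is only a minimal sketch of the underlying idea: gradient boosting fits a sequence of small trees, each one trained on the residuals left by the trees before it. It is not the authors' code and not XGBoost itself (it omits regularization, second-order gradients, and deeper trees); it uses depth-one trees (stumps) and the Python standard library only. The data are synthetic, and the feature names, value ranges, coefficients, and hyperparameters are all assumptions chosen for illustration.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the best single split."""
    n = len(X)
    total = sum(residuals)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            left_mean = left_sum / k
            right_mean = (total - left_sum) / (n - k)
            # Maximizing this score is equivalent to minimizing squared error.
            score = k * left_mean ** 2 + (n - k) * right_mean ** 2
            if best is None or score > best[0]:
                best = (score, j, lo, left_mean, right_mean)
    if best is None:  # all features constant: predict the mean residual everywhere
        return 0, float("inf"), total / n, total / n
    return best[1:]

def fit_boosted(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on squared-error residuals; hyperparameters are illustrative."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, left, right))
        preds = [p + learning_rate * (left if x[j] <= thr else right)
                 for p, x in zip(preds, X)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x[j] <= thr else right for j, thr, left, right in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (NOT measurements): a made-up linear relation plus noise.
rng = random.Random(0)
FEATURES = ["pm25_mass", "relative_humidity", "boundary_layer_height"]
X, y = [], []
for _ in range(300):
    pm25 = rng.uniform(2, 60)      # hypothetical total PM2.5 mass
    rh = rng.uniform(10, 95)       # hypothetical relative humidity
    blh = rng.uniform(200, 2000)   # hypothetical boundary layer height
    X.append([pm25, rh, blh])
    y.append(0.25 * pm25 + 0.03 * rh - 0.001 * blh + rng.gauss(0, 0.5))

model = fit_boosted(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))   # always predict the mean
model_mse = mse(y, [predict(model, x) for x in X])  # training error only

# Crude importance proxy: how often each feature was chosen for a split.
split_counts = {name: 0 for name in FEATURES}
for j, _, _, _ in model[2]:
    split_counts[FEATURES[j]] += 1

print("baseline MSE:", baseline_mse)
print("boosted MSE: ", model_mse)
print("split counts:", split_counts)
```

The error reported here is training error on synthetic data, so it says nothing about real predictive skill; an actual evaluation would use held-out observations. Split counts are only a rough proxy for predictor influence, and the abstract does not state which importance measure the study used.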