Machine learning for air quality prediction and data analysis: Review on recent advancements, challenges, and outlooks.

IF 8 | Zone 1 (Environmental Science and Ecology) | JCR Q1 (Environmental Sciences)

Science of the Total Environment, Volume 1002, Article 180593. Pub Date: 2025-09-29. DOI: 10.1016/j.scitotenv.2025.180593

Manal Karmoude, Brenton Munhungewarwa, Isaiah Chiraira, Ryan Mckenzie, Jude Kong, Bevan Smith, Gelan Ayana, Nkosiphendule Njara, Thuso Mathaha, Mukesh Kumar, Bruce Mellado


Citations: 0

Abstract

Air quality is a critical determinant of human health, with severe consequences resulting from air pollution. The growing necessity for air quality monitoring has led to the adoption of IoT sensor networks, which provide real-time data for forecasting, issuing warnings, and informing public health interventions. In this context, machine learning (ML) algorithms have proven to be powerful tools for enhancing air quality prediction and addressing monitoring challenges. However, a comprehensive review compiling the research space of ML for air quality is seldom available. This review analyzes over 70 recent studies that apply ML techniques to air quality monitoring, categorizing them based on the type of learning approach employed, with a focus on identifying the most effective algorithms in each category. The findings demonstrate that ensemble models such as Random Forest (RF) and Extreme Gradient Boosting (XGBoost) consistently achieve high accuracy in structured datasets, while deep learning (DL) approaches like Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) excel in capturing temporal dependencies and spatial patterns in pollution forecasting. Unsupervised approaches like clustering and anomaly detection effectively enhance data quality and sensor calibration, whereas reinforcement learning shows promise in adaptive control scenarios, despite challenges related to computational intensity and interpretability. This review is highly significant, offering valuable insights for policymakers and researchers in developing strategies to mitigate air pollution and improve public health using advanced ML techniques.




Source journal: Science of the Total Environment (category: Environmental Sciences)

CiteScore: 17.60

Self-citation rate: 10.20%

Articles published: 8726

Review time: 2.4 months

Journal description: The Science of the Total Environment is an international journal dedicated to scientific research on the environment and its interaction with humanity. It covers a wide range of disciplines and seeks to publish innovative, hypothesis-driven, and impactful research that explores the entire environment, including the atmosphere, lithosphere, hydrosphere, biosphere, and anthroposphere. The journal's updated Aims & Scope emphasizes the importance of interdisciplinary environmental research with broad impact. Priority is given to studies that advance fundamental understanding and explore the interconnectedness of multiple environmental spheres. Field studies are preferred, while laboratory experiments must demonstrate significant methodological advancements or mechanistic insights with direct relevance to the environment.