A novel hybrid model based on dual-layer decomposition and kernel density estimation for VOCs concentration forecasting considering influencing factors
Authors: Fan Yang, Guangqiu Huang, Xin Jiao
Journal: Atmospheric Pollution Research, 16(4), Article 102439
Publication date: 2025-02-03
DOI: 10.1016/j.apr.2025.102439
URL: https://www.sciencedirect.com/science/article/pii/S1309104225000418
Citations: 0
Abstract
Accurate VOCs concentration prediction is essential for air pollution control and ecosystem stability. Because of multiple factors such as climatic conditions and photochemical reactions, VOCs monitoring data exhibit high randomness, which makes precise prediction difficult. Current decomposition-integration models focus mainly on modelling the target variable and pay insufficient attention to the uncertainty of the prediction results. To address these problems, a VOCs prediction model is proposed that considers multiple external factors and combines dual-layer decomposition with nonlinear integration. First, random forest (RF) is used for feature selection, and a dual-layer decomposition method combining complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and improved variational mode decomposition (IVMD) is proposed to reduce data complexity. Next, K-means clustering is applied to reconstruct the decomposed subsequences, balancing computational efficiency and model complexity, and the reconstructed subsequences are fed into a long short-term memory (LSTM) network optimized by grey wolf optimization (GWO) for prediction. Then, the predicted values are integrated by support vector regression (SVR) to minimize error accumulation. Finally, prediction intervals are constructed using kernel density estimation (KDE) to capture the fluctuation range of VOCs concentration. In an empirical study with total VOCs concentration data from two monitoring stations, the proposed model exhibits the lowest prediction error, with the root mean square error reduced by up to 85.59% and 86.97% at the two stations, respectively. The prediction intervals have high coverage and narrow width, indicating that the proposed model can provide reliable point and interval predictions of VOCs concentration.
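The final step of the abstract, building prediction intervals with KDE, can be illustrated with a minimal sketch. The abstract does not specify the kernel, the bandwidth rule, or how the intervals are derived, so everything below is an assumption made for illustration: a Gaussian kernel, the normal-reference rule-of-thumb bandwidth, intervals obtained by adding KDE quantiles of held-out residuals to each point forecast, and the hypothetical function names `prediction_intervals`, `coverage`, and `mean_width`. The last two correspond loosely to the "high coverage" and "narrow interval width" criteria mentioned above.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(residuals):
    """Normal-reference bandwidth for a Gaussian kernel (an assumed choice)."""
    n = len(residuals)
    return 1.06 * statistics.stdev(residuals) * n ** (-1 / 5)


def kde_cdf(x, residuals, h):
    """CDF of a Gaussian KDE: the mean of normal CDFs centred on each residual."""
    scale = h * math.sqrt(2.0)
    total = sum(0.5 * (1.0 + math.erf((x - r) / scale)) for r in residuals)
    return total / len(residuals)


def kde_quantile(p, residuals, h, tol=1e-6):
    """Invert the KDE CDF by bisection on a bracket wider than the data range."""
    lo = min(residuals) - 10 * h
    hi = max(residuals) + 10 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kde_cdf(mid, residuals, h) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def prediction_intervals(point_forecasts, residuals, confidence=0.95):
    """Shift each point forecast by the lower and upper KDE residual quantiles."""
    alpha = 1.0 - confidence
    h = rule_of_thumb_bandwidth(residuals)
    q_lo = kde_quantile(alpha / 2, residuals, h)
    q_hi = kde_quantile(1 - alpha / 2, residuals, h)
    return [(f + q_lo, f + q_hi) for f in point_forecasts]


def coverage(intervals, actuals):
    """Fraction of observations that fall inside their interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, actuals))
    return hits / len(actuals)


def mean_width(intervals):
    """Average interval width."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


if __name__ == "__main__":
    # Synthetic data only: these numbers are not from the paper.
    random.seed(0)
    validation_residuals = [random.gauss(0.0, 2.0) for _ in range(200)]
    forecasts = [50.0 + 5.0 * math.sin(t / 6.0) for t in range(48)]
    actuals = [f + random.gauss(0.0, 2.0) for f in forecasts]
    intervals = prediction_intervals(forecasts, validation_residuals, 0.95)
    print("coverage:", coverage(intervals, actuals))
    print("mean width:", mean_width(intervals))
```

In this sketch the interval width is the same for every forecast because a single residual distribution is estimated. A method that conditions the residual distribution on forecast level or on decomposed components would produce varying widths, and the paper may well do something along those lines.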
About the journal:
Atmospheric Pollution Research (APR) is an international journal designed for the publication of articles on air pollution. Papers should present novel experimental results, theory and modeling of air pollution on local, regional, or global scales. Areas covered are research on inorganic, organic, and persistent organic air pollutants, air quality monitoring, air quality management, atmospheric dispersion and transport, air-surface (soil, water, and vegetation) exchange of pollutants, dry and wet deposition, indoor air quality, exposure assessment, health effects, satellite measurements, natural emissions, atmospheric chemistry, greenhouse gases, and effects on climate change.