{"title":"利用包裹损失和神经网络减少年度数据漂移,增强PM2.5预测。","authors":"Md Khalid Hossen, Yan-Tsung Peng, Meng Chang Chen","doi":"10.1371/journal.pone.0314327","DOIUrl":null,"url":null,"abstract":"<p><p>In many deep learning tasks, it is assumed that the data used in the training process is sampled from the same distribution. However, this may not be accurate for data collected from different contexts or during different periods. For instance, the temperatures in a city can vary from year to year due to various unclear reasons. In this paper, we utilized three distinct statistical techniques to analyze annual data drifting at various stations. These techniques calculate the P values for each station by comparing data from five years (2014-2018) to identify data drifting phenomena. To find out the data drifting scenario those statistical techniques and calculate the P value from those techniques to measure the data drifting in specific locations. From those statistical techniques, the highest drifting stations can be identified from the previous year's datasets To identify data drifting and highlight areas with significant drift, we utilized meteorological air quality and weather data in this study. We proposed two models that consider the characteristics of data drifting for PM2.5 prediction and compared them with various deep learning models, such as Long Short-Term Memory (LSTM) and its variants, for predictions from the next hour to the 64th hour. Our proposed models significantly outperform traditional neural networks. Additionally, we introduced a wrapped loss function incorporated into a model, resulting in more accurate results compared to those using the original loss function alone and prediction has been evaluated by RMSE, MAE and MAPE metrics. The proposed Front-loaded connection model(FLC) and Back-loaded connection model (BLC) solve the data drifting issue and the wrap loss function also help alleviate the data drifting problem with model training and works for the neural network models to achieve more accurate results. Eventually, the experimental results have shown that the proposed model performance enhanced from 24.1% -16%, 12%-8.3% respectively at 1h-24h, 32h-64h with compared to baselines BILSTM model, by 24.6% -11.8%, 10%-10.2% respectively at 1h-24h, 32h-64h compared to CNN model in hourly PM2.5 predictions.</p>","PeriodicalId":20189,"journal":{"name":"PLoS ONE","volume":"20 2","pages":"e0314327"},"PeriodicalIF":2.6000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11813127/pdf/","citationCount":"0","resultStr":"{\"title\":\"Enhancing PM2.5 prediction by mitigating annual data drift using wrapped loss and neural networks.\",\"authors\":\"Md Khalid Hossen, Yan-Tsung Peng, Meng Chang Chen\",\"doi\":\"10.1371/journal.pone.0314327\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>In many deep learning tasks, it is assumed that the data used in the training process is sampled from the same distribution. However, this may not be accurate for data collected from different contexts or during different periods. For instance, the temperatures in a city can vary from year to year due to various unclear reasons. In this paper, we utilized three distinct statistical techniques to analyze annual data drifting at various stations. These techniques calculate the P values for each station by comparing data from five years (2014-2018) to identify data drifting phenomena. 
To find out the data drifting scenario those statistical techniques and calculate the P value from those techniques to measure the data drifting in specific locations. From those statistical techniques, the highest drifting stations can be identified from the previous year's datasets To identify data drifting and highlight areas with significant drift, we utilized meteorological air quality and weather data in this study. We proposed two models that consider the characteristics of data drifting for PM2.5 prediction and compared them with various deep learning models, such as Long Short-Term Memory (LSTM) and its variants, for predictions from the next hour to the 64th hour. Our proposed models significantly outperform traditional neural networks. Additionally, we introduced a wrapped loss function incorporated into a model, resulting in more accurate results compared to those using the original loss function alone and prediction has been evaluated by RMSE, MAE and MAPE metrics. The proposed Front-loaded connection model(FLC) and Back-loaded connection model (BLC) solve the data drifting issue and the wrap loss function also help alleviate the data drifting problem with model training and works for the neural network models to achieve more accurate results. Eventually, the experimental results have shown that the proposed model performance enhanced from 24.1% -16%, 12%-8.3% respectively at 1h-24h, 32h-64h with compared to baselines BILSTM model, by 24.6% -11.8%, 10%-10.2% respectively at 1h-24h, 32h-64h compared to CNN model in hourly PM2.5 predictions.</p>\",\"PeriodicalId\":20189,\"journal\":{\"name\":\"PLoS ONE\",\"volume\":\"20 2\",\"pages\":\"e0314327\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-02-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11813127/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS ONE\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pone.0314327\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS ONE","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1371/journal.pone.0314327","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Enhancing PM2.5 prediction by mitigating annual data drift using wrapped loss and neural networks.
In many deep learning tasks, it is assumed that the data used in training is sampled from the same distribution. However, this assumption may not hold for data collected in different contexts or during different periods. For instance, the temperatures in a city can vary from year to year for reasons that are not fully understood. In this paper, we use three distinct statistical techniques to analyze annual data drift at various monitoring stations. These techniques compute a P value for each station by comparing data across five years (2014-2018), which allows us to quantify drift at specific locations and to identify the stations with the strongest drift relative to previous years' datasets. To detect data drift and highlight areas with significant drift, we use air quality and meteorological data in this study. We propose two models that account for the characteristics of data drift in PM2.5 prediction and compare them with various deep learning models, such as Long Short-Term Memory (LSTM) and its variants, for predictions from the next hour up to the 64th hour. Our proposed models significantly outperform traditional neural networks. In addition, we introduce a wrapped loss function that, when incorporated into a model, yields more accurate results than the original loss function alone; predictions are evaluated with the RMSE, MAE, and MAPE metrics. The proposed Front-loaded connection (FLC) and Back-loaded connection (BLC) models address the data drift issue, and the wrapped loss function further alleviates it during model training, helping the neural network models achieve more accurate results. The experimental results show that, for hourly PM2.5 prediction, the proposed models improve performance over the baseline BiLSTM model by 24.1%-16% at 1h-24h and 12%-8.3% at 32h-64h, and over the CNN model by 24.6%-11.8% at 1h-24h and 10%-10.2% at 32h-64h.
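The abstract does not name the three statistical techniques used for drift detection, so the following is only a minimal sketch of per-station annual drift screening using a two-sample Kolmogorov-Smirnov test as an illustrative stand-in; the column names, significance threshold, and data layout are assumptions.

```python
import pandas as pd
from scipy.stats import ks_2samp


def detect_annual_drift(df: pd.DataFrame, station: str, year_a: int, year_b: int,
                        value_col: str = "PM2.5", alpha: float = 0.05) -> dict:
    """Compare one station's PM2.5 distribution between two years.

    Assumes `df` has columns: 'station', 'year', and `value_col`.
    A small p-value suggests the two annual distributions differ, i.e. drift.
    """
    a = df[(df["station"] == station) & (df["year"] == year_a)][value_col].dropna()
    b = df[(df["station"] == station) & (df["year"] == year_b)][value_col].dropna()
    stat, p_value = ks_2samp(a, b)
    return {"station": station, "years": (year_a, year_b),
            "statistic": float(stat), "p_value": float(p_value),
            "drift": bool(p_value < alpha)}
```

Ranking stations by these p-values across consecutive year pairs (e.g., 2014 vs. 2015 through 2017 vs. 2018) is one way to surface the "highest drifting" stations described in the abstract.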
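The architectures of the FLC and BLC models and of the wrapped loss are not specified in the abstract. As a point of reference, below is a minimal sketch of the kind of LSTM baseline they are compared against for 1h-64h hourly PM2.5 prediction; the layer sizes, feature count, and single-shot multi-horizon output head are assumptions.

```python
import torch
import torch.nn as nn


class PM25LSTM(nn.Module):
    """Hypothetical LSTM baseline: past weather/air-quality window -> next 64 hourly PM2.5 values."""

    def __init__(self, n_features: int = 8, hidden: int = 64, horizon: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, horizon)  # predict hours 1..64 at once

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # (batch, horizon)
```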
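The abstract evaluates predictions with RMSE, MAE, and MAPE. These follow their standard definitions; the small epsilon guard in MAPE is an assumption added to avoid division by zero for near-zero observations.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred) -> float:
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred, eps: float = 1e-8) -> float:
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / np.maximum(np.abs(y_true), eps))) * 100.0)
```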
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage