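Application of imputation methods for missing values of PM10 and O3 data: Interpolation, moving average and K-nearest neighbor methods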

IF: 1.2, JCR: Q4, Category: Environmental Sciences

Environmental Health Engineering and Management Journal, Pub Date: 2021-09-16, DOI: 10.34172/ehem.2021.25

Parisa Saeipourdizaj, P. Sarbakhsh, Akbar Gholampour


Citations: 11

Abstract

Background: In air quality studies, missing data are common, for reasons such as machine failure or human error. The approach used to handle such missing data can affect the results of the analysis. The main aim of this study was to review the types of missingness mechanism and the available imputation methods, to apply some of these methods to the imputation of missing PM10 and O3 values in Tabriz, and to compare their efficiency.

Methods: The methods used were mean imputation, the EM algorithm, regression, classification and regression tree, predictive mean matching (PMM), interpolation, moving average, and K-nearest neighbor (KNN). PMM was investigated by including spatial and temporal dependencies in the model. Missing data were simulated at random, with 10%, 20%, and 30% of values missing. The efficiency of the methods was compared using the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE).

Results: Across all indicators, interpolation, moving average, and KNN performed best, in that order. PMM did not perform well either with or without spatio-temporal information.

Conclusion: Because pollution data depend on the preceding and following observations, methods whose computation is based on the values before and after a gap performed better than the others. These methods are therefore recommended for pollutant data.
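The evaluation design described in the abstract (mask a known series at 10%, 20%, and 30%, impute, then score against the held-out truth) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the series is synthetic rather than Tabriz data, all function names are hypothetical, the window width and k are arbitrary, KNN is assumed to use the time index as its only feature, and R² is assumed to be computed as 1 - SS_res/SS_tot.

```python
import math
import random

def make_series(n=240, seed=0):
    """Synthetic hourly series: daily cycle plus noise (illustrative, not real data)."""
    rng = random.Random(seed)
    return [50 + 20 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 3) for t in range(n)]

def mask_at_random(values, frac, seed=1):
    """Set a random fraction of points to None; both endpoints stay observed."""
    rng = random.Random(seed)
    interior = list(range(1, len(values) - 1))
    missing = set(rng.sample(interior, round(frac * len(values))))
    return [None if i in missing else v for i, v in enumerate(values)], sorted(missing)

def interpolate(series):
    """Linear interpolation between the nearest observed value on each side."""
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            lo, hi = i - 1, i + 1
            while series[lo] is None:
                lo -= 1
            while series[hi] is None:
                hi += 1
            out[i] = series[lo] + (i - lo) / (hi - lo) * (series[hi] - series[lo])
    return out

def moving_average(series, k=2):
    """Mean of observed values within k steps on each side; widen if none found."""
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            width = k
            while True:
                window = [x for x in series[max(0, i - width): i + width + 1]
                          if x is not None]
                if window:
                    break
                width += 1
            out[i] = sum(window) / len(window)
    return out

def knn(series, k=4):
    """Mean of the k observed points nearest in time (assumed feature: time index)."""
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            nearest = sorted(observed, key=lambda p: abs(p[0] - i))[:k]
            out[i] = sum(x for _, x in nearest) / len(nearest)
    return out

def scores(truth, imputed, idx):
    """R2, MAE and RMSE computed on the masked positions only."""
    y = [truth[i] for i in idx]
    p = [imputed[i] for i in idx]
    n = len(idx)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    mean_y = sum(y) / n
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    mae = sum(abs(a - b) for a, b in zip(y, p)) / n
    return {"R2": 1 - ss_res / ss_tot, "MAE": mae, "RMSE": math.sqrt(ss_res / n)}

truth = make_series()
methods = (("interpolation", interpolate), ("moving average", moving_average), ("KNN", knn))
results = {}
for frac in (0.10, 0.20, 0.30):
    masked, idx = mask_at_random(truth, frac)
    for name, method in methods:
        results[(frac, name)] = scores(truth, method(masked), idx)
        print(frac, name, results[(frac, name)])
```

Scoring only the masked positions keeps the observed values from inflating the metrics. The ranking this toy series produces says nothing about the ranking reported in the study, which was obtained on real PM10 and O3 measurements.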


Source journal: Environmental Health Engineering and Management Journal (Environmental Sciences)

CiteScore: 2.40

Self-citation rate: 37.50%

Review time: 12 weeks