Comparison of classical and machine-learning methods on spatio-temporal modeling of daily ozone concentrations

2020 XLVI Latin American Computing Conference (CLEI). Pub Date: 2020-10-01. DOI: 10.1109/CLEI52000.2020.00014

R. Gualán, Víctor Saquicela, Long Tran-Thanh



Abstract



Effective actions to mitigate air pollution require high-resolution observations. Low-cost sensor technologies have emerged as an affordable way to address this need. However, because low-cost sensors are built with low-cost materials, they are prone to errors, gaps, bias, and noise. These problems must be solved before the data can be used to support research or decision making. Addressing the lack of reliability in low-cost sensor data is a complex challenge that is still being researched along several lines (e.g. accuracy estimation of low-cost sensor data). Current approaches along this line involve modeling, bias correction, and, more recently, data fusion methods that rely on high-resolution computational air quality models. Overall, accuracy estimation can be reduced to a modeling problem. This work studies, tests, and compares suitable approaches for handling point-referenced spatio-temporal sensor data, particularly classical spatial models, spatio-temporal models, and popular machine learning methods. Among these approaches, Bayesian hierarchical models receive special consideration, given the attention they have drawn over the last fifteen years. The benchmark supporting this comparison is a real-life dataset made up of daily ozone observations from the US Environmental Protection Agency (EPA) and meteorological variables extracted from the NCEP/NCAR Reanalysis Project (NNRP). The main contributions of this work are: (1) a systematic comparison of three kinds of models, using 10-fold cross-validation; and (2) a feature engineering method that creates covariates meant to harness spatially correlated observations in point-referenced sensor data.
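The abstract does not say how the spatial covariates are built or how the folds are drawn, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a covariate for each station is the inverse-distance-weighted mean of the same-day observations at its `k` nearest neighbouring stations, and that the 10 folds are a random partition of the row indices. The function names `neighbor_covariate` and `kfold_indices`, the parameter `k`, and the use of planar coordinates with distinct station locations are all hypothetical choices made for illustration.

```python
import numpy as np


def neighbor_covariate(coords, values, k=3):
    """Inverse-distance-weighted mean of the k nearest *other* stations.

    coords: (n, 2) station coordinates (assumed planar, all distinct).
    values: (n,) observations at those stations for a single day.
    Returns an (n,) array with one neighbour-based covariate per station.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Pairwise Euclidean distances between all stations.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A station must never be counted as its own neighbour.
    np.fill_diagonal(dist, np.inf)
    # Column indices of the k nearest neighbours of each station.
    idx = np.argsort(dist, axis=1)[:, :k]
    d = np.take_along_axis(dist, idx, axis=1)
    w = 1.0 / d  # inverse-distance weights
    return (w * values[idx]).sum(axis=1) / w.sum(axis=1)


def kfold_indices(n, n_folds=10, seed=0):
    """Split the indices 0..n-1 into n_folds disjoint, shuffled folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), n_folds)


if __name__ == "__main__":
    # Toy data: three stations on a line with made-up readings.
    coords = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
    ozone = [10.0, 20.0, 40.0]
    print(neighbor_covariate(coords, ozone, k=2))
    print([fold.tolist() for fold in kfold_indices(25)])
```

In a real cross-validation run, the covariate for a held-out station would have to be computed from training stations only; otherwise information from the test fold leaks into the predictors.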

