{"title":"GPSClean:一个清理和修复GPS数据的框架","authors":"Cheng-Hung Fang, Feng Wang, Bin Yao, Jianqiu Xu","doi":"10.1145/3469088","DOIUrl":null,"url":null,"abstract":"The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. The collected raw data usually contain errors and anomalies information caused by device failure, sensor error, and environment influence. Low-quality data fails to support application requirements and therefore raw data will be comprehensively cleaned before usage. Existing methods are suboptimal to detect GPS data errors and do the repairing. To solve the problem, we propose a framework called GPSClean to analyze the anomalies data and develop effective methods to repair the data. There are primarily four modules in GPSClean: (i) data preprocessing, (ii) data filling, (iii) data repairing, and (iv) data conversion. For (i), we propose an approach named MDSort (Maximum Disorder Sorting) to efficiently solve the issue of data disorder. For (ii), we propose a method named NNF (Nearest Neighbor Filling) to fill missing data. For (iii), we design an approach named RCSWS (Range Constraints and Sliding Window Statistics) to repair anomalies and also improve the accuracy of data repairing by mak7ing use of driving direction. We use 45 million real trajectory data to evaluate our proposal in a prototype database system SECONDO. Experimental results show that the accuracy of RCSWS is three times higher than an alternative method SCREEN and nearly an order of magnitude higher than an alternative method EWMA.","PeriodicalId":123526,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology (TIST)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"GPSClean: A Framework for Cleaning and Repairing GPS Data\",\"authors\":\"Cheng-Hung Fang, Feng Wang, Bin Yao, Jianqiu Xu\",\"doi\":\"10.1145/3469088\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. The collected raw data usually contain errors and anomalies information caused by device failure, sensor error, and environment influence. Low-quality data fails to support application requirements and therefore raw data will be comprehensively cleaned before usage. Existing methods are suboptimal to detect GPS data errors and do the repairing. To solve the problem, we propose a framework called GPSClean to analyze the anomalies data and develop effective methods to repair the data. There are primarily four modules in GPSClean: (i) data preprocessing, (ii) data filling, (iii) data repairing, and (iv) data conversion. For (i), we propose an approach named MDSort (Maximum Disorder Sorting) to efficiently solve the issue of data disorder. For (ii), we propose a method named NNF (Nearest Neighbor Filling) to fill missing data. For (iii), we design an approach named RCSWS (Range Constraints and Sliding Window Statistics) to repair anomalies and also improve the accuracy of data repairing by mak7ing use of driving direction. We use 45 million real trajectory data to evaluate our proposal in a prototype database system SECONDO. Experimental results show that the accuracy of RCSWS is three times higher than an alternative method SCREEN and nearly an order of magnitude higher than an alternative method EWMA.\",\"PeriodicalId\":123526,\"journal\":{\"name\":\"ACM Transactions on Intelligent Systems and Technology (TIST)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-04-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Intelligent Systems and Technology (TIST)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3469088\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Intelligent Systems and Technology (TIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3469088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
摘要
配备gps的移动设备的兴起导致了大轨迹数据的出现。采集的原始数据通常包含由设备故障、传感器错误、环境影响等原因引起的错误和异常信息。低质量的数据无法支持应用程序的需求,因此在使用前将对原始数据进行全面清理。现有方法对GPS数据误差的检测和修复效果不理想。为了解决这个问题,我们提出了一个名为GPSClean的框架来分析异常数据,并制定有效的方法来修复数据。GPSClean主要有四个模块:(i)数据预处理,(ii)数据填充,(iii)数据修复,(iv)数据转换。对于(i),我们提出了一种名为MDSort (Maximum Disorder Sorting,最大无序排序)的方法来有效地解决数据无序问题。对于(ii),我们提出了一种名为NNF(最近邻填充)的方法来填充缺失的数据。对于(iii),我们设计了一种名为RCSWS (Range Constraints and Sliding Window Statistics)的方法来修复异常,并利用驾驶方向来提高数据修复的准确性。我们在一个原型数据库系统SECONDO中使用了4500万个真实轨迹数据来评估我们的提案。实验结果表明,RCSWS的精度比SCREEN方法提高了3倍,比EWMA方法提高了近一个数量级。
GPSClean: A Framework for Cleaning and Repairing GPS Data
The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. The collected raw data usually contain errors and anomalies information caused by device failure, sensor error, and environment influence. Low-quality data fails to support application requirements and therefore raw data will be comprehensively cleaned before usage. Existing methods are suboptimal to detect GPS data errors and do the repairing. To solve the problem, we propose a framework called GPSClean to analyze the anomalies data and develop effective methods to repair the data. There are primarily four modules in GPSClean: (i) data preprocessing, (ii) data filling, (iii) data repairing, and (iv) data conversion. For (i), we propose an approach named MDSort (Maximum Disorder Sorting) to efficiently solve the issue of data disorder. For (ii), we propose a method named NNF (Nearest Neighbor Filling) to fill missing data. For (iii), we design an approach named RCSWS (Range Constraints and Sliding Window Statistics) to repair anomalies and also improve the accuracy of data repairing by mak7ing use of driving direction. We use 45 million real trajectory data to evaluate our proposal in a prototype database system SECONDO. Experimental results show that the accuracy of RCSWS is three times higher than an alternative method SCREEN and nearly an order of magnitude higher than an alternative method EWMA.