GPSClean: A Framework for Cleaning and Repairing GPS Data
Authors: Cheng-Hung Fang, Feng Wang, Bin Yao, Jianqiu Xu
Journal: ACM Transactions on Intelligent Systems and Technology (TIST), volume 13 1
Published: 2022-04-13
Publication type: Journal Article
DOI: 10.1145/3469088 (https://doi.org/10.1145/3469088)
Citations: 3
Abstract
The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. The collected raw data usually contain errors and anomalies caused by device failure, sensor error, and environmental influence. Low-quality data fail to support application requirements, so raw data must be comprehensively cleaned before use. Existing methods are suboptimal at detecting and repairing GPS data errors. To solve this problem, we propose a framework called GPSClean that analyzes the anomalous data and provides effective methods to repair them. GPSClean has four main modules: (i) data preprocessing, (ii) data filling, (iii) data repairing, and (iv) data conversion. For (i), we propose an approach named MDSort (Maximum Disorder Sorting) to efficiently resolve data disorder. For (ii), we propose a method named NNF (Nearest Neighbor Filling) to fill in missing data. For (iii), we design an approach named RCSWS (Range Constraints and Sliding Window Statistics) to repair anomalies, and we further improve repair accuracy by making use of driving direction. We evaluate our proposal on 45 million real trajectory records in a prototype database system, SECONDO. Experimental results show that the accuracy of RCSWS is three times higher than that of an alternative method, SCREEN, and nearly an order of magnitude higher than that of another alternative, EWMA.
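The abstract names the first three stages but does not describe how MDSort, NNF, or RCSWS work internally. The sketch below is therefore only a rough illustration of the sort, fill, and repair sequence, using simple stand-ins: a plain timestamp sort, a nearest-in-time copy, and a speed limit combined with a sliding-window median. The record layout `(t, x, y)`, all function names, and the `step`, `max_speed`, and `window` parameters are assumptions made for illustration and are not taken from the paper.

```python
from statistics import median

# Hypothetical record layout: (timestamp_seconds, x_meters, y_meters).
# These helpers are simplified stand-ins, NOT the paper's MDSort/NNF/RCSWS.

def sort_by_time(points):
    """Stage (i) stand-in: restore temporal order with a plain sort."""
    return sorted(points, key=lambda p: p[0])

def fill_gaps(points, step):
    """Stage (ii) stand-in: on a fixed sampling grid, fill each missing
    timestamp with the position of its nearest-in-time neighbour."""
    if not points:
        return []
    filled = []
    for prev, nxt in zip(points, points[1:]):
        filled.append(prev)
        t = prev[0] + step
        while t < nxt[0]:
            # Ties go to the earlier neighbour.
            nearest = prev if t - prev[0] <= nxt[0] - t else nxt
            filled.append((t, nearest[1], nearest[2]))
            t += step
    filled.append(points[-1])
    return filled

def repair(points, max_speed, window=3):
    """Stage (iii) stand-in: if the speed implied by a point exceeds
    max_speed (a range constraint), replace its position with the median
    of the last `window` repaired positions (a sliding-window statistic)."""
    repaired = []
    for t, x, y in points:
        if repaired:
            pt, px, py = repaired[-1]
            dt = t - pt
            dist = ((x - px) ** 2 + (y - py) ** 2) ** 0.5
            if dt > 0 and dist / dt > max_speed:
                recent = repaired[-window:]
                x = median(p[1] for p in recent)
                y = median(p[2] for p in recent)
        repaired.append((t, x, y))
    return repaired

# Toy trajectory: out of order, one missing sample (t=3), one outlier (t=4).
raw = [(2, 2.0, 0.0), (0, 0.0, 0.0), (1, 1.0, 0.0), (4, 500.0, 0.0), (5, 5.0, 0.0)]
cleaned = repair(fill_gaps(sort_by_time(raw), step=1), max_speed=10.0)
```

In this toy run the missing sample at `t=3` is copied from its neighbour at `t=2`, and the jump to `x=500.0` at `t=4` is pulled back to the median of the three preceding positions. A real system would work on geographic coordinates and, as the abstract notes for RCSWS, could also use driving direction, which this sketch ignores.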