GPSClean: A Framework for Cleaning and Repairing GPS Data

Cheng-Hung Fang, Feng Wang, Bin Yao, Jianqiu Xu
{"title":"GPSClean: A Framework for Cleaning and Repairing GPS Data","authors":"Cheng-Hung Fang, Feng Wang, Bin Yao, Jianqiu Xu","doi":"10.1145/3469088","DOIUrl":null,"url":null,"abstract":"The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. The collected raw data usually contain errors and anomalies information caused by device failure, sensor error, and environment influence. Low-quality data fails to support application requirements and therefore raw data will be comprehensively cleaned before usage. Existing methods are suboptimal to detect GPS data errors and do the repairing. To solve the problem, we propose a framework called GPSClean to analyze the anomalies data and develop effective methods to repair the data. There are primarily four modules in GPSClean: (i) data preprocessing, (ii) data filling, (iii) data repairing, and (iv) data conversion. For (i), we propose an approach named MDSort (Maximum Disorder Sorting) to efficiently solve the issue of data disorder. For (ii), we propose a method named NNF (Nearest Neighbor Filling) to fill missing data. For (iii), we design an approach named RCSWS (Range Constraints and Sliding Window Statistics) to repair anomalies and also improve the accuracy of data repairing by mak7ing use of driving direction. We use 45 million real trajectory data to evaluate our proposal in a prototype database system SECONDO. Experimental results show that the accuracy of RCSWS is three times higher than an alternative method SCREEN and nearly an order of magnitude higher than an alternative method EWMA.","PeriodicalId":123526,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology (TIST)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Intelligent Systems and Technology (TIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3469088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The rise of GPS-equipped mobile devices has led to the emergence of big trajectory data. The collected raw data usually contain errors and anomalies information caused by device failure, sensor error, and environment influence. Low-quality data fails to support application requirements and therefore raw data will be comprehensively cleaned before usage. Existing methods are suboptimal to detect GPS data errors and do the repairing. To solve the problem, we propose a framework called GPSClean to analyze the anomalies data and develop effective methods to repair the data. There are primarily four modules in GPSClean: (i) data preprocessing, (ii) data filling, (iii) data repairing, and (iv) data conversion. For (i), we propose an approach named MDSort (Maximum Disorder Sorting) to efficiently solve the issue of data disorder. For (ii), we propose a method named NNF (Nearest Neighbor Filling) to fill missing data. For (iii), we design an approach named RCSWS (Range Constraints and Sliding Window Statistics) to repair anomalies and also improve the accuracy of data repairing by mak7ing use of driving direction. We use 45 million real trajectory data to evaluate our proposal in a prototype database system SECONDO. Experimental results show that the accuracy of RCSWS is three times higher than an alternative method SCREEN and nearly an order of magnitude higher than an alternative method EWMA.
GPSClean:一个清理和修复GPS数据的框架
配备gps的移动设备的兴起导致了大轨迹数据的出现。采集的原始数据通常包含由设备故障、传感器错误、环境影响等原因引起的错误和异常信息。低质量的数据无法支持应用程序的需求,因此在使用前将对原始数据进行全面清理。现有方法对GPS数据误差的检测和修复效果不理想。为了解决这个问题,我们提出了一个名为GPSClean的框架来分析异常数据,并制定有效的方法来修复数据。GPSClean主要有四个模块:(i)数据预处理,(ii)数据填充,(iii)数据修复,(iv)数据转换。对于(i),我们提出了一种名为MDSort (Maximum Disorder Sorting,最大无序排序)的方法来有效地解决数据无序问题。对于(ii),我们提出了一种名为NNF(最近邻填充)的方法来填充缺失的数据。对于(iii),我们设计了一种名为RCSWS (Range Constraints and Sliding Window Statistics)的方法来修复异常,并利用驾驶方向来提高数据修复的准确性。我们在一个原型数据库系统SECONDO中使用了4500万个真实轨迹数据来评估我们的提案。实验结果表明,RCSWS的精度比SCREEN方法提高了3倍,比EWMA方法提高了近一个数量级。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信