TsRss: A Practical Stream data Cleaning Method based on Local Shape Feature

Changyong Yu, Peng Liu, Yudi Liu, Haitao Ma, Yuhai Zhao
2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML), published 2022-03-01
DOI: 10.1109/CACML55074.2022.00137 (https://doi.org/10.1109/CACML55074.2022.00137)

Abstract

Stream data is common, but it often contains dirty data points caused by noise interference, unreliable sensor readings, erroneous extraction of stock prices, and other factors. Existing data cleaning methods based on smoothing filters alter the data substantially and do not preserve the original information. Other methods, such as SCREEN, must be guided by semantic constraints tied to specific application scenarios. To improve usability, we propose TsRss, a practical stream data cleaning method based on local shape features (Shape-Sheets). TsRss rests on the idea that data points that fail to match their local shape features are more likely to be dirty. To this end, we first study methods for generating and representing unequal-length Shape-Sheets from local shape features. We then propose a Shape-Sheet-based method that finds dirty data via anomaly detection. Finally, we conduct experiments on several real datasets. The results show that, compared with state-of-the-art methods, TsRss is more practical across various types of data and is more accurate or more time-saving.
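
The abstract does not specify how Shape-Sheets are built or matched, so the following is not TsRss itself. It is only a minimal sketch of the general idea that a point which does not fit its local neighbourhood is a candidate dirty point. The stand-in "shape feature" (the median of nearby values), the function name `flag_dirty_points`, and the parameters `half_window` and `threshold` are all assumptions made for illustration.

```python
from statistics import median


def flag_dirty_points(values, half_window=2, threshold=3.0):
    """Return indices of points that deviate strongly from their neighbours.

    Assumption: the local "shape" is approximated by the median of up to
    `half_window` values on each side of a point (the point itself excluded).
    This is a simple stand-in, not the Shape-Sheet representation.
    """
    n = len(values)
    residuals = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        neighbours = values[lo:i] + values[i + 1:hi]
        if not neighbours:
            # A single-point series has no local context to compare against.
            residuals.append(0.0)
            continue
        residuals.append(abs(values[i] - median(neighbours)))

    # Robust scale estimate: the median residual, with a tiny floor so that
    # a perfectly flat series does not produce a zero threshold.
    scale = median(residuals) or 1e-9
    return [i for i, r in enumerate(residuals) if r > threshold * scale]


readings = [1.0, 1.1, 1.2, 1.3, 9.0, 1.5, 1.6, 1.7, 1.8]
suspect = flag_dirty_points(readings)
```

In this sketch only the spike at index 4 is flagged, and the remaining values are left untouched, which mirrors the abstract's point that cleaning should not alter data that already fits its local shape. How a flagged point is then repaired is outside what the abstract describes.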