Detecting spatio-temporal outliers with kernels and statistical testing

James P. Rogers, Daniel Barbará, C. Domeniconi
Published in: 2009 17th International Conference on Geoinformatics
Publication date: 2009-10-23
DOI: 10.1109/GEOINFORMATICS.2009.5293481 (https://doi.org/10.1109/GEOINFORMATICS.2009.5293481)
Citations: 7

Abstract

Outlier detection is the discovery of points that are exceptional when compared with a set of observations considered normal. Such points are important because they often lead to the discovery of exceptional events. In spatio-temporal data, observations are vectors of feature values tagged with a geographical location and a timestamp. A spatio-temporal outlier is an observation whose attribute values differ significantly from those of the other spatially and temporally referenced objects in its spatio-temporal neighborhood. It is an object that differs significantly from its neighbors, even though it may not differ significantly from the entire population. Discovering outliers in spatio-temporal data is therefore complicated by the need to focus the search on the appropriate spatio-temporal neighborhood of each point. The work in this paper builds on StrOUD (Strangeness-based Outlier Detection), an algorithm that the authors developed and have used to detect outliers in various scenarios, including vector spaces and non-vectorial data. StrOUD assigns each observation a measure of strangeness and compares the strangeness of a point with the distribution of strangeness over a set of baseline observations, which are assumed to come mostly from normal points. It then uses statistical testing to decide whether the point is an outlier. The technique described in this paper defines strangeness as the sum of distances to the nearest neighbors, where the distance between two observations is a weighted combination of the distance between their feature vectors, their geographical distance, and their temporal distance. Using this multi-modal distance measure (hereafter called the kernel), our technique can diagnose outliers with respect to spatio-temporal neighborhoods. We show that our approach can identify outliers in real-life data, including crime data and a set of observations collected by buoys in the Gulf of Mexico during the 2005 hurricane season. We also show that different weightings of the kernel distances allow the user to adapt the size of the spatio-temporal neighborhoods.
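
The procedure outlined in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `Observation`, `combined_distance`, `strangeness`, `p_value`, and `is_outlier` are hypothetical, the three component distances are taken to be Euclidean or absolute differences on planar coordinates, and the statistical test is assumed to be an empirical p-value (the share of baseline observations at least as strange as the test point) compared against a significance level `alpha`. The abstract does not specify any of these details.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Observation:
    """One spatio-temporal observation (hypothetical container)."""
    features: Sequence[float]  # attribute vector
    x: float                   # planar location (assumed, not lat/lon)
    y: float
    t: float                   # timestamp in arbitrary numeric units


def combined_distance(a: Observation, b: Observation,
                      w_feat: float = 1.0, w_geo: float = 1.0,
                      w_time: float = 1.0) -> float:
    """Weighted combination of feature, geographical and temporal distance."""
    d_feat = math.dist(a.features, b.features)   # Euclidean (assumption)
    d_geo = math.hypot(a.x - b.x, a.y - b.y)     # Euclidean (assumption)
    d_time = abs(a.t - b.t)
    return w_feat * d_feat + w_geo * d_geo + w_time * d_time


def strangeness(point: Observation, baseline: List[Observation],
                k: int, **weights: float) -> float:
    """Sum of combined distances from `point` to its k nearest baseline points."""
    dists = sorted(combined_distance(point, b, **weights)
                   for b in baseline if b is not point)  # skip the point itself
    return sum(dists[:k])


def p_value(point: Observation, baseline: List[Observation],
            k: int, **weights: float) -> float:
    """Empirical p-value: share of observations at least as strange as `point`.

    The +1 terms count the test point itself (an assumed convention).
    """
    base_scores = [strangeness(b, baseline, k, **weights) for b in baseline]
    s = strangeness(point, baseline, k, **weights)
    at_least = sum(1 for v in base_scores if v >= s)
    return (at_least + 1) / (len(base_scores) + 1)


def is_outlier(point: Observation, baseline: List[Observation],
               k: int = 3, alpha: float = 0.05, **weights: float) -> bool:
    """Flag `point` when its p-value falls below the significance level."""
    return p_value(point, baseline, k, **weights) < alpha
```

In this sketch, raising `w_geo` or `w_time` relative to `w_feat` makes observations that are far apart in space or time count as more distant, so the nearest neighbors are drawn from a tighter spatio-temporal neighborhood. Setting `w_feat` to zero removes the attribute values from the comparison altogether.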