Scalable spatial scan statistics through sampling

Michael Matheny, Raghvendra Singh, L. Zhang, Kaiqiang Wang, J. M. Phillips
{"title":"Scalable spatial scan statistics through sampling","authors":"Michael Matheny, Raghvendra Singh, L. Zhang, Kaiqiang Wang, J. M. Phillips","doi":"10.1145/2996913.2996939","DOIUrl":null,"url":null,"abstract":"Finding anomalous regions within spatial data sets is a central task for biosurveillance, homeland security, policy making, and many other important areas. These communities have mainly settled on spatial scan statistics as a rigorous way to discover regions where a measured quantity (e.g., crime) is statistically significant in its difference from a baseline population. However, most common approaches are inefficient and thus, can only be run with very modest data sizes (a few thousand data points) or make assumptions on the geographic distributions of the data. We address these challenges by designing, exploring, and analyzing sample-then-scan algorithms. These algorithms randomly sample data at two scales, one to define regions and the other to approximate the counts in these regions. Our experiments demonstrate that these algorithms are efficient and accurate independent of the size of the original data set, and our analysis explains why this is the case. For the first time, these sample-then-scan algorithms allow spatial scan statistics to run on a million or more data points without making assumptions on the spatial distribution of the data. Moreover, our experiments and analysis give insight into when it is appropriate to trust the various types of spatial anomalies when the data is modeled as a random sample from a larger but unknown data set.","PeriodicalId":20525,"journal":{"name":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"191 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2996913.2996939","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Finding anomalous regions within spatial data sets is a central task for biosurveillance, homeland security, policy making, and many other important areas. These communities have mainly settled on spatial scan statistics as a rigorous way to discover regions where a measured quantity (e.g., crime) is statistically significant in its difference from a baseline population. However, most common approaches are inefficient and thus, can only be run with very modest data sizes (a few thousand data points) or make assumptions on the geographic distributions of the data. We address these challenges by designing, exploring, and analyzing sample-then-scan algorithms. These algorithms randomly sample data at two scales, one to define regions and the other to approximate the counts in these regions. Our experiments demonstrate that these algorithms are efficient and accurate independent of the size of the original data set, and our analysis explains why this is the case. For the first time, these sample-then-scan algorithms allow spatial scan statistics to run on a million or more data points without making assumptions on the spatial distribution of the data. Moreover, our experiments and analysis give insight into when it is appropriate to trust the various types of spatial anomalies when the data is modeled as a random sample from a larger but unknown data set.
可扩展的空间扫描统计通过采样
在空间数据集中发现异常区域是生物监测、国土安全、政策制定和许多其他重要领域的中心任务。这些社区主要采用空间扫描统计作为一种严格的方法,以发现测量数量(例如,犯罪)与基线人口的差异在统计上显着的区域。然而,大多数常见的方法效率低下,因此只能在非常有限的数据量(几千个数据点)下运行,或者对数据的地理分布进行假设。我们通过设计、探索和分析采样扫描算法来解决这些挑战。这些算法在两个尺度上随机采样数据,一个用于定义区域,另一个用于近似这些区域的计数。我们的实验表明,这些算法是高效和准确的,与原始数据集的大小无关,我们的分析解释了为什么会出现这种情况。这是第一次,这些采样-扫描算法允许在不假设数据的空间分布的情况下对一百万或更多的数据点运行空间扫描统计。此外,我们的实验和分析让我们深入了解,当数据作为一个更大但未知的数据集的随机样本建模时,何时应该相信各种类型的空间异常。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信