Scalable Data Parallel Approaches to Anomaly Detection in Climate Data using Gaussian Processes

K. Gadiraju, B. Ramachandra, Ashwin Shashidharan, B. Dutton, Ranga Raju Vatsavai
Published in: 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), 2019-12-01
DOI: 10.1109/ICMLA.2019.00090 (https://doi.org/10.1109/ICMLA.2019.00090)

Abstract

Anomaly detection on large-scale spatio-temporal data such as climate data is challenging, and its difficulty depends on the spatial and temporal resolution and the autocorrelation of the data. For global gridded daily temperature data, the number of locations and the length of the time period considered make anomaly detection a big data problem. Gaussian Process (GP) learning is well suited to capturing the complex spatial and temporal autocorrelation properties of spatio-temporal data. One of the primary challenges in using GPs is the computational complexity associated with inverting a covariance matrix. This cost is compounded for data on a national or global scale, so anomaly detection with such methods often requires dedicated high-performance computing platforms. In this paper, we describe a purely temporal, scalable anomaly detection technique for gridded temperature data based on GP learning that ignores the spatial autocorrelation between neighboring grids and performs anomaly detection on each grid in parallel, thereby reducing the execution time. We introduce three methods: a standalone data-parallel approach using a single GPU, a distributed-memory version on multi-node clusters using MPI, and a mixed parallel approach using multiple GPUs. Compared with a sequential approach, they are 17.2x, 47.1x, and 88.9x faster, respectively.
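The per-grid, purely temporal idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the squared-exponential kernel, the fixed hyperparameters, the leave-one-out z-score rule, the threshold, and every function name below are assumptions chosen for illustration, and the sequential loop over cells only marks the unit of work that a GPU or MPI version would distribute.

```python
import numpy as np


def rbf_kernel(t, length_scale, signal_var):
    """Squared-exponential covariance between all pairs of time stamps."""
    diff = t[:, None] - t[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def detect_cell_anomalies(series, length_scale=5.0, signal_var=1.0,
                          noise_var=0.1, z_thresh=4.0):
    """Flag time steps of ONE grid cell with a large leave-one-out z-score.

    Hyperparameters are fixed here (an assumption); in practice they
    would be learned from the data.
    """
    y = np.asarray(series, dtype=float)
    y = y - y.mean()                      # use a zero-mean GP prior
    n = y.size
    t = np.arange(n, dtype=float)
    K = rbf_kernel(t, length_scale, signal_var) + noise_var * np.eye(n)
    # Cholesky factorisation K = L L^T: the O(n^3) cost paid per cell.
    L = np.linalg.cholesky(K)
    L_inv = np.linalg.solve(L, np.eye(n))
    K_inv = L_inv.T @ L_inv
    alpha = K_inv @ y
    # Closed-form leave-one-out GP prediction:
    #   residual_i = alpha_i / K_inv[i, i],  variance_i = 1 / K_inv[i, i]
    # so the standardised residual is alpha_i / sqrt(K_inv[i, i]).
    z = alpha / np.sqrt(np.diag(K_inv))
    return np.abs(z) > z_thresh


def detect_grid_anomalies(grid, **gp_params):
    """Run the per-cell detector on every row of a (cells, days) array.

    Cells are independent because spatial correlation is ignored, so each
    iteration could be assigned to a separate worker.
    """
    return np.stack([detect_cell_anomalies(row, **gp_params) for row in grid])


# Synthetic demo: two cells, the second with an injected spike on day 50.
rng = np.random.default_rng(0)
days = np.arange(120)
clean = np.sin(2 * np.pi * days / 30) + 0.1 * rng.standard_normal(days.size)
spiked = clean.copy()
spiked[50] += 10.0
flags = detect_grid_anomalies(np.vstack([clean, spiked]))
print(np.flatnonzero(flags[1]))           # indices flagged in the spiked cell
```

Because no cell reads another cell's data, the list comprehension in `detect_grid_anomalies` is the natural place to split work across GPU threads or MPI ranks; the covariance factorisation inside each cell remains the dominant cost.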