Dimitri Leggas, Thomas Henretty, J. Ezick, M. Baskaran, Brendan von Hofe, Grace H. Cimaszewski, Harper Langston, R. Lethin
{"title":"Multiscale Data Analysis Using Binning, Tensor Decompositions, and Backtracking","authors":"Dimitri Leggas, Thomas Henretty, J. Ezick, M. Baskaran, Brendan von Hofe, Grace H. Cimaszewski, Harper Langston, R. Lethin","doi":"10.1109/HPEC43674.2020.9286171","DOIUrl":null,"url":null,"abstract":"Large data sets can contain patterns at multiple scales (spatial, temporal, etc.). In practice, it is useful for data exploration techniques to detect patterns at each relevant scale. In this paper, we develop an approach to detect activities at multiple scales using tensor decomposition, an unsupervised high-dimensional data analysis technique that finds correlations between different features in the data. This method typically requires that the feature values are discretized during the construction of the tensor in a process called “binning.” We develop a method of constructing and decomposing tensors with different binning schemes of various features in order to uncover patterns across a set of user-defined scales. While binning is necessary to obtain interpretable results from tensor decompositions, it also decreases the specificity of the data. Thus, we develop backtracking methods that enable the recovery of original source data corresponding to patterns found in the decomposition. These technique are discussed in the context of spatiotemporal and network traffic data, and in particular on Automatic Identification System (AIS) data.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC43674.2020.9286171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Large data sets can contain patterns at multiple scales (spatial, temporal, etc.). In practice, it is useful for data exploration techniques to detect patterns at each relevant scale. In this paper, we develop an approach to detect activities at multiple scales using tensor decomposition, an unsupervised high-dimensional data analysis technique that finds correlations between different features in the data. This method typically requires that the feature values are discretized during the construction of the tensor in a process called “binning.” We develop a method of constructing and decomposing tensors with different binning schemes of various features in order to uncover patterns across a set of user-defined scales. While binning is necessary to obtain interpretable results from tensor decompositions, it also decreases the specificity of the data. Thus, we develop backtracking methods that enable the recovery of original source data corresponding to patterns found in the decomposition. These technique are discussed in the context of spatiotemporal and network traffic data, and in particular on Automatic Identification System (AIS) data.