Efficient range distribution query in large-scale scientific data
Abon Chaudhuri, Teng-Yok Lee, Han-Wei Shen, T. Peterka
2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV), 2013-12-02
DOI: 10.1109/LDAV.2013.6675171 (https://doi.org/10.1109/LDAV.2013.6675171)
Citations: 2
Abstract
Frequent access to raw data is no longer practical, if possible at all, for answering queries on large-scale data. This has led to the use of distribution-based data summaries, which can substitute for raw data when answering statistical queries of different kinds. Our work is concerned with the range distribution query, which returns the distribution of an axis-aligned region of any size. We address the challenge of keeping such queries both interactive and accurate on large data. This work presents a novel and efficient framework for pre-computing and storing a set of distributions from which the distribution of any arbitrary region can be obtained during post-processing. We adapt an integral-image-based data structure to answer such queries in constant time, and propose a similarity-based encoding technique to reduce the storage cost of the data structure. Our scheme exploits the similarity among different regions in the data and, hence, among their respective distributions. We demonstrate the use of our technique in various applications that directly or indirectly require distributions.
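
The integral-image idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 2D scalar field, NumPy, a fixed number of equal-width bins, and hypothetical function names (`build_integral_histogram`, `query_region`), and it leaves out the similarity-based encoding entirely. Each grid corner stores the histogram of everything above and to the left of it, so the histogram of any axis-aligned region follows from four lookups, independent of the region's size.

```python
import numpy as np


def build_integral_histogram(data, num_bins, value_range):
    """Return H of shape (rows + 1, cols + 1, num_bins).

    H[i, j] is the histogram of data[:i, :j]. Names and layout are
    illustrative assumptions, not taken from the paper.
    """
    lo, hi = value_range
    # Map each value to an equal-width bin index in [0, num_bins - 1].
    scaled = (data - lo) / (hi - lo) * num_bins
    bins = np.clip(scaled.astype(int), 0, num_bins - 1)

    rows, cols = data.shape
    # One-hot encode the bin index of every cell.
    one_hot = np.zeros((rows, cols, num_bins), dtype=np.int64)
    one_hot[np.arange(rows)[:, None], np.arange(cols)[None, :], bins] = 1

    # Prefix sums along both axes, with a zero border so queries need no
    # special case at the domain boundary.
    integral = np.zeros((rows + 1, cols + 1, num_bins), dtype=np.int64)
    integral[1:, 1:] = one_hot.cumsum(axis=0).cumsum(axis=1)
    return integral


def query_region(integral, r0, r1, c0, c1):
    """Histogram of data[r0:r1, c0:c1] from four corner lookups."""
    return (integral[r1, c1] - integral[r0, c1]
            - integral[r1, c0] + integral[r0, c0])


data = np.array([[0, 1, 2],
                 [1, 1, 3],
                 [2, 3, 3]])
H = build_integral_histogram(data, num_bins=4, value_range=(0, 4))
print(query_region(H, 0, 2, 0, 2))  # histogram of the top-left 2x2 block: [1 3 0 0]
```

Each query costs four vector operations of length `num_bins` regardless of region size, but the structure stores one full histogram per grid point. That storage cost is what the similarity-based encoding described in the abstract is meant to reduce.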