Approximate answers to OLAP queries on streaming data warehouses

M. D. Rougemont, Phuong Thao Cao
{"title":"Approximate answers to OLAP queries on streaming data warehouses","authors":"M. D. Rougemont, Phuong Thao Cao","doi":"10.1145/2390045.2390065","DOIUrl":null,"url":null,"abstract":"We study streaming data for a data warehouse, which combines different sources. We consider the relative answers to OLAP queries on a schema, as distributions with the L1 distance and approximate the answers without storing the entire data warehouse. We first study how to sample each source and combine the samples to approximate any OLAP query. We then consider a streaming context, where a data warehouse is built by streams of different sources. We first show a lower bound on the size of the memory necessary to approximate queries and then consider a statistical hypothesis where some attributes determine fixed distributions of the measure. We use the sampling methods to learn the statistical model and approximate OLAP queries. In this case, we approximate OLAP queries with a finite memory. We apply the method to a dataset which simulates the data of sensors, which provide weather parameters over time and locations from different sources.","PeriodicalId":335396,"journal":{"name":"International Workshop on Data Warehousing and OLAP","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Data Warehousing and OLAP","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2390045.2390065","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

We study streaming data for a data warehouse, which combines different sources. We consider the relative answers to OLAP queries on a schema, as distributions with the L1 distance and approximate the answers without storing the entire data warehouse. We first study how to sample each source and combine the samples to approximate any OLAP query. We then consider a streaming context, where a data warehouse is built by streams of different sources. We first show a lower bound on the size of the memory necessary to approximate queries and then consider a statistical hypothesis where some attributes determine fixed distributions of the measure. We use the sampling methods to learn the statistical model and approximate OLAP queries. In this case, we approximate OLAP queries with a finite memory. We apply the method to a dataset which simulates the data of sensors, which provide weather parameters over time and locations from different sources.
流数据仓库上的OLAP查询的近似答案
我们研究数据仓库的流数据,它结合了不同的数据源。我们将模式上OLAP查询的相对答案视为具有L1距离的分布,并在不存储整个数据仓库的情况下近似计算答案。我们首先研究如何对每个源进行采样,并将这些样本组合起来以近似任何OLAP查询。然后我们考虑一个流上下文,其中数据仓库是由不同来源的流构建的。我们首先给出近似查询所需的内存大小的下界,然后考虑一个统计假设,其中一些属性决定了度量的固定分布。我们使用采样方法来学习统计模型和近似OLAP查询。在这种情况下,我们使用有限的内存近似处理OLAP查询。我们将该方法应用于模拟传感器数据的数据集,这些数据集提供了来自不同来源的随时间和地点的天气参数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信