Controlling the Correctness of Aggregation Operations During Sessions of Interactive Analytic Queries

IF 1.5 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS
E. Simon, B. Amann, Rutian Liu, Stéphane Gançarski
Journal: ACM Journal of Data and Information Quality, volume 33 1 (as listed), pages 1-41
DOI: 10.1145/3575812 (https://doi.org/10.1145/3575812)
Publication date: 2021-11-27
Publication type: Journal Article
Open access: no
Indexed via: Semantic Scholar
Citations: 1

Abstract

We present a comprehensive set of conditions and rules to control the correctness of aggregation queries within an interactive data analysis session. The goal is to extend self-service data preparation and Business Intelligence (BI) tools so that they automatically detect semantically incorrect aggregate queries on analytic tables and views built with the common analytic operations, including filter, project, join, aggregate, union, difference, and pivot. We introduce aggregable properties to describe, for any attribute of an analytic table, which aggregation functions correctly aggregate the attribute along which sets of dimension attributes. These properties can also be used to formally identify attributes that are summarizable with respect to some aggregation function along a given set of dimension attributes. This is particularly helpful for detecting incorrect aggregations of measures obtained through non-distributive aggregation functions like average and count. We extend the notion of summarizability by introducing a new generalized summarizability condition to control the aggregation of attributes after any analytic operation. Finally, we define propagation rules that transform aggregable properties of the query input tables into new aggregable properties for the result tables, preserving summarizability and generalized summarizability.
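The kind of error the abstract targets can be illustrated with a minimal sketch. It is not the paper's formalism: the toy table, the column names, and the `AGGREGABLE` lookup are hypothetical and heavily simplified (the paper's aggregable properties also record the sets of dimension attributes, which this sketch omits). It shows that re-aggregating per-store sums with SUM reproduces the true total, whereas re-aggregating per-store averages with AVG does not reproduce the true average.

```python
from collections import defaultdict

# Toy fact table: (store, month, sales). Hypothetical data for illustration only.
rows = [
    ("A", "Jan", 10.0),
    ("A", "Feb", 20.0),
    ("A", "Mar", 30.0),
    ("B", "Jan", 100.0),
]


def aggregate(table, key_index, value_index, func):
    """Group rows by one column and apply func to each group's values."""
    groups = defaultdict(list)
    for row in table:
        groups[row[key_index]].append(row[value_index])
    return {key: func(values) for key, values in groups.items()}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# First-level aggregation along the month dimension: one value per store.
sum_per_store = aggregate(rows, 0, 2, sum)
avg_per_store = aggregate(rows, 0, 2, mean)

# Second-level aggregation along the store dimension.
total_from_sums = sum(sum_per_store.values())
avg_from_avgs = mean(list(avg_per_store.values()))

# Reference values computed directly on the base rows.
true_total = sum(row[2] for row in rows)
true_avg = mean([row[2] for row in rows])

# Hypothetical, simplified stand-in for aggregable properties: for each
# derived attribute, the functions assumed safe for re-aggregation.
AGGREGABLE = {
    "sum_sales": {"sum"},   # sums of sums remain correct
    "avg_sales": set(),     # assumption: no function declared safe here
}


def is_allowed(attribute, function):
    """Return True if the function is declared safe for the attribute."""
    return function in AGGREGABLE.get(attribute, set())


if __name__ == "__main__":
    print("sum of sums matches true total:", total_from_sums == true_total)
    print("avg of avgs matches true avg:", avg_from_avgs == true_avg)
    print("AVG allowed on avg_sales:", is_allowed("avg_sales", "avg"))
```

A tool in the spirit of the abstract would consult such properties before running the second-level query and flag the AVG over `avg_sales` instead of silently returning a misleading number.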
Source journal
ACM Journal of Data and Information Quality (Computer Science, Information Systems)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 0