Estimating and bounding aggregations in databases with referential integrity errors

International Workshop on Data Warehousing and OLAP. Pub Date: 2008-10-30. DOI: 10.1145/1458432.1458442

Javier García-García, C. Ordonez


Citations: 3

Abstract



Database integration builds a single view over tables that come from multiple databases. Each database has its own tables, columns whose content overlaps with columns in other databases, and its own referential integrity constraints. Thus, a query in an integrated database is likely to involve tables and columns with referential integrity errors. In a data warehouse environment, ETL processes do handle referential integrity errors, but in many scenarios they do so by adding 'dummy' records to the dimension tables so that fact table rows with referential errors have something to relate to. When two tables are joined and aggregations are computed, the tuples with an undefined foreign key value are aggregated in a group marked as undefined, effectively discarding potentially valuable information. With that motivation in mind, we extend aggregate functions computed over tables with referential integrity errors in OLAP databases to return complete answer sets, in the sense that no tuple is excluded. With each valid reference, we associate the probability that an invalid reference is actually that correct reference. The main idea of our work is that, in certain contexts, tuples with invalid references can still be used by taking this probability into account. This way, improved answer sets are obtained from aggregate queries in settings where a database violates referential integrity constraints.
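To make the idea of using tuples with invalid references more concrete, here is a minimal Python sketch of a probability-weighted SUM with simple bounds. It is an illustration only, not the authors' definition: the abstract does not say how the probabilities are obtained, so the sketch assumes they are proportional to how often each valid key is referenced by valid rows. The function name `estimate_group_sums`, the row layout, and the bound construction (which assumes non-negative measures) are all hypothetical.

```python
from collections import defaultdict


def estimate_group_sums(fact_rows, valid_keys):
    """Estimate and bound SUM(amount) per dimension key.

    fact_rows:  iterable of (foreign_key, amount) pairs.
    valid_keys: set of keys that exist in the dimension table.

    Assumption (not from the paper): an invalid reference points to
    key k with probability equal to the share of valid rows that
    reference k. Bounds assume all amounts are non-negative.
    """
    valid_sum = defaultdict(float)
    valid_count = defaultdict(int)
    invalid_total = 0.0

    for fk, amount in fact_rows:
        if fk in valid_keys:
            valid_sum[fk] += amount
            valid_count[fk] += 1
        else:
            # Invalid or missing reference: keep the measure instead of
            # discarding it in an "undefined" group.
            invalid_total += amount

    n_valid = sum(valid_count.values())
    result = {}
    for key in valid_keys:
        lower = valid_sum[key]  # no invalid row belongs to this key
        # Expected share of the invalid total under the assumed probabilities.
        share = invalid_total * valid_count[key] / n_valid if n_valid else 0.0
        result[key] = {
            "lower": lower,
            "estimate": lower + share,
            "upper": lower + invalid_total,  # every invalid row belongs to this key
        }
    return result


# Hypothetical fact rows: two reference key "A", one references "B",
# and two carry invalid references (None and the unknown key "Z").
rows = [("A", 10.0), ("A", 20.0), ("B", 30.0), (None, 8.0), ("Z", 4.0)]
groups = estimate_group_sums(rows, {"A", "B"})
```

Under these assumptions the per-group estimates add up to the grand total of the fact table, so no tuple is excluded, while the lower and upper values bracket each group between "no invalid row belongs here" and "all invalid rows belong here".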
