Value invention in data exchange

Proceedings. ACM-SIGMOD International Conference on Management of Data. Pub Date: 2013-06-22. DOI: 10.1145/2463676.2465311

Patricia C. Arocena, Boris Glavic, Renée J. Miller

Pages: 157-168

Citations: 19

Abstract

The creation of values to represent incomplete information, often referred to as value invention, is central in data exchange. Within schema mappings, Skolem functions have long been used for value invention as they permit a precise representation of missing information. Recent work on a powerful mapping language called second-order tuple-generating dependencies (SO tgds) has drawn attention to the fact that the use of arbitrary Skolem functions can have negative computational and programmatic properties in data exchange. In this paper, we present two techniques for understanding when the Skolem functions needed to represent the correct semantics of incomplete information are computationally well-behaved. Specifically, we consider when the Skolem functions in second-order (SO) mappings have a first-order (FO) semantics and are therefore programmatically and computationally more desirable for use in practice. Our first technique, linearization, significantly extends the Nash, Bernstein and Melnik unskolemization algorithm by understanding when the sets of arguments of the Skolem functions in a mapping are related by set inclusion. We show that such a linear relationship leads to mappings that have FO semantics and are expressible in popular mapping languages, including source-to-target tgds and nested tgds. Our second technique uses source semantics, specifically functional dependencies (including keys), to transform SO mappings into equivalent FO mappings. We show that our algorithms are applicable to a strictly larger class of mappings than previous approaches but, more importantly, we present an extensive experimental evaluation that quantifies this difference (about 78% improvement) over an extensive schema mapping benchmark and illustrates the applicability of our results on real mappings.
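The set-inclusion condition mentioned for linearization can be illustrated with a small sketch. This is not the paper's algorithm; it only tests one reading of the stated condition, namely that the argument sets of the Skolem functions in a mapping are pairwise comparable under inclusion and so form a chain. The function name `is_linear`, the dictionary representation of a mapping, and the example Skolem functions `f` and `g` are all hypothetical.

```python
from itertools import combinations


def is_linear(skolem_args):
    """Return True if all argument sets are pairwise comparable under set inclusion.

    skolem_args maps a (hypothetical) Skolem function name to the set of
    variables it takes as arguments. This checks only the inclusion condition
    named in the abstract; it does not rewrite the mapping.
    """
    arg_sets = [frozenset(args) for args in skolem_args.values()]
    return all(a <= b or b <= a for a, b in combinations(arg_sets, 2))


# Assumed example: f(e) and g(e, d). {e} is a subset of {e, d}, so the sets form a chain.
linear_example = {"f": {"e"}, "g": {"e", "d"}}

# Assumed example: f(e) and g(d). Neither set contains the other.
nonlinear_example = {"f": {"e"}, "g": {"d"}}

print(is_linear(linear_example))
print(is_linear(nonlinear_example))
```

Under this reading, the first mapping satisfies the inclusion condition and the second does not. Whether a mapping that fails the check still has FO semantics (for instance, via functional dependencies on the source) is a separate question that this sketch does not address.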



