Similarity flooding: a versatile graph matching algorithm and its application to schema matching

Proceedings 18th International Conference on Data Engineering. Pub Date: 2002-02-26. DOI: 10.1109/ICDE.2002.994702

S. Melnik, H. Garcia-Molina, E. Rahm

Citations: 1641

Abstract

Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the 'accuracy' of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
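The abstract does not spell out the fixpoint computation or the accuracy formula, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each graph is a list of (source, label, target) triples, that similarity flows between node pairs joined by equally labeled edges, and that each pair splits its similarity evenly among its neighbours (a simplification chosen here for brevity). The names `similarity_flooding`, `threshold_filter`, and `match_accuracy` are hypothetical, and the accuracy function is one plausible way to turn a count of needed adjustments into a score.

```python
from collections import defaultdict


def similarity_flooding(edges_a, edges_b, initial=None, max_iter=100, eps=1e-6):
    """Iterate similarity scores for node pairs until they stop changing.

    edges_a, edges_b: lists of (source, label, target) triples (assumed format).
    initial: optional dict {(node_a, node_b): score}; missing pairs start at 1.0.
    """
    # Two node pairs are neighbours when the same edge label links
    # their members in both graphs.
    neighbours = defaultdict(set)
    for a1, label_a, a2 in edges_a:
        for b1, label_b, b2 in edges_b:
            if label_a == label_b:
                neighbours[(a1, b1)].add((a2, b2))
                neighbours[(a2, b2)].add((a1, b1))
    pairs = list(neighbours)
    if not pairs:
        return {}

    sigma = {p: (initial or {}).get(p, 1.0) for p in pairs}
    for _ in range(max_iter):
        nxt = {}
        for p in pairs:
            # Assumption: each neighbour spreads its score evenly
            # over all of its own neighbours.
            incoming = sum(sigma[q] / len(neighbours[q]) for q in neighbours[p])
            nxt[p] = sigma[p] + incoming
        top = max(nxt.values())
        if top > 0:
            nxt = {p: v / top for p, v in nxt.items()}  # scale into [0, 1]
        delta = sum((nxt[p] - sigma[p]) ** 2 for p in pairs) ** 0.5
        sigma = nxt
        if delta < eps:  # fixpoint reached (within tolerance)
            break
    return sigma


def threshold_filter(sigma, threshold):
    """A very simple filter: keep pairs scoring at or above the threshold."""
    return {p for p, score in sigma.items() if score >= threshold}


def match_accuracy(proposed, intended):
    """Score a proposed matching by how many adjustments a human must make.

    Assumed formula: 1 - (deletions + additions) / number of intended matches.
    """
    proposed, intended = set(proposed), set(intended)
    correct = len(proposed & intended)
    deletions = len(proposed) - correct   # wrong pairs the user must remove
    additions = len(intended) - correct   # missing pairs the user must add
    return 1 - (deletions + additions) / len(intended)


# Toy usage: two tiny graphs with the same shape.
graph_a = [("a", "l1", "a1"), ("a", "l2", "a2")]
graph_b = [("b", "l1", "b1"), ("b", "l2", "b2")]
scores = similarity_flooding(graph_a, graph_b)
chosen = threshold_filter(scores, 0.4)
intended = {("a", "b"), ("a1", "b1"), ("a2", "b2")}
print(sorted(scores.items()))
print(match_accuracy(chosen, intended))
```

Under this formula a score of 1 means no adjustments are needed, and the score can become negative when fixing the proposal takes more work than matching from scratch.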
