PoWareMatch: A Quality-aware Deep Learning Approach to Improve Human Schema Matching

ACM Journal of Data and Information Quality (JDIQ) Pub Date : 2021-09-15 DOI:10.1145/3483423

Roee Shraga, A. Gal


Citations: 5

Abstract

Schema matching is a core task of any data integration process. Although it has been investigated for many years in the fields of databases, AI, Semantic Web, and data mining, the main challenge remains the ability to generate quality matches among data concepts (e.g., database attributes). In this work, we examine a novel angle on the behavior of humans as matchers, studying match creation as a process. We analyze the dynamics of common evaluation measures (precision, recall, and F-measure) with respect to this angle and highlight the need for unbiased matching to support this analysis. Unbiased matching, a newly defined concept that describes the common assumption that human decisions represent reliable assessments of schemata correspondences, is, however, not an inherent property of human matchers. We therefore design PoWareMatch, which uses a deep learning mechanism to calibrate and filter human matching decisions according to the quality of a match; these decisions are then combined with algorithmic matching to generate better match results. We provide empirical evidence, based on an experiment with more than 200 human matchers over common benchmarks, that PoWareMatch predicts well the benefit of extending the match with an additional correspondence and generates high-quality matches. In addition, PoWareMatch outperforms state-of-the-art matching algorithms.
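To make "the dynamics of common evaluation measures" concrete, the following minimal sketch recomputes precision, recall, and F-measure after each sequential matching decision. It is only an illustration of the standard measures, not the PoWareMatch method; the function names, the attribute names, and the reference match are all hypothetical.

```python
from typing import List, Set, Tuple

# A correspondence pairs a source attribute with a target attribute.
Correspondence = Tuple[str, str]


def evaluate(match: Set[Correspondence],
             reference: Set[Correspondence]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of a match against a reference match."""
    correct = len(match & reference)
    precision = correct / len(match) if match else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def measure_dynamics(decisions: List[Correspondence],
                     reference: Set[Correspondence]) -> List[Tuple[float, float, float]]:
    """Evaluate the growing match after each sequential decision."""
    match: Set[Correspondence] = set()
    history = []
    for pair in decisions:
        match.add(pair)
        history.append(evaluate(match, reference))
    return history


# Hypothetical toy data: a reference match and a sequence of human decisions,
# the second of which is incorrect.
reference = {("poDate", "orderDate"), ("poCode", "orderNumber"), ("city", "shipCity")}
decisions = [("poDate", "orderDate"), ("poDate", "shipCity"), ("poCode", "orderNumber")]

history = measure_dynamics(decisions, reference)
for step, (p, r, f) in enumerate(history, start=1):
    print(f"step {step}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy run the incorrect second decision lowers precision while recall stays unchanged, since adding a correspondence can never reduce recall. This is the kind of step-by-step behavior that treating match creation as a process brings into view.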

