模式匹配网络问题的机器学习技术研究

Journal of the Brazilian Computer Society Pub Date : 2021-11-23 DOI:10.1186/s13173-021-00119-5

Rodrigues, Diego, Silva, Altigran da

{"title":"模式匹配网络问题的机器学习技术研究","authors":"Rodrigues, Diego, Silva, Altigran da","doi":"10.1186/s13173-021-00119-5","DOIUrl":null,"url":null,"abstract":"Schema matching is the problem of finding semantic correspondences between elements from different schemas. This is a challenging problem since disparate elements in the schemas often represent the same concept. Traditional instances of this problem involved a pair of schemas. However, recently, there has been an increasing interest in matching several related schemas at once, a problem known as schema matching networks. The goal is to identify elements from several schemas that correspond to a single concept. We propose a family of methods for schema matching networks based on machine learning, which proved to be a competitive alternative for the traditional matching problem in several domains. To overcome the issue of requiring a large amount of training data, we also propose a bootstrapping procedure to generate training data automatically. In addition, we leverage constraints that arise in network scenarios to improve the quality of this data. We also study a strategy for receiving user feedback to assert some of the matchings generated and, relying on this feedback, improve the final result’s quality. Our experiments show that our methods can outperform baselines, reaching F1-score up to 0.83.","PeriodicalId":39760,"journal":{"name":"Journal of the Brazilian Computer Society","volume":"13 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A study on machine learning techniques for the schema matching network problem\",\"authors\":\"Rodrigues, Diego, Silva, Altigran da\",\"doi\":\"10.1186/s13173-021-00119-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Schema matching is the problem of finding semantic correspondences between elements from different schemas. This is a challenging problem since disparate elements in the schemas often represent the same concept. Traditional instances of this problem involved a pair of schemas. However, recently, there has been an increasing interest in matching several related schemas at once, a problem known as schema matching networks. The goal is to identify elements from several schemas that correspond to a single concept. We propose a family of methods for schema matching networks based on machine learning, which proved to be a competitive alternative for the traditional matching problem in several domains. To overcome the issue of requiring a large amount of training data, we also propose a bootstrapping procedure to generate training data automatically. In addition, we leverage constraints that arise in network scenarios to improve the quality of this data. We also study a strategy for receiving user feedback to assert some of the matchings generated and, relying on this feedback, improve the final result’s quality. Our experiments show that our methods can outperform baselines, reaching F1-score up to 0.83.\",\"PeriodicalId\":39760,\"journal\":{\"name\":\"Journal of the Brazilian Computer Society\",\"volume\":\"13 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Brazilian Computer Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s13173-021-00119-5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Brazilian Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s13173-021-00119-5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

模式匹配是在来自不同模式的元素之间寻找语义对应的问题。这是一个具有挑战性的问题，因为模式中的不同元素通常表示相同的概念。这个问题的传统实例涉及一对模式。然而，最近人们对同时匹配几个相关的模式越来越感兴趣，这个问题被称为模式匹配网络。目标是识别与单个概念对应的几个模式中的元素。我们提出了一系列基于机器学习的模式匹配网络方法，这些方法在多个领域被证明是传统匹配问题的一个有竞争力的替代方案。为了克服需要大量训练数据的问题，我们还提出了一种自动生成训练数据的自举过程。此外，我们利用网络场景中出现的约束来提高数据的质量。我们还研究了一种接收用户反馈的策略，以断言生成的一些匹配，并依靠这些反馈来提高最终结果的质量。我们的实验表明，我们的方法可以优于基线，达到F1-score高达0.83。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A study on machine learning techniques for the schema matching network problem

Schema matching is the problem of finding semantic correspondences between elements from different schemas. This is a challenging problem since disparate elements in the schemas often represent the same concept. Traditional instances of this problem involved a pair of schemas. However, recently, there has been an increasing interest in matching several related schemas at once, a problem known as schema matching networks. The goal is to identify elements from several schemas that correspond to a single concept. We propose a family of methods for schema matching networks based on machine learning, which proved to be a competitive alternative for the traditional matching problem in several domains. To overcome the issue of requiring a large amount of training data, we also propose a bootstrapping procedure to generate training data automatically. In addition, we leverage constraints that arise in network scenarios to improve the quality of this data. We also study a strategy for receiving user feedback to assert some of the matchings generated and, relying on this feedback, improve the final result’s quality. Our experiments show that our methods can outperform baselines, reaching F1-score up to 0.83.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of the Brazilian Computer Society Computer Science-Computer Science (all)

CiteScore

2.40

自引率

0.00%

发文量

期刊介绍： JBCS is a formal quarterly publication of the Brazilian Computer Society. It is a peer-reviewed international journal which aims to serve as a forum to disseminate innovative research in all fields of computer science and related subjects. Theoretical, practical and experimental papers reporting original research contributions are welcome, as well as high quality survey papers. The journal is open to contributions in all computer science topics, computer systems development or in formal and theoretical aspects of computing, as the list of topics below is not exhaustive. Contributions will be considered for publication in JBCS if they have not been published previously and are not under consideration for publication elsewhere.