DeepBlock: A Novel Blocking Approach for Entity Resolution using Deep Learning

Delaram Javdani, H. Rahmani, Milad Allahgholi, Fatemeh Karimkhani
Published in: 2019 5th International Conference on Web Research (ICWR), pp. 41-44
Publication date: 2019-04-01
DOI: 10.1109/ICWR.2019.8765267 (https://doi.org/10.1109/ICWR.2019.8765267)
Citations: 5

Abstract

Entity resolution is the process of identifying and merging records that refer to the same real-world entity. Standard methods use rule-based or machine learning models to compare a pair of records and assign it a score that indicates whether the pair matches. However, comparing every pair of records exhaustively leads to quadratic matching complexity. Blocking methods are therefore applied before matching to group records that likely refer to the same entity into small blocks, and exhaustive matching is then performed only within each block. Several blocking methods have been proposed to partition the input data efficiently into manageable groups, including token blocking, which places records that share a token in the same block. Most previous methods do not take semantic criteria into account. In this paper, we propose a new method, called DeepBlock, that uses deep learning for blocking in entity resolution. DeepBlock combines syntactic and semantic similarities to calculate the similarity between records. We evaluated DeepBlock on a real-world dataset and compared it with an existing blocking technique (token blocking). Our experimental results show that combining semantic and syntactic similarity can considerably improve the quality of blocking: DeepBlock significantly outperforms token blocking with respect to the pair quality (PQ) measure.
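
To make the baseline and the metric concrete, the following is a minimal sketch of token blocking and of pair quality (PQ). It is not the authors' implementation and does not reproduce DeepBlock itself. It assumes whitespace tokenization, and it assumes PQ is computed as the fraction of distinct candidate pairs that are true matches. The function names and the toy records are hypothetical and are not taken from the paper's dataset.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Map each token to the set of record ids whose text contains it."""
    blocks = defaultdict(set)
    for rec_id, text in records.items():
        # Assumption: lowercase whitespace tokenization.
        for token in text.lower().split():
            blocks[token].add(rec_id)
    # A block holding a single record produces no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def pair_quality(pairs, true_matches):
    """Fraction of candidate pairs that are true matches (assumed PQ definition)."""
    if not pairs:
        return 0.0
    return len(pairs & true_matches) / len(pairs)


# Hypothetical toy records for illustration only.
records = {
    1: "Apple iPhone 7",
    2: "iPhone 7 by Apple",
    3: "Samsung Galaxy S8",
    4: "Galaxy S8 phone",
    5: "Apple phone case",
}
# Ground-truth matching pairs, written with the smaller id first.
true_matches = {(1, 2), (3, 4)}

blocks = token_blocking(records)
pairs = candidate_pairs(blocks)
print(sorted(pairs))
print(pair_quality(pairs, true_matches))
```

On this toy input, blocking reduces the 10 possible pairs among five records to 5 candidate pairs, of which 2 are true matches. The pairs that involve record 5 are kept only because they share the tokens "apple" or "phone" with other records, which illustrates how a purely syntactic criterion admits non-matching pairs and lowers PQ.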