An Approach of Standardization and Searching based on Hierarchical Bayesian Clustering (HBC) for Record Linkage System

Zin War Tun, N. Thein
Venue: Fifth International Conference on Creating, Connecting and Collaborating through Computing (C5 '07)
Published: 2007-01-24
DOI: 10.1109/C5.2007.5 (https://doi.org/10.1109/C5.2007.5)
Citations: 1

Abstract

Information sources on the Web use different text formats and contain varying inconsistencies. Data from many online sources do not contain enough information to link records accurately. To link records from different data sources, a system must identify the entities these sources have in common. The major challenges in record linkage are therefore computational complexity and linkage accuracy. To reduce the number of record pairs that must be compared, record linkage uses similarity search techniques to find candidate similar records. Various searching methods have been used in record linkage systems. In this paper, we propose a record linkage framework that focuses on standardization and enhances the searching method by adopting a cluster-based searching method called Hierarchical Bayesian Clustering (HBC), which is intended both to make record pair comparison more efficient and to improve record linkage accuracy. The method places similar records into clusters, which restricts the search scope for record comparison and also enhances matching accuracy.
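The idea of standardizing records and then comparing only records that fall into the same cluster can be illustrated with a minimal Python sketch. This is not the paper's HBC algorithm: the abstract gives no implementation details, so the clustering step is replaced here by a deliberately simple, hypothetical key (the longest token of the standardized record). The function names `standardize`, `cluster_key`, and `candidate_pairs`, and the sample records, are assumptions made for illustration only.

```python
import re
from collections import defaultdict
from itertools import combinations


def standardize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def cluster_key(record):
    """Hypothetical stand-in for a cluster assignment (NOT HBC):
    the longest token of the record, ties broken alphabetically."""
    tokens = record.split()
    return max(tokens, key=lambda t: (len(t), t)) if tokens else ""


def candidate_pairs(records):
    """Return index pairs of records that share a cluster.

    Only these pairs would be passed on to detailed comparison.
    """
    clusters = defaultdict(list)
    for idx, raw in enumerate(records):
        clusters[cluster_key(standardize(raw))].append(idx)
    pairs = []
    for members in clusters.values():
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


# Made-up sample records with inconsistent formatting.
records = ["John  SMITH", "smith, john", "Mary Jones", "M. Jones", "Ann Lee"]
pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {pairs}")
```

With these five sample records, an exhaustive comparison would examine 10 pairs, while the clustered search keeps only the pairs inside each cluster. A real system would replace `cluster_key` with an actual clustering procedure and would follow the candidate search with a field-by-field similarity comparison.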