An Approach of Standardization and Searching based on Hierarchical Bayesian Clustering (HBC) for Record Linkage System
Authors: Zin War Tun, N. Thein
Venue: Fifth International Conference on Creating, Connecting and Collaborating through Computing (C5 '07)
Publication date: 2007-01-24
DOI: https://doi.org/10.1109/C5.2007.5
Information sources on the Web use different text formats and contain varying inconsistencies. Data from many online sources do not contain enough information to link records accurately. To link records from different data sources, a system must identify the entities those sources have in common. The major challenges in record linkage are therefore computational complexity and linkage accuracy. To reduce the number of record pairs that must be compared, record linkage uses similarity search techniques to find candidate similar records, and various searching methods have been used in record linkage systems. In this paper, we propose a record linkage framework that focuses on standardization and enhances the searching step by adopting a cluster-based searching method called Hierarchical Bayesian Clustering (HBC), which is intended both to make record pair comparison more efficient and to improve record linkage accuracy. The purpose of this method is to place similar records into clusters, which restricts the search scope for record comparison and also enhances matching accuracy.
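The idea of restricting comparison to records that share a cluster can be illustrated with a minimal Python sketch. It is not the HBC method from the paper: the abstract gives no algorithmic detail, so the cluster assignment below is a deliberately simple stand-in (the first three characters of the first token), and the names `standardize`, `cluster_key`, and `candidate_pairs` are hypothetical. It only shows how standardization followed by grouping reduces the number of record pairs that reach the comparison step.

```python
from collections import defaultdict
from itertools import combinations
import re


def standardize(record: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", record.lower())
    return " ".join(cleaned.split())


def cluster_key(record: str) -> str:
    """Hypothetical stand-in for a cluster assignment (NOT HBC):
    the first three characters of the first token."""
    tokens = record.split()
    return tokens[0][:3] if tokens else ""


def candidate_pairs(records):
    """Return index pairs of records that fall into the same cluster."""
    clusters = defaultdict(list)
    for idx, raw in enumerate(records):
        clusters[cluster_key(standardize(raw))].append(idx)
    pairs = []
    for members in clusters.values():
        # Only records inside one cluster are compared with each other.
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


records = ["Smith, John", "smith  john.", "Jones, Mary", "JONES MARY", "Smyth, Jon"]
all_pairs = len(list(combinations(range(len(records)), 2)))
print(candidate_pairs(records), "out of", all_pairs, "possible pairs")
```

With these five sample records, only two candidate pairs are kept out of ten possible pairs. The sketch also shows the trade-off of any such scheme: "Smyth, Jon" lands in a different cluster from "Smith, John" and is never compared with it, which is why the quality of the clustering matters for matching accuracy.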