SIER: An Efficient Entity Resolution Mechanism Combining SNM and Iteration
Taiming Wang, Yue Kou, Derong Shen, Heng Liu, Ge Yu
2014 11th Web Information System and Application Conference, 2014-09-12
DOI: 10.1109/WISA.2014.50 (https://doi.org/10.1109/WISA.2014.50)
Abstract
With the rapid growth of data, entity resolution (ER) faces two challenges: achieving high result quality and achieving high performance. Correspondingly, current work focuses on either iteration-based entity resolution or sorted neighborhood method (SNM)-based entity resolution. The former iteratively merges similar records to achieve higher precision and recall. The latter compares only the records that fall within the same sliding window, which keeps performance high. However, each approach pays for its strength: iteration sacrifices efficiency, while SNM sacrifices result quality. In this paper, we present an entity resolution mechanism that combines SNM and iteration, called SIER. Unlike traditional approaches, SIER exploits the advantages of both SNM and iteration. We also propose a two-stage entity matching algorithm. In the first stage, records are initially matched using a sliding window. In the second stage, the matching result is rectified iteratively to improve its quality. Experiments demonstrate the feasibility and effectiveness of our method.
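The two-stage idea can be illustrated with a minimal Python sketch. This is not the SIER algorithm itself, since the abstract gives no details of it. The token-based Jaccard similarity, the threshold, the sort key (the raw record string), and the function names `snm_stage` and `iterative_stage` are all assumptions made for illustration. Stage 1 compares only records that share a sliding window over the sorted list. Stage 2 then repeatedly merges clusters that contain a similar pair until nothing changes, which can recover matches the window missed.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed measure, not from the paper)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def snm_stage(records, window=3, threshold=0.6):
    """Stage 1 (sketch): sort records, compare only within a sliding window."""
    order = sorted(range(len(records)), key=lambda i: records[i])
    parent = list(range(len(records)))
    for pos, i in enumerate(order):
        # Each record is compared with the next (window - 1) sorted records.
        for j in order[pos + 1 : pos + window]:
            if jaccard(records[i], records[j]) >= threshold:
                parent[find(parent, j)] = find(parent, i)
    clusters = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(parent, i), []).append(rec)
    return list(clusters.values())


def iterative_stage(clusters, threshold=0.6):
    """Stage 2 (sketch): merge clusters with a similar pair until a fixpoint."""
    clusters = [list(c) for c in clusters]  # do not mutate the caller's data
    changed = True
    while changed:
        changed = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if any(jaccard(x, y) >= threshold
                       for x in clusters[a] for y in clusters[b]):
                    clusters[a].extend(clusters[b])
                    del clusters[b]
                    changed = True
                    break
            if changed:
                break
    return clusters


if __name__ == "__main__":
    records = [
        "john smith boston",
        "kate lee",
        "mary jones",
        "smith john boston",
        "kate lee nyc",
    ]
    initial = snm_stage(records, window=2)
    print(len(initial), "clusters after the sliding-window stage")
    print(len(iterative_stage(initial)), "clusters after the iterative stage")
```

In this toy input, "john smith boston" and "smith john boston" sort far apart, so a window of 2 never compares them, and only the iterative stage brings them together. The second stage here compares all cluster pairs, which is affordable only because the first stage has already reduced the number of clusters. How SIER actually bounds this cost is not stated in the abstract.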