D-Swoosh: A Family of Algorithms for Generic, Distributed Entity Resolution

O. Benjelloun, H. Garcia-Molina, Heng Gong, H. Kawai, T. E. Larson, David Menestrina, Sutthipong Thavisomboon
Published in: 27th International Conference on Distributed Computing Systems (ICDCS '07), 2007-06-25
DOI: 10.1109/ICDCS.2007.96 (https://doi.org/10.1109/ICDCS.2007.96)
Citation count: 50

Abstract

Entity resolution (ER) matches and merges records that refer to the same real-world entities, and is typically a compute-intensive process due to complex matching functions and high data volumes. We present a family of algorithms, D-Swoosh, for distributing the ER workload across multiple processors. The algorithms use generic match and merge functions, and ensure that new merged records are distributed to processors that may have matching records. We perform a detailed performance evaluation on a testbed of 15 processors. Our experiments use actual comparison shopping data provided by Yahoo!.
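
To make the idea of generic match and merge functions concrete, the following is a minimal single-process sketch in Python. It is not the D-Swoosh algorithms themselves, and it does not distribute any work. The record representation, the `match` rule (two records match if they share any attribute/value pair), and the names `match`, `merge`, and `resolve` are all assumptions made for illustration. The only point it shows is that a merged record must be compared again against the other records, which is why a distributed scheme has to route new merged records to processors that may hold matches.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Assumption: a record is an immutable set of (attribute, value) pairs.
Record = FrozenSet[Tuple[str, str]]


def match(a: Record, b: Record) -> bool:
    """Hypothetical match rule: records share at least one attribute/value pair."""
    return not a.isdisjoint(b)


def merge(a: Record, b: Record) -> Record:
    """Hypothetical merge rule: keep every attribute/value pair from both records."""
    return a | b


def resolve(
    records: Iterable[Record],
    match_fn: Callable[[Record, Record], bool],
    merge_fn: Callable[[Record, Record], Record],
) -> List[Record]:
    """Repeatedly match and merge until no two remaining records match."""
    pending = list(records)
    resolved: List[Record] = []
    while pending:
        current = pending.pop()
        # Look for any already-resolved record that matches the current one.
        partner = next((r for r in resolved if match_fn(current, r)), None)
        if partner is None:
            resolved.append(current)
        else:
            # The merged record may match records that neither input matched,
            # so it goes back into the pending list for further comparison.
            resolved.remove(partner)
            pending.append(merge_fn(current, partner))
    return resolved
```

Because `match_fn` and `merge_fn` are passed in as parameters, the loop itself makes no assumption about how records are compared or combined, which is the sense of "generic" this sketch is meant to illustrate.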