ACERPI-Block: Applying Blocking Techniques to the ACERPI Approach

Christian Schmitz, Jonathan Martins, Serigne K. Mbaye, Edimar Manica, Renata Galante
J. Inf. Data Manag. (Journal Article), published 2022-09-12. DOI: 10.5753/jidm.2022.2509 (https://doi.org/10.5753/jidm.2022.2509)

Abstract

Ordinances are documents issued by federal institutions that contain, among other things, information about their staff. These documents are available through public repositories that usually offer no filtering or advanced search over document contents. This paper extends ACERPI (an approach to collect documents, extract information and resolve entities from institutional ordinances), which identifies the people mentioned in institutional ordinances to help users find the documents of interest. ACERPI-Block focuses on the Entity Resolution step of the approach, developing blocking strategies that let it scale to hundreds of thousands of records. Experiments show a 93.3% reduction in the number of similarity comparisons between records compared with the solution without blocking, with no decrease in efficacy.
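
To make the idea of blocking concrete, the sketch below shows standard key-based blocking on a handful of person names. The abstract does not say which blocking strategies ACERPI-Block uses, so this is not the authors' method: the blocking key (initials of the first and last name tokens), the function names, and the sample names are all assumptions made for illustration. It only shows why grouping records into blocks reduces the number of pairwise similarity comparisons.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(name: str) -> str:
    """Hypothetical blocking key: initials of the first and last name tokens."""
    tokens = name.lower().split()
    if not tokens:
        return ""
    return tokens[0][0] + tokens[-1][0]


def candidate_pairs(records):
    """Return index pairs of records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, name in enumerate(records):
        blocks[blocking_key(name)].append(idx)
    pairs = set()
    for members in blocks.values():
        # Only records inside the same block are compared with each other.
        pairs.update(combinations(members, 2))
    return pairs


# Made-up sample names, not data from the paper.
records = [
    "Maria Silva",
    "Maria da Silva",
    "Joao Souza",
    "Joao P. Souza",
    "Ana Costa",
    "Pedro Lima",
]

# Without blocking, every record is compared with every other record.
all_pairs = len(records) * (len(records) - 1) // 2
blocked = candidate_pairs(records)
reduction = 1 - len(blocked) / all_pairs

print(f"comparisons without blocking: {all_pairs}")
print(f"comparisons with blocking:    {len(blocked)}")
print(f"reduction: {reduction:.1%}")
```

A key this coarse trades recall for speed: two spellings of the same person that produce different initials would never be compared, which is why the choice of blocking key matters for efficacy.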