Duplicate Management Using Graph Database Systems: A Case Study

R. Vaz, J. Oliveira, Leonardo Ribeiro
Proceedings of the XV Brazilian Symposium on Information Systems, 20 May 2019. DOI: 10.1145/3330204.3330260
Citations: 2

Abstract

The presence of multiple representations of the same information object, referred to as duplicates, is a ubiquitous problem in information system databases. Besides corrupting analysis results, such inconsistencies may compromise applications that need to correlate information from different sources, such as auditing and fraud detection systems. The traditional approach to this problem has two steps: first, duplicates are identified, typically in a semi-automatic process; then, each group of duplicates is fused into a single consolidated representation. However, this strategy may lose information if records are erroneously classified as duplicates. This article presents a case study of a different approach: using graph database systems to model similarity relationships between data objects. In this way, possible duplicates can be identified dynamically using basic graph operations. The study was carried out within the scope of the Controladoria Geral do Estado de Goiás, as part of the development of an application to detect evidence of fraud in public bids. Initial results indicate that the proposed approach is effective and efficient in discovering links involving possible duplicates.
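The following is a minimal sketch of the idea of keeping similarity relationships instead of fusing records. It makes several assumptions that do not come from the paper. An in-memory adjacency map stands in for a real graph database. Token-level Jaccard similarity with a fixed threshold is used as a hypothetical similarity measure. The function names and sample records are also hypothetical. Pairwise similarities are stored as weighted, undirected edges, and the possible duplicates of a record are found by breadth-first traversal. Because no records are merged, a stricter query-time threshold (`min_sim`) can exclude a doubtful match without any data having been lost.

```python
from collections import deque
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lower-case whitespace tokenization (a deliberately simple, hypothetical choice)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_similarity_graph(records: dict[str, str], threshold: float) -> dict[str, dict[str, float]]:
    """Create an undirected weighted edge for every record pair whose similarity >= threshold.

    Records stay separate nodes; only the similarity relationship is stored.
    """
    graph: dict[str, dict[str, float]] = {rid: {} for rid in records}
    toks = {rid: tokens(value) for rid, value in records.items()}
    for a, b in combinations(records, 2):
        sim = jaccard(toks[a], toks[b])
        if sim >= threshold:
            graph[a][b] = sim
            graph[b][a] = sim
    return graph


def possible_duplicates(graph: dict[str, dict[str, float]], start: str, min_sim: float = 0.0) -> set[str]:
    """Breadth-first search from `start`, following only edges with weight >= min_sim.

    Returns every record reachable through such edges, excluding `start` itself.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor, sim in graph[node].items():
            if sim >= min_sim and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


# Hypothetical sample records (supplier names in a bidding database).
records = {
    "r1": "Joao Silva Ltda",
    "r2": "joao silva ltda",
    "r3": "Joao da Silva Ltda",
    "r4": "Maria Souza ME",
}
g = build_similarity_graph(records, threshold=0.5)
print(sorted(possible_duplicates(g, "r1")))               # ['r2', 'r3']
print(sorted(possible_duplicates(g, "r1", min_sim=0.9)))  # ['r2']
```

The pairwise loop in `build_similarity_graph` is quadratic in the number of records. A realistic deployment would likely add a blocking or indexing step to limit the comparisons, which this sketch omits.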