Duplicate Management Using Graph Database Systems: A Case Study

Proceedings of the XV Brazilian Symposium on Information Systems. Pub Date: 2019-05-20. DOI: 10.1145/3330204.3330260

R. Vaz, J. Oliveira, Leonardo Ribeiro


Citations: 2

Abstract



The presence of multiple representations of an object of information, referred to as duplicates, is a ubiquitous problem in information system databases. Besides corrupting analysis results, such inconsistencies may compromise the functionality of applications that need to correlate information from different sources, such as auditing and fraud detection systems. The traditional approach to this problem has two steps: first, duplicates are identified in a typically semi-automatic process and, then, each group of duplicates is fused into a single consolidated representation. However, this strategy may result in information loss if records have been erroneously classified as duplicates. This article presents a case study on a different approach, which consists of using graph database systems to model similarity relationships between data objects. In this way, possible duplicates can be dynamically identified using basic operations on graphs. The study was carried out within the scope of the Controladoria Geral do Estado de Goiás, as part of the development of an application to detect evidence of fraud in public bids. Initial results indicate that the proposed approach is effective and efficient in discovering links involving possible duplicates.
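The abstract does not say which graph operations or similarity measure the authors used, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses an in-memory adjacency map in place of a real graph database system, string similarity from Python's `difflib` as a stand-in similarity measure, and a breadth-first traversal as the "basic operation on graphs". The function names, the `threshold` value, and the record layout are all hypothetical. Records are never merged: similarity is stored as edges, and possible duplicates are looked up at query time.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher


def build_similarity_graph(records, threshold=0.85):
    """Link each pair of records whose names score at least `threshold`.

    `records` maps a record id to a name string. The threshold and the
    use of SequenceMatcher are illustrative assumptions, not the paper's.
    """
    graph = defaultdict(set)
    ids = list(records)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = SequenceMatcher(
                None, records[a].lower(), records[b].lower()
            ).ratio()
            if score >= threshold:
                # Store the similarity as an undirected edge; no record is fused.
                graph[a].add(b)
                graph[b].add(a)
    return graph


def possible_duplicates(graph, start):
    """Return every record reachable from `start` over similarity edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get avoids creating empty entries for isolated records.
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}
```

Because the original records stay untouched, a pair that was linked by mistake costs only a spurious edge, which can be dropped or re-scored later without the information loss that fusion would cause.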

