Duplicate Management Using Graph Database Systems: A Case Study
R. Vaz, J. Oliveira, Leonardo Ribeiro
Proceedings of the XV Brazilian Symposium on Information Systems, 2019-05-20
DOI: 10.1145/3330204.3330260
The presence of multiple representations of the same information object, referred to as duplicates, is a ubiquitous problem in information system databases. Besides corrupting analysis results, such inconsistencies can compromise applications that need to correlate information from different sources, such as auditing and fraud detection systems. The traditional approach to this problem has two steps: first, duplicates are identified, typically through a semi-automatic process; then, each group of duplicates is merged into a single consolidated representation. However, this strategy may cause information loss if records are erroneously classified as duplicates. This article presents a case study of a different approach, which uses graph database systems to model similarity relationships between data objects. In this way, possible duplicates can be identified dynamically through basic graph operations. The study was carried out in the context of the Controladoria Geral do Estado de Goiás, as part of the development of an application to detect evidence of fraud in public bids. Initial results indicate that the proposed approach is effective and efficient in discovering links involving possible duplicates.
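To make the contrast with merge-based deduplication concrete, here is a minimal in-memory sketch of the idea. Records are kept as separate nodes, pairwise similarity scores are stored as weighted edges, and "possible duplicates" are found at query time by a breadth-first traversal over edges whose score reaches a chosen threshold. This is not the authors' implementation and does not use any particular graph database API. The similarity measure (token-set Jaccard), the class `SimilarityGraph`, the `min_score` and `threshold` parameters, and the sample records are all hypothetical, chosen only for illustration.

```python
from collections import deque
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (a hypothetical, illustrative measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


class SimilarityGraph:
    """Hypothetical sketch: records are nodes, similarity scores are edges.

    Records are never merged, so a pair wrongly judged similar loses no data;
    only the stored edge would be misleading.
    """

    def __init__(self, records: dict[str, str], min_score: float = 0.3):
        self.records = records
        self.edges: dict[str, dict[str, float]] = {rid: {} for rid in records}
        # Store every pair whose similarity reaches a low storage threshold.
        for a, b in combinations(records, 2):
            score = jaccard(records[a], records[b])
            if score >= min_score:
                self.edges[a][b] = score
                self.edges[b][a] = score

    def possible_duplicates(self, rid: str, threshold: float) -> set[str]:
        """Breadth-first search over edges whose score is >= threshold.

        The threshold is picked per query, so clusters can be tightened or
        relaxed without rewriting the stored records.
        """
        seen, queue = {rid}, deque([rid])
        while queue:
            node = queue.popleft()
            for nbr, score in self.edges[node].items():
                if score >= threshold and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        return seen - {rid}


# Hypothetical supplier records.
records = {
    "r1": "ACME Construction Ltda",
    "r2": "ACME Construction",
    "r3": "ACME Construcoes Ltda",
    "r4": "Beta Engineering",
}
graph = SimilarityGraph(records)
print(sorted(graph.possible_duplicates("r2", threshold=0.4)))  # ['r1', 'r3']
print(sorted(graph.possible_duplicates("r2", threshold=0.6)))  # ['r1']
```

Note that at threshold 0.4, `r3` is reached from `r2` only transitively through `r1`, even though `r2` and `r3` share a single token. Whether such transitive links should count as duplicates is a query-time decision in this sketch, rather than something baked irreversibly into the data as it would be after a merge.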