Evaluating Data Consistency with Matching Dependencies from Multiple Sources

Mi Huang, Lingli Li, Ping Xuan
2019 IEEE International Conference on Power Data Science (ICPDS), published 2019-11-01
DOI: 10.1109/ICPDS47662.2019.9017191 (https://doi.org/10.1109/ICPDS47662.2019.9017191)
Citation count: 0

Abstract

With the rapid growth of data, data quality issues have attracted increasing attention in both industry and academia. Because data consistency is one of the critical issues in data quality, we study how to evaluate the consistency of target data from multiple relevant sources under matching dependencies (MDs). Accessing the data sources directly incurs a huge cost in data comparisons, so this paper aims to design an efficient approximate consistency evaluation method with linear-time complexity. First, we build a signature for each data source to approximate the pattern sets that the MDs define in that source. Second, we develop a signature-based evaluation method that computes the consistency of the target data from the signatures of all data sources related to it. Experimental results on real datasets show that our algorithm performs well in both accuracy and efficiency.
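The abstract does not say how signatures are constructed, so the following Python sketch is only an illustration of the general two-step idea, not the authors' method. It assumes a single MD of the form "if two records are similar on the LHS attributes, their RHS attributes should match", approximates similarity with a crude normalization step, and represents each source's signature as a map from hashed LHS patterns to sets of hashed RHS patterns. All function names, attribute names, and sample records are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def _norm(value):
    # Assumption: case- and punctuation-insensitive equality stands in
    # for the MD's similarity predicate.
    return re.sub(r"[^a-z0-9]", "", str(value).lower())


def _pattern_hash(row, attrs):
    # Hash the normalized values of the given attributes into a short token.
    text = "|".join(_norm(row[a]) for a in attrs)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def build_signature(rows, lhs, rhs):
    """Step 1: summarize one source as {LHS pattern hash: {RHS pattern hashes}}."""
    sig = defaultdict(set)
    for row in rows:
        sig[_pattern_hash(row, lhs)].add(_pattern_hash(row, rhs))
    return dict(sig)


def consistency(target, signatures, lhs, rhs):
    """Step 2: fraction of target rows that conflict with no source signature.

    One pass over the target with constant-time lookups per signature,
    so the raw source records are never touched during evaluation.
    """
    if not target:
        return 1.0
    consistent = 0
    for row in target:
        key = _pattern_hash(row, lhs)
        val = _pattern_hash(row, rhs)
        conflict = any(key in sig and val not in sig[key] for sig in signatures)
        consistent += not conflict
    return consistent / len(target)


# Hypothetical data: MD says matching (name, zip) implies matching city.
lhs, rhs = ["name", "zip"], ["city"]
source_a = [{"name": "J. Smith", "zip": "10001", "city": "New York"}]
source_b = [{"name": "A. Jones", "zip": "94105", "city": "San Francisco"}]
target = [
    {"name": "j smith", "zip": "10001", "city": "new york"},   # agrees with source_a
    {"name": "A. Jones", "zip": "94105", "city": "Oakland"},   # conflicts with source_b
    {"name": "B. Lee", "zip": "60601", "city": "Chicago"},     # no matching pattern
]
signatures = [build_signature(s, lhs, rhs) for s in (source_a, source_b)]
score = consistency(target, signatures, lhs, rhs)
print(f"consistency = {score:.3f}")
```

In this sketch a target record with no matching pattern in any source is counted as consistent; whether the paper treats such records the same way is not stated in the abstract.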