Methods and analyses for determining quality

W. Winkler
DOI: 10.1145/1077501.1077505
Published in: Information Quality in Information Systems, 2005-06-17
Citations: 4

Abstract

In a possibly ideal world, records in a database would be complete and would contain fields having values that correspond to an underlying reality. An individual's name, address, and date of birth would be present without typographical error. An income field might be a reasonably close approximation of a "true income" and would not be missing. A list of customers would be complete, unduplicated, and current. In this ideal world, a database could be used for several purposes and would be considered to have high quality. A set of databases might be linked using name, address, and other weakly identifying information.

In this paper, we describe situations where properly chosen metrics may indicate that data quality is not sufficiently high for monitoring processes, for modeling, and for data mining. Some of the metrics are supplementary to those in the quality literature or have rarely been used.

Additionally, we describe generalized methods and software tools that allow a skilled individual to perform massive clean-up of files in some situations. The clean-up, while possibly sub-optimal in recreating "truth", can replace exceptionally large amounts of clerical review and allow many uses of the "cleaned" files.
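The abstract names completeness and freedom from duplicates as properties of an ideal customer list. As a minimal illustration only, the sketch below computes two generic measures of those properties: a per-field completeness rate and a duplicate rate after crude normalization. These are assumed, textbook-style metrics, not the specific metrics proposed in the paper; the records, field names, and the `normalize` rule are all hypothetical.

```python
from collections import Counter

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"name": "John Smith", "address": "12 Oak St", "dob": "1970-01-02", "income": 52000},
    {"name": "john smith", "address": "12 Oak St.", "dob": "1970-01-02", "income": None},
    {"name": "Mary Jones", "address": None, "dob": "1985-07-19", "income": 61000},
    {"name": "Ann Lee", "address": "4 Elm Ave", "dob": None, "income": 47000},
]


def completeness(rows, field):
    """Fraction of rows in which `field` is present (not None and not empty)."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def normalize(value):
    """Crude, assumed normalization: lowercase, drop periods, collapse spaces.
    Missing values map to the empty string."""
    if value is None:
        return ""
    return " ".join(str(value).lower().replace(".", "").split())


def duplicate_rate(rows, key_fields):
    """Fraction of rows whose normalized key occurs more than once."""
    if not rows:
        return 0.0
    keys = [tuple(normalize(r.get(f)) for f in key_fields) for r in rows]
    counts = Counter(keys)
    dupes = sum(1 for k in keys if counts[k] > 1)
    return dupes / len(rows)


for field in ("name", "address", "dob", "income"):
    print(field, completeness(records, field))
print("duplicate rate:", duplicate_rate(records, ["name", "address", "dob"]))
```

Here the first two records differ only in case and a trailing period, so exact matching would treat them as distinct while the normalized key flags both (duplicate rate 0.5). Real record linkage on weakly identifying fields typically needs approximate string comparison rather than exact normalized keys; this sketch deliberately stays simple.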