Preventing data errors with continuous testing

Kivanç Muslu, Yuriy Brun, A. Meliou
Published in: Proceedings of the 2015 International Symposium on Software Testing and Analysis
Publication date: 2015-07-13
DOI: 10.1145/2771783.2771792 (https://doi.org/10.1145/2771783.2771792)
Citations: 29

Abstract

Today, software systems that rely on data are ubiquitous, and ensuring the data's quality is an increasingly important challenge as data errors result in annual multi-billion dollar losses. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: identifying system errors caused by well-formed but incorrect data. We present continuous data testing (CDT), a low-overhead, delay-free technique that quickly identifies likely data errors. CDT continuously executes domain-specific test queries; when a test fails, CDT unobtrusively warns the user or administrator. We implement CDT in the ConTest prototype for the PostgreSQL database management system. A feasibility user study with 96 humans shows that ConTest was extremely effective in a setting with a data entry application at guarding against data errors: With ConTest, users corrected 98.4% of their errors, as opposed to 40.2% without, even when we injected 40% false positives into ConTest's output. Further, when using ConTest, users corrected data entry errors 3.2 times faster than when using state-of-the-art methods.
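
The idea of test queries that run alongside normal data entry can be illustrated with a minimal sketch. This is not the ConTest implementation: it assumes SQLite through Python's built-in sqlite3 module instead of PostgreSQL, a hypothetical `items` table, and two hypothetical test queries. Rerunning the tests after each write stands in for continuous execution, and a failing test only produces a warning; the write itself is never blocked.

```python
import sqlite3

# Hypothetical domain-specific test queries (not from the paper).
# Each query selects the rows that look wrong; a test passes when
# its query returns no rows.
TEST_QUERIES = {
    "price_is_positive": "SELECT id FROM items WHERE price <= 0",
    "discount_below_price": "SELECT id FROM items WHERE discount > price",
}


def run_tests(conn):
    """Run every test query; return {test_name: [offending ids]} for failures."""
    failures = {}
    for name, query in TEST_QUERIES.items():
        rows = [row[0] for row in conn.execute(query)]
        if rows:
            failures[name] = rows
    return failures


def insert_and_check(conn, item):
    """Apply a write, then rerun the tests and return any warnings.

    The write always goes through; the caller decides how to show
    the warnings to the user or administrator.
    """
    conn.execute(
        "INSERT INTO items (id, price, discount) VALUES (?, ?, ?)", item
    )
    return run_tests(conn)


# In-memory database with an assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, discount REAL)"
)

ok = insert_and_check(conn, (1, 19.99, 2.00))   # plausible row
bad = insert_and_check(conn, (2, 1.50, 15.00))  # discount exceeds price

print(ok)   # {}
print(bad)  # {'discount_below_price': [2]}
```

Both rows here are well-formed, so a schema or type check would accept them; only the second is flagged, because it violates a domain rule. That is the kind of "well-formed but incorrect" data the abstract refers to.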