通过持续测试防止数据错误

Kivanç Muslu, Yuriy Brun, A. Meliou
{"title":"通过持续测试防止数据错误","authors":"Kivanç Muslu, Yuriy Brun, A. Meliou","doi":"10.1145/2771783.2771792","DOIUrl":null,"url":null,"abstract":"Today, software systems that rely on data are ubiquitous, and ensuring the data's quality is an increasingly important challenge as data errors result in annual multi-billion dollar losses. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: identifying system errors caused by well-formed but incorrect data. We present continuous data testing (CDT), a low-overhead, delay-free technique that quickly identifies likely data errors. CDT continuously executes domain-specific test queries; when a test fails, CDT unobtrusively warns the user or administrator. We implement CDT in the ConTest prototype for the PostgreSQL database management system. A feasibility user study with 96 humans shows that ConTest was extremely effective in a setting with a data entry application at guarding against data errors: With ConTest, users corrected 98.4% of their errors, as opposed to 40.2% without, even when we injected 40% false positives into ConTest's output. Further, when using ConTest, users corrected data entry errors 3.2 times faster than when using state-of-the-art methods.","PeriodicalId":264859,"journal":{"name":"Proceedings of the 2015 International Symposium on Software Testing and Analysis","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Preventing data errors with continuous testing\",\"authors\":\"Kivanç Muslu, Yuriy Brun, A. Meliou\",\"doi\":\"10.1145/2771783.2771792\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Today, software systems that rely on data are ubiquitous, and ensuring the data's quality is an increasingly important challenge as data errors result in annual multi-billion dollar losses. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: identifying system errors caused by well-formed but incorrect data. We present continuous data testing (CDT), a low-overhead, delay-free technique that quickly identifies likely data errors. CDT continuously executes domain-specific test queries; when a test fails, CDT unobtrusively warns the user or administrator. We implement CDT in the ConTest prototype for the PostgreSQL database management system. A feasibility user study with 96 humans shows that ConTest was extremely effective in a setting with a data entry application at guarding against data errors: With ConTest, users corrected 98.4% of their errors, as opposed to 40.2% without, even when we injected 40% false positives into ConTest's output. Further, when using ConTest, users corrected data entry errors 3.2 times faster than when using state-of-the-art methods.\",\"PeriodicalId\":264859,\"journal\":{\"name\":\"Proceedings of the 2015 International Symposium on Software Testing and Analysis\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2015 International Symposium on Software Testing and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2771783.2771792\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 International Symposium on Software Testing and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2771783.2771792","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29

摘要

如今,依赖于数据的软件系统无处不在,由于数据错误导致每年数十亿美元的损失,确保数据质量是一项日益重要的挑战。虽然软件调试和测试得到了大量的研究关注,但很少有人致力于数据调试:识别由格式良好但不正确的数据引起的系统错误。我们介绍了连续数据测试(CDT),这是一种低开销、无延迟的技术,可以快速识别可能的数据错误。CDT连续执行特定于领域的测试查询;当测试失败时,CDT会不显眼地警告用户或管理员。我们在PostgreSQL数据库管理系统的ConTest原型中实现了CDT。一项包含96人的可行性用户研究表明,在数据输入应用程序的设置中,ConTest在防止数据错误方面非常有效:使用ConTest,用户纠正了98.4%的错误,而没有ConTest,即使我们在ConTest的输出中注入40%的假阳性,用户也纠正了40.2%的错误。此外,当使用ConTest时,用户纠正数据输入错误的速度比使用最先进的方法快3.2倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Preventing data errors with continuous testing
Today, software systems that rely on data are ubiquitous, and ensuring the data's quality is an increasingly important challenge as data errors result in annual multi-billion dollar losses. While software debugging and testing have received heavy research attention, less effort has been devoted to data debugging: identifying system errors caused by well-formed but incorrect data. We present continuous data testing (CDT), a low-overhead, delay-free technique that quickly identifies likely data errors. CDT continuously executes domain-specific test queries; when a test fails, CDT unobtrusively warns the user or administrator. We implement CDT in the ConTest prototype for the PostgreSQL database management system. A feasibility user study with 96 humans shows that ConTest was extremely effective in a setting with a data entry application at guarding against data errors: With ConTest, users corrected 98.4% of their errors, as opposed to 40.2% without, even when we injected 40% false positives into ConTest's output. Further, when using ConTest, users corrected data entry errors 3.2 times faster than when using state-of-the-art methods.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信