Title: An Empirical Analysis of Blind Tests
Authors: Kesina Baral, Jeff Offutt
DOI: 10.1109/icst46399.2020.00034
Venue: 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST)
Published: 2020-10-01
Citations: 5
Abstract
Modern software engineers automate as many tests as possible. Test automation allows tests to be run hundreds or thousands of times: hourly, daily, and sometimes continuously. This saves time and money, ensures reproducibility, and ultimately leads to software that is better and cheaper. Automated tests must include code to check that the output of the program on the test matches expected behavior. This code is called the test oracle and is typically implemented in assertions that flag the test as passing if the assertion evaluates to true and failing if not. Since automated tests require programming, many problems can occur. Some lead to false positives, where incorrect behavior is marked as correct, and others to false negatives, where correct behavior is marked as incorrect. This paper identifies and studies a common problem where test assertions are written incorrectly, leading to incorrect behavior that is not recognized. We call these tests blind because the test does not see the incorrect behavior. Blind tests cause false positives, essentially wasting the tests. This paper presents results from several human-based studies to assess the frequency of blind tests with different software and different populations of users. In our studies, the percent of blind tests ranged from a low of 39% to a high of 95%.
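The abstract describes blind tests as tests whose assertions are written so that they cannot detect incorrect behavior. A minimal hypothetical sketch (not taken from the paper) of the contrast between a blind oracle and a correct one:

```python
# Hypothetical illustration of a "blind" test: the assertion is a
# tautology, so the test passes even though the code under test is wrong.

def add(a, b):
    # Buggy implementation under test: subtracts instead of adding.
    return a - b

def blind_test():
    # Blind oracle: compares the actual output to itself. The assertion
    # always evaluates to true, so the bug goes unseen (a false positive
    # in the paper's sense: incorrect behavior marked as correct).
    result = add(2, 3)
    assert result == result  # always true
    return "pass"

def correct_test():
    # Correct oracle: compares the actual output to the expected value.
    result = add(2, 3)
    return "pass" if result == 5 else "fail"

print(blind_test())    # the blind test passes despite the bug
print(correct_test())  # the correct test exposes the bug
```

Running this prints `pass` for the blind test and `fail` for the correct one: the blind assertion wastes the test exactly as the abstract describes.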