Perfect is the enemy of test oracle
Ali Reza Ibrahimzada, Yigit Varli, Dilara Tekinoglu, Reyhaneh Jabbarvand
DOI: 10.1145/3540250.3549086 · Published: 2022-11-07
Automation of test oracles is one of the most challenging facets of software testing, yet it remains far less addressed than automated test input generation. Test oracles rely on a ground truth that can distinguish correct from buggy behavior to determine whether a test fails (detects a bug) or passes. What makes the oracle problem challenging and undecidable is the assumption that the ground truth must know the exact expected, correct, or buggy behavior. However, we argue that one can still build an accurate oracle without knowing the exact correct or buggy behavior, only how the two might differ. This paper presents SEER, a learning-based approach that, in the absence of test assertions or other types of oracle, can determine whether a unit test passes or fails on a given method under test (MUT). To build the ground truth, SEER jointly embeds unit tests and the implementations of MUTs into a unified vector space, such that the neural representation of a test is similar to that of the MUTs it passes on and dissimilar to that of the MUTs it fails on. The classifier built on top of this vector representation serves as the oracle, generating a "fail" label when test inputs detect a bug in the MUT and a "pass" label otherwise. Our extensive experiments applying SEER to more than 5K unit tests from a diverse set of open-source Java projects show that the produced oracle is (1) effective in predicting fail or pass labels, achieving an overall accuracy, precision, recall, and F1 measure of 93%, 86%, 94%, and 90%; (2) generalizable, predicting labels for unit tests of projects that were not in the training or validation set with a negligible performance drop; and (3) efficient, detecting the existence of bugs in only 6.5 milliseconds on average. Moreover, by interpreting the neural model rather than treating it as a closed-box solution, we confirm that the oracle is valid, i.e., it predicts labels by learning relevant features.
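The abstract describes the core mechanism only at a high level: a shared embedding space in which a unit test is pulled toward the MUTs it passes on and pushed away from those it fails on, with a classifier on top acting as the oracle. What follows is a minimal PyTorch sketch of that general pattern, not the authors' architecture; the hash-based toy tokenizer, dimensions, loss terms, and training loop are illustrative assumptions.

# Minimal sketch (not the paper's implementation) of the general idea:
# embed a unit test and its method under test (MUT) into one vector
# space, pull "pass" pairs together, push "fail" pairs apart, and
# train a small classifier on top to emit pass/fail labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 5000, 128  # assumed vocabulary and embedding sizes

def toy_tokenize(code: str, max_len: int = 64) -> torch.Tensor:
    """Hash-based stand-in for a real code tokenizer/encoder."""
    ids = [hash(tok) % VOCAB for tok in code.split()][:max_len]
    ids += [0] * (max_len - len(ids))
    return torch.tensor(ids)

class CodeEncoder(nn.Module):
    """Shared encoder so tests and MUTs land in the same vector space."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.proj = nn.Linear(DIM, DIM)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings, project, and L2-normalize.
        return F.normalize(self.proj(self.embed(ids).mean(dim=1)), dim=-1)

class OracleClassifier(nn.Module):
    """Predicts pass (1) / fail (0) from the pair of embeddings."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * DIM, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, test_vec: torch.Tensor, mut_vec: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([test_vec, mut_vec], dim=-1))

encoder, oracle = CodeEncoder(), OracleClassifier()
opt = torch.optim.Adam(list(encoder.parameters()) + list(oracle.parameters()), lr=1e-3)

# One hypothetical labeled pair: label 1 = the test passes on the MUT, 0 = it fails.
test_ids = toy_tokenize("assertEquals(3, add(1, 2))").unsqueeze(0)
mut_ids = toy_tokenize("int add(int a, int b) { return a + b; }").unsqueeze(0)
label = torch.tensor([1])

for _ in range(10):
    t, m = encoder(test_ids), encoder(mut_ids)
    sim = F.cosine_similarity(t, m)
    # Similarity objective: high similarity for passing pairs, low for failing pairs.
    embed_loss = torch.where(label == 1, 1 - sim, F.relu(sim)).mean()
    # The classifier built on the embeddings acts as the oracle.
    clf_loss = F.cross_entropy(oracle(t, m), label)
    loss = embed_loss + clf_loss
    opt.zero_grad(); loss.backward(); opt.step()

pred = oracle(encoder(test_ids), encoder(mut_ids)).argmax(dim=-1)  # 1 -> "pass", 0 -> "fail"

In practice the toy tokenizer and mean-pooled embedding would be replaced by a pretrained code model trained on many labeled (test, MUT) pairs; the sketch only shows how a similarity objective and a pass/fail classifier fit together to form a learned oracle.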