Do student programmers all tend to write the same software tests?

S. Edwards, Z. Shams
{"title":"Do student programmers all tend to write the same software tests?","authors":"S. Edwards, Z. Shams","doi":"10.1145/2591708.2591757","DOIUrl":null,"url":null,"abstract":"While many educators have added software testing practices to their programming assignments, assessing the effectiveness of student-written tests using statement coverage or branch coverage has limitations. While researchers have begun investigating alternative approaches to assessing student-written tests, this paper reports on an investigation of the quality of student written tests in terms of the number of authentic, human-written defects those tests can detect. An experiment was conducted using 101 programs written for a CS2 data structures assignment where students implemented a queue two ways, using both an array-based and a link-based representation. Students were required to write their own software tests and graded in part on the branch coverage they achieved. Using techniques from prior work, we were able to approximate the number of bugs present in the collection of student solutions, and identify which of these were detected by each student-written test suite. The results indicate that, while students achieved an average branch coverage of 95.4% on their own solutions, their test suites were only able to detect an average of 13.6% of the faults present in the entire program population. Further, there was a high degree of similarity among 90% of the student test suites. Analysis of the suites suggest that students were following naïve, \"happy path\" testing, writing basic test cases covering mainstream expected behavior rather than writing tests designed to detect hidden bugs. These results suggest that educators should strive to reinforce test design techniques intended to find bugs, rather than simply confirming that features work as expected.","PeriodicalId":334476,"journal":{"name":"Annual Conference on Innovation and Technology in Computer Science Education","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Conference on Innovation and Technology in Computer Science Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2591708.2591757","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 40

Abstract

While many educators have added software testing practices to their programming assignments, assessing the effectiveness of student-written tests using statement coverage or branch coverage has limitations. Although researchers have begun investigating alternative approaches to assessing student-written tests, this paper reports on an investigation of the quality of student-written tests in terms of the number of authentic, human-written defects those tests can detect. An experiment was conducted using 101 programs written for a CS2 data structures assignment in which students implemented a queue in two ways, using both an array-based and a link-based representation. Students were required to write their own software tests and were graded in part on the branch coverage they achieved. Using techniques from prior work, we were able to approximate the number of bugs present in the collection of student solutions and identify which of these were detected by each student-written test suite. The results indicate that, while students achieved an average branch coverage of 95.4% on their own solutions, their test suites were able to detect an average of only 13.6% of the faults present in the entire program population. Further, there was a high degree of similarity among 90% of the student test suites. Analysis of the suites suggests that students were following naïve, "happy path" testing, writing basic test cases covering mainstream expected behavior rather than writing tests designed to detect hidden bugs. These results suggest that educators should strive to reinforce test design techniques intended to find bugs, rather than simply confirming that features work as expected.
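
To make the contrast concrete, here is a minimal JUnit 4 sketch of the two testing styles the abstract describes. It is illustrative only: java.util.ArrayDeque stands in for a student's queue implementation, since the paper does not publish the assignment's API, and the test names and scenarios are hypothetical examples rather than tests taken from the study.

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayDeque;
import java.util.NoSuchElementException;
import java.util.Queue;

import org.junit.Test;

// Illustrative sketch: ArrayDeque stands in for a student's array-based
// queue. The point is the contrast in test design, not the class under test.
public class QueueTestStyles {

    // "Happy path" style: exercises mainstream expected behavior.
    // Tests like this can push branch coverage high while still
    // revealing very few real defects.
    @Test
    public void enqueueThenDequeuePreservesFifoOrder() {
        Queue<Integer> q = new ArrayDeque<>();
        q.add(1);
        q.add(2);
        assertEquals(Integer.valueOf(1), q.remove());
        assertEquals(Integer.valueOf(2), q.remove());
    }

    // Defect-targeting style: keep the queue small while cycling many
    // elements through it, so the front/back indices of an array-backed
    // implementation must wrap around the internal buffer repeatedly.
    // Wrap-around logic is a classic source of hidden bugs in
    // circular-array queues.
    @Test
    public void fifoOrderSurvivesIndexWraparound() {
        Queue<Integer> q = new ArrayDeque<>(4);
        for (int i = 0; i < 100; i++) {
            q.add(i);
            if (i >= 2) {
                assertEquals(Integer.valueOf(i - 2), q.remove());
            }
        }
    }

    // Defect-targeting style: error handling on an empty queue.
    @Test(expected = NoSuchElementException.class)
    public void dequeueFromEmptyQueueThrows() {
        new ArrayDeque<Integer>().remove();
    }
}
```

In the study's terms, the first test is the kind most student suites contained; the second and third probe boundary conditions and error handling, the sort of tests designed to expose hidden bugs rather than confirm expected behavior.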