Split-Paper Testing: A Novel Approach to Evaluate Programming Performance

Yasuichi Nakayama, Y. Kuno, Hiroyasu Kakuda
{"title":"Split-Paper Testing: A Novel Approach to Evaluate Programming Performance","authors":"Yasuichi Nakayama, Y. Kuno, Hiroyasu Kakuda","doi":"10.2197/ipsjjip.28.733","DOIUrl":null,"url":null,"abstract":": There is a great need to evaluate and / or test programming performance. For this purpose, two schemes have been used. Constructed response (CR) tests let the examinee write programs on a blank sheet (or with a computer keyboard). This scheme can evaluate the programming performance. However, it is di ffi cult to apply in a large volume because skilled human graders are required (automatic evaluation is attempted but not widely used yet). Multiple choice (MC) tests let the examinee choose the correct answer from a list (often corresponding to the “hidden” portion of a complete program). This scheme can be used in a large volume with computer-based testing or mark-sense cards. However, many teachers and researchers are suspicious in that a good score does not necessarily mean the ability to write programs from scratch. We propose a third method, split-paper (SP) testing. Our scheme splits a correct program into each of its lines, shu ffl es the lines, adds “wrong answer” lines, and prepends them with choice symbols. The examinee answers by using a list of choice symbols corresponding to the correct program, which can be easily graded automatically by using computers. In particular, we propose the use of edit distance (Levenshtein distance) in the scoring scheme, which seems to have a ffi nity with the SP scheme. The research question is whether SP tests scored by using an edit-distance-based scoring scheme measure programming performance as do CR tests. Therefore, we conducted an experiment by using college programming classes with 60 students to compare SP tests against CR tests. As a result, SP and CR test scores are correlated for multiple settings, and the results were statistically significant. Therefore, we might conclude that SP tests with automatic scoring using edit distance are useful tools for evaluating the programming performance.","PeriodicalId":430763,"journal":{"name":"J. Inf. Process.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Inf. Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2197/ipsjjip.28.733","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

There is a great need to evaluate and/or test programming performance. For this purpose, two schemes have been used. Constructed response (CR) tests let the examinee write programs on a blank sheet (or with a computer keyboard). This scheme can evaluate programming performance directly. However, it is difficult to apply at large scale because skilled human graders are required (automatic evaluation has been attempted but is not yet widely used). Multiple choice (MC) tests let the examinee choose the correct answer from a list (often corresponding to a "hidden" portion of a complete program). This scheme can be used at large scale with computer-based testing or mark-sense cards. However, many teachers and researchers are suspicious of it, because a good score does not necessarily indicate the ability to write programs from scratch. We propose a third method, split-paper (SP) testing. Our scheme splits a correct program into its individual lines, shuffles the lines, adds "wrong answer" lines, and prepends a choice symbol to each line. The examinee answers with the sequence of choice symbols corresponding to the correct program, which can easily be graded automatically by computer. In particular, we propose using edit distance (Levenshtein distance) in the scoring scheme, which appears to have a natural affinity with the SP format. The research question is whether SP tests scored with an edit-distance-based scheme measure programming performance as CR tests do. To answer it, we conducted an experiment in college programming classes with 60 students, comparing SP tests against CR tests. SP and CR test scores were correlated across multiple settings, and the results were statistically significant. Therefore, we might conclude that SP tests with automatic scoring using edit distance are useful tools for evaluating programming performance.
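The abstract does not give the exact test-construction procedure or scoring formula, but the steps it describes are easy to illustrate. Below is a minimal sketch in Python, assuming A, B, C, ... as choice symbols and a simple normalization of the edit distance to a [0, 1] score; the names `make_sp_test`, `levenshtein`, and `sp_score` are hypothetical, not from the paper.

```python
import random

def make_sp_test(program_lines, distractor_lines, seed=None):
    """Pool the correct lines with distractors, shuffle, and label each
    line with a choice symbol. Returns the labeled choices plus the
    answer key (the symbol sequence that rebuilds the program)."""
    rng = random.Random(seed)
    pool = list(program_lines) + list(distractor_lines)
    order = list(range(len(pool)))
    rng.shuffle(order)
    symbols = [chr(ord("A") + i) for i in range(len(pool))]  # assumes <= 26 choices
    labeled = [(symbols[pos], pool[idx]) for pos, idx in enumerate(order)]
    sym_of = {idx: symbols[pos] for pos, idx in enumerate(order)}
    key = [sym_of[i] for i in range(len(program_lines))]
    return labeled, key

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences,
    using a rolling row to keep memory at O(len(b))."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution
        prev = curr
    return prev[-1]

def sp_score(answer, key):
    """Hypothetical normalization: full marks minus edit distance,
    scaled to [0, 1]; the paper's actual formula may differ."""
    return max(0.0, 1.0 - levenshtein(answer, key) / len(key))

if __name__ == "__main__":
    program = ["total = 0", "for x in data:", "    total += x", "print(total)"]
    distractors = ["total = 1", "print(x)"]
    choices, key = make_sp_test(program, distractors, seed=42)
    for symbol, line in choices:
        print(f"{symbol}. {line}")
    student = key[:2] + key[3:]      # the student omitted the third line
    print(sp_score(student, key))    # one deletion out of four lines -> 0.75
```

Because answers are symbol sequences rather than free text, edit distance fits naturally: an omitted, extra, or swapped line costs one operation instead of zeroing out every position after the mistake, which is what makes this scoring more forgiving than exact position-by-position matching.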