Taking Advantage of Scale by Analyzing Frequent Constructed-Response, Code Tracing Wrong Answers

Kristin Stephens-Martinez, An Ju, Krishna Parashar, Regina Ongowarsito, Nikunj Jain, Sreesha Venkat, A. Fox
{"title":"Taking Advantage of Scale by Analyzing Frequent Constructed-Response, Code Tracing Wrong Answers","authors":"Kristin Stephens-Martinez, An Ju, Krishna Parashar, Regina Ongowarsito, Nikunj Jain, Sreesha Venkat, A. Fox","doi":"10.1145/3105726.3106188","DOIUrl":null,"url":null,"abstract":"Constructed-response, code-tracing questions (\"What would Python print?\") are good formative assessments. Unlike selected-response questions simply marked correct or incorrect, a constructed wrong answer can provide information on a student's particular difficulty. However, constructed-response questions are resource-intensive to grade manually, and machine grading yields only correct/incorrect information. We analyzed incorrect constructed responses from code-tracing questions in an introductory computer science course to investigate whether a small subsample of such responses could provide enough information to make inspecting the subsample worth the effort, and if so, how best to choose this subsample. In addition, we sought to understand what insights into student difficulties could be gained from such an analysis. We found that ~5% of the most frequently given wrong answers cover ~60% of the wrong constructed responses. Inspecting these wrong answers, we found similar misconceptions as those in prior work, additional difficulties not identified in prior work regarding language-specific constructs and data structures, and non-misconception \"slips\" that cause students to get questions wrong, such as syntax errors, sloppy reading/writing. Our methodology is much less time-consuming than full manual inspection, yet yields new and durable insight into student difficulties that can be used for several purposes, including expanding a concept inventory, creating summative assessments, and creating effective distractors for selected-response assessments.","PeriodicalId":267640,"journal":{"name":"Proceedings of the 2017 ACM Conference on International Computing Education Research","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 ACM Conference on International Computing Education Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3105726.3106188","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 48

Abstract

Constructed-response, code-tracing questions ("What would Python print?") are good formative assessments. Unlike selected-response questions simply marked correct or incorrect, a constructed wrong answer can provide information on a student's particular difficulty. However, constructed-response questions are resource-intensive to grade manually, and machine grading yields only correct/incorrect information. We analyzed incorrect constructed responses from code-tracing questions in an introductory computer science course to investigate whether a small subsample of such responses could provide enough information to make inspecting the subsample worth the effort, and if so, how best to choose this subsample. In addition, we sought to understand what insights into student difficulties could be gained from such an analysis. We found that ~5% of the most frequently given wrong answers cover ~60% of the wrong constructed responses. Inspecting these wrong answers, we found misconceptions similar to those in prior work, additional difficulties not identified in prior work regarding language-specific constructs and data structures, and non-misconception "slips" that cause students to get questions wrong, such as syntax errors and sloppy reading or writing. Our methodology is much less time-consuming than full manual inspection, yet yields new and durable insight into student difficulties that can be used for several purposes, including expanding a concept inventory, creating summative assessments, and creating effective distractors for selected-response assessments.
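To make the frequency-coverage idea concrete, below is a minimal Python sketch (not from the paper; the function name, parameter values, and example responses are hypothetical) of how one might rank the distinct wrong answers to a single code-tracing question by frequency and estimate how much of the total wrong-response volume a small subsample covers.

```python
from collections import Counter

def coverage_of_top_answers(wrong_answers, top_fraction=0.05):
    """Hypothetical helper: what share of all wrong responses is covered
    by the most frequent distinct wrong answers?"""
    counts = Counter(wrong_answers)              # distinct wrong answer -> frequency
    ranked = counts.most_common()                # distinct answers, most frequent first
    k = max(1, int(len(ranked) * top_fraction))  # size of the frequent subsample
    covered = sum(freq for _, freq in ranked[:k])
    return k, covered / len(wrong_answers)

# Made-up wrong responses to one "What would Python print?" question.
responses = (["[1, 2, 3]"] * 30 + ["(1, 2, 3)"] * 15 + ["Error"] * 5
             + ["1 2 3", "None", "[3, 2, 1]"])
k, coverage = coverage_of_top_answers(responses)
print(f"top {k} wrong answer(s) cover {coverage:.0%} of wrong responses")
```

In this toy example a single frequent wrong answer already covers more than half of the wrong responses, which mirrors the paper's finding that a small, frequency-chosen subsample can be worth inspecting manually.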