Title: Comparing estimates of difficulty of programming constructs
Authors: M. Bastian, A. Mühling
Published in: Proceedings of the 22nd Koli Calling International Conference on Computing Education Research
Publication date: 2022-11-17
DOI: 10.1145/3564721.3565950 (https://doi.org/10.1145/3564721.3565950)
Citations: 0
Abstract
Designing assessments in classroom contexts, or having them generated automatically, requires, among other things, knowledge about the difficulty of what is assessed. Estimates of difficulty can be derived empirically, usually by piloting items, or theoretically from models. Empirical results, in turn, can inform theory and refine models. In this article, we compare four methods of estimating item difficulty for a typical topic of introductory programming courses: control flow. For a given set of items that had been tested empirically, we also collected expert ratings and additionally applied measures of code complexity from both software engineering and computer science education research. The results show that there is some overlap between empirical results and theoretical predictions. However, for the simple item format that we have been using, all of the models fall short of offering enough explanatory power for the observed variance in difficulty. Empirical difficulty, in turn, can serve as the basis for rules for item generation in the future.
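As a point of reference for the software-engineering complexity measures the abstract mentions, the following is a minimal sketch of one classic metric, cyclomatic complexity, counted as 1 plus the number of decision points in a piece of code. This is only an illustration of that family of metrics; the abstract does not specify which measures or tools the study actually used, and the simplified node set below is an assumption of this sketch.

```python
import ast

# Simplified set of AST node types treated as decision points.
# (A full implementation would also weight boolean operands,
# exception handlers, comprehension conditions, etc.)
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.BoolOp, ast.Try)

def cyclomatic_complexity(source: str) -> int:
    """Return 1 + the count of decision-point nodes in the source."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return 1 + decisions

# A small control-flow item of the kind the abstract describes.
snippet = """
def classify(x):
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    return "zero"
"""
print(cyclomatic_complexity(snippet))  # → 3 (base 1 + two if-branches)
```

Straight-line code scores 1, and each added branch or loop raises the score by one, which is what makes such metrics candidates for predicting item difficulty in control-flow exercises.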