Grading Assistance for a Handwritten Thermodynamics Exam using Artificial Intelligence: An Exploratory Study

Gerd Kortemeyer, Julian Nöhl, Daria Onishchuk
arXiv - PHYS - Physics Education, 2024-06-25
DOI: arxiv-2406.17859
Using a high-stakes thermodynamics exam as a sample (252 students, four
multipart problems), we investigate the viability of four workflows for
AI-assisted grading of handwritten student solutions. We find that the greatest
challenge lies in converting handwritten answers into a machine-readable
format. The granularity of the grading criteria also influences grading
performance: applying a fine-grained rubric to an entire problem often leads to
bookkeeping errors and grading failures, while grading a problem part by part
is more reliable but tends to miss nuances. We also find that grading
hand-drawn graphics, such as process diagrams, is less reliable than grading
mathematical derivations, owing to the difficulty of distinguishing essential
details from extraneous information. Although the system is precise in
identifying exams that meet the passing criteria, exams with failing grades
still require human grading. We conclude with recommendations for overcoming
some of the challenges encountered.
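The contrast between the two rubric granularities can be sketched in code. The following is a minimal, hypothetical illustration only: the function names, the toy solution, and the use of plain string matching as a stand-in for the AI grader's judgment are all assumptions, not the paper's actual workflow.

```python
# Two grading granularities, illustrated with toy data:
# (a) one fine-grained rubric applied to the whole problem at once, vs.
# (b) grading each part against its own sub-rubric and summing.
# String containment stands in for the (far more complex) AI grading step.

def grade_whole_problem(solution: dict, rubric: dict) -> float:
    """Apply every rubric criterion to the full solution text at once."""
    text = solution.get("all_text", "")
    return sum(points for criterion, points in rubric.items()
               if criterion in text)

def grade_by_parts(solution: dict, part_rubrics: dict) -> float:
    """Grade each part separately against its sub-rubric, then sum."""
    total = 0.0
    for part, rubric in part_rubrics.items():
        text = solution.get(part, "")
        total += sum(points for criterion, points in rubric.items()
                     if criterion in text)
    return total

# Illustrative (invented) student solution to a two-part problem
solution = {
    "all_text": "dU = delta Q - delta W; isothermal, so dU = 0",
    "a": "dU = delta Q - delta W",
    "b": "isothermal, so dU = 0",
}
part_rubrics = {
    "a": {"dU = delta Q - delta W": 2.0},
    "b": {"dU = 0": 1.0},
}
whole_rubric = {"dU = delta Q - delta W": 2.0, "dU = 0": 1.0}

print(grade_whole_problem(solution, whole_rubric))  # 3.0
print(grade_by_parts(solution, part_rubrics))       # 3.0
```

On this toy input both routes agree; the study's point is that on real handwritten work the whole-problem route accumulates bookkeeping errors across many criteria, while the per-part route is more robust but can overlook reasoning that spans parts.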