LLMs in Automated Essay Evaluation: A Case Study
Milan Kostic, Hans Friedrich Witschel, Knut Hinkelmann, Maja Spahic-Bogdanovic
Proceedings of the AAAI Symposium Series, published 2024-05-20
DOI: 10.1609/aaaiss.v3i1.31193
Abstract: This study delves into the application of large language models (LLMs), such as ChatGPT-4, for the automated evaluation of student essays, with a focus on a case study conducted at the Swiss Institute of Business Administration. It explores the effectiveness of LLMs in assessing German-language student transfer assignments, and contrasts their performance with traditional evaluations by human lecturers. The primary findings highlight the challenges faced by LLMs in accurately grading complex texts according to predefined categories and in providing detailed feedback. This research illuminates the gap between the capabilities of LLMs and the nuanced requirements of student essay evaluation. The conclusion emphasizes the necessity for ongoing research and development in LLM technology to improve the accuracy, reliability, and consistency of automated essay assessments in educational contexts.