{"title":"Can large language models meet the challenge of generating school-level questions?","authors":"Subhankar Maity, Aniket Deroy, Sudeshna Sarkar","doi":"10.1016/j.caeai.2025.100370","DOIUrl":null,"url":null,"abstract":"<div><div>In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question generation in the educational domain. Specifically, we focus on generating educational questions from contexts extracted from school-level textbooks. Our study aims to prompt LLMs such as GPT-4 Turbo, GPT-3.5 Turbo, Llama-2-70B, Llama-3.1-405B, and Gemini Pro to generate a complete set of questions for each context, potentially streamlining the question generation process for educators. We performed a human evaluation of the generated questions, assessing their <em>coverage</em>, <em>grammaticality</em>, <em>usefulness</em>, <em>answerability</em>, and <em>relevance</em>. Additionally, we prompted LLMs to generate questions based on Bloom's revised taxonomy, categorizing and evaluating these questions according to their cognitive complexity and learning objectives. We applied both zero-shot and eight-shot prompting techniques. These efforts provide insight into the efficacy of LLMs in automated question generation and their potential in assessing students' cognitive abilities across various school-level subjects. The results show that employing an eight-shot technique improves the performance of human evaluation metrics for the generated complete set of questions and helps generate questions that are better aligned with Bloom's revised taxonomy.</div></div>","PeriodicalId":34469,"journal":{"name":"Computers and Education Artificial Intelligence","volume":"8 ","pages":"Article 100370"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers and Education Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666920X25000104","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Social Sciences","Score":null,"Total":0}
Abstract
In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question generation in the educational domain. Specifically, we focus on generating educational questions from contexts extracted from school-level textbooks. Our study prompts LLMs such as GPT-4 Turbo, GPT-3.5 Turbo, Llama-2-70B, Llama-3.1-405B, and Gemini Pro to generate a complete set of questions for each context, potentially streamlining the question generation process for educators. We performed a human evaluation of the generated questions, assessing their coverage, grammaticality, usefulness, answerability, and relevance. Additionally, we prompted the LLMs to generate questions based on Bloom's revised taxonomy, categorizing and evaluating these questions according to their cognitive complexity and learning objectives. We applied both zero-shot and eight-shot prompting techniques. These efforts provide insight into the efficacy of LLMs in automated question generation and their potential in assessing students' cognitive abilities across various school-level subjects. The results show that the eight-shot technique improves performance on the human evaluation metrics for the generated question sets and yields questions that are better aligned with Bloom's revised taxonomy.
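
To make the prompting setup concrete, the sketch below shows how a zero-shot versus few-shot (e.g., eight-shot) question-generation prompt might be assembled, assuming the OpenAI Python client and GPT-4 Turbo as one of the backends. The prompt wording, exemplar format, and helper names are illustrative assumptions, not the authors' exact prompts or pipeline.

```python
# Minimal sketch of zero-shot vs. few-shot prompting for school-level question
# generation. Assumes the OpenAI Python client (openai>=1.0) and an API key in
# the OPENAI_API_KEY environment variable; prompt text is a hypothetical example.
from openai import OpenAI

client = OpenAI()

BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def build_messages(context: str, exemplars: list[tuple[str, str]] | None = None) -> list[dict]:
    """Assemble chat messages for question generation.

    exemplars: optional (context, questions) pairs; passing eight pairs gives
    an eight-shot prompt, passing none gives the zero-shot setting.
    """
    messages = [{
        "role": "system",
        "content": (
            "You are a teacher preparing an exam from a school textbook. "
            "Generate a complete set of questions covering the given context, "
            "with one question for each level of Bloom's revised taxonomy: "
            + ", ".join(BLOOM_LEVELS) + "."
        ),
    }]
    # In-context exemplars are inserted as prior user/assistant turns.
    for ex_context, ex_questions in exemplars or []:
        messages.append({"role": "user", "content": f"Context:\n{ex_context}"})
        messages.append({"role": "assistant", "content": ex_questions})
    messages.append({"role": "user", "content": f"Context:\n{context}"})
    return messages


def generate_questions(context: str, exemplars=None, model: str = "gpt-4-turbo") -> str:
    """Call the chat completions API and return the generated question set."""
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(context, exemplars),
        temperature=0.7,
    )
    return response.choices[0].message.content
```

Under this setup, the only difference between the two conditions is whether eight worked (context, questions) exemplars are prepended to the conversation; the generated questions would then be scored by human raters on coverage, grammaticality, usefulness, answerability, and relevance, and checked for alignment with the intended Bloom level.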