STEM exam performance: Open- versus closed-book methods in the large language model era

IF 1.4 Q4 MEDICINE, RESEARCH & EXPERIMENTAL
Clinical Teacher · Pub Date: 2024-11-04 · DOI: 10.1111/tct.13839
Rasi Mizori, Muhayman Sadiq, Malik Takreem Ahmad, Anthony Siu, Reubeen Rashid Ahmad, Zijing Yang, Helen Oram, James Galloway
{"title":"STEM 考试成绩:大语言模型时代的开卷与闭卷考试方法。","authors":"Rasi Mizori,&nbsp;Muhayman Sadiq,&nbsp;Malik Takreem Ahmad,&nbsp;Anthony Siu,&nbsp;Reubeen Rashid Ahmad,&nbsp;Zijing Yang,&nbsp;Helen Oram,&nbsp;James Galloway","doi":"10.1111/tct.13839","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>The COVID-19 pandemic accelerated the shift to remote learning, heightening scrutiny of open-book examinations (OBEs) versus closed-book examinations (CBEs) within science, technology, engineering, arts and mathematics (STEM) education. This study evaluates the efficacy of OBEs compared to CBEs on student performance and perceptions within STEM subjects, considering the emerging influence of sophisticated large language models (LLMs) such as GPT-3.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>Adhering to PRISMA guidelines, this systematic review analysed peer-reviewed articles published from 2013, focusing on the impact of OBEs and CBEs on university STEM students. Standardised mean differences were assessed using a random effects model, with heterogeneity evaluated by <i>I</i><sup>2</sup> statistics, Cochrane's <i>Q</i> test and Tau statistics.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>Analysis of eight studies revealed mixed outcomes. Meta-analysis showed that OBEs generally resulted in better scores than CBEs, despite significant heterogeneity (<i>I</i><sup>2</sup> = 97%). Observational studies displayed more pronounced effects, with noted concerns over technical difficulties and instances of cheating.</p>\n </section>\n \n <section>\n \n <h3> Discussion</h3>\n \n <p>Results suggest that OBEs assess competencies more aligned with current educational paradigms than CBEs. However, the emergence of LLMs poses new challenges to OBE validity by simplifying the generation of comprehensive answers, impacting academic integrity and examination fairness.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>While OBEs are better suited to contemporary educational needs, the influence of LLMs on their effectiveness necessitates further study. Institutions should prudently consider the competencies assessed by OBEs, particularly in light of evolving technological landscapes. Future research should explore the integrity of OBEs in the presence of LLMs to ensure fair and effective student evaluations.</p>\n </section>\n </div>","PeriodicalId":47324,"journal":{"name":"Clinical Teacher","volume":"22 1","pages":""},"PeriodicalIF":1.4000,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11663729/pdf/","citationCount":"0","resultStr":"{\"title\":\"STEM exam performance: Open- versus closed-book methods in the large language model era\",\"authors\":\"Rasi Mizori,&nbsp;Muhayman Sadiq,&nbsp;Malik Takreem Ahmad,&nbsp;Anthony Siu,&nbsp;Reubeen Rashid Ahmad,&nbsp;Zijing Yang,&nbsp;Helen Oram,&nbsp;James Galloway\",\"doi\":\"10.1111/tct.13839\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>The COVID-19 pandemic accelerated the shift to remote learning, heightening scrutiny of open-book examinations (OBEs) versus closed-book examinations (CBEs) within science, technology, engineering, arts and mathematics (STEM) education. 
This study evaluates the efficacy of OBEs compared to CBEs on student performance and perceptions within STEM subjects, considering the emerging influence of sophisticated large language models (LLMs) such as GPT-3.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>Adhering to PRISMA guidelines, this systematic review analysed peer-reviewed articles published from 2013, focusing on the impact of OBEs and CBEs on university STEM students. Standardised mean differences were assessed using a random effects model, with heterogeneity evaluated by <i>I</i><sup>2</sup> statistics, Cochrane's <i>Q</i> test and Tau statistics.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>Analysis of eight studies revealed mixed outcomes. Meta-analysis showed that OBEs generally resulted in better scores than CBEs, despite significant heterogeneity (<i>I</i><sup>2</sup> = 97%). Observational studies displayed more pronounced effects, with noted concerns over technical difficulties and instances of cheating.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Discussion</h3>\\n \\n <p>Results suggest that OBEs assess competencies more aligned with current educational paradigms than CBEs. However, the emergence of LLMs poses new challenges to OBE validity by simplifying the generation of comprehensive answers, impacting academic integrity and examination fairness.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>While OBEs are better suited to contemporary educational needs, the influence of LLMs on their effectiveness necessitates further study. Institutions should prudently consider the competencies assessed by OBEs, particularly in light of evolving technological landscapes. Future research should explore the integrity of OBEs in the presence of LLMs to ensure fair and effective student evaluations.</p>\\n </section>\\n </div>\",\"PeriodicalId\":47324,\"journal\":{\"name\":\"Clinical Teacher\",\"volume\":\"22 1\",\"pages\":\"\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2024-11-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11663729/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Clinical Teacher\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/tct.13839\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"MEDICINE, RESEARCH & EXPERIMENTAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Teacher","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/tct.13839","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
Citations: 0

Abstract

Background

The COVID-19 pandemic accelerated the shift to remote learning, heightening scrutiny of open-book examinations (OBEs) versus closed-book examinations (CBEs) within science, technology, engineering, arts and mathematics (STEM) education. This study evaluates the efficacy of OBEs compared to CBEs on student performance and perceptions within STEM subjects, considering the emerging influence of sophisticated large language models (LLMs) such as GPT-3.

Methods

Adhering to PRISMA guidelines, this systematic review analysed peer-reviewed articles published from 2013, focusing on the impact of OBEs and CBEs on university STEM students. Standardised mean differences were assessed using a random effects model, with heterogeneity evaluated by I² statistics, Cochrane's Q test and Tau statistics.
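
For readers less familiar with these statistics, the short Python sketch below illustrates the kind of pooling described above: a DerSimonian-Laird random-effects model applied to standardised mean differences, reporting the Q heterogeneity statistic, I² and tau². It is a minimal illustration under generic assumptions, not the review's actual analysis code, and the example inputs are hypothetical.

import numpy as np

def random_effects_meta(smd, var):
    """Pool standardised mean differences (SMDs) with a
    DerSimonian-Laird random-effects model.

    Returns the pooled SMD, its 95% CI, the Q heterogeneity
    statistic, I^2 (%) and tau^2 (between-study variance).
    """
    smd, var = np.asarray(smd, float), np.asarray(var, float)
    k = len(smd)
    w = 1.0 / var                               # inverse-variance (fixed-effect) weights
    y_fe = np.sum(w * smd) / np.sum(w)          # fixed-effect pooled estimate
    q = np.sum(w * (smd - y_fe) ** 2)           # Q heterogeneity statistic
    df = k - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)               # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = 1.0 / (var + tau2)                   # random-effects weights
    y_re = np.sum(w_re * smd) / np.sum(w_re)    # pooled random-effects SMD
    se = np.sqrt(1.0 / np.sum(w_re))
    ci = (y_re - 1.96 * se, y_re + 1.96 * se)
    return y_re, ci, q, i2, tau2

# Hypothetical inputs for three studies (not the review's data):
pooled, ci, q, i2, tau2 = random_effects_meta(
    smd=[0.45, 0.80, 0.10], var=[0.04, 0.02, 0.03])
print(f"SMD = {pooled:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f}), "
      f"Q = {q:.1f}, I^2 = {i2:.0f}%, tau^2 = {tau2:.3f}")

As a rule of thumb, an I² near 97%, as reported in the Results below, indicates that most of the observed variation reflects genuine between-study heterogeneity rather than chance, which motivates the random-effects model used here.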

Results

Analysis of eight studies revealed mixed outcomes. Meta-analysis showed that OBEs generally resulted in better scores than CBEs, despite significant heterogeneity (I² = 97%). Observational studies displayed more pronounced effects, with noted concerns over technical difficulties and instances of cheating.

Discussion

Results suggest that OBEs assess competencies more aligned with current educational paradigms than CBEs. However, the emergence of LLMs poses new challenges to OBE validity by simplifying the generation of comprehensive answers, impacting academic integrity and examination fairness.

Conclusions

While OBEs are better suited to contemporary educational needs, the influence of LLMs on their effectiveness necessitates further study. Institutions should prudently consider the competencies assessed by OBEs, particularly in light of evolving technological landscapes. Future research should explore the integrity of OBEs in the presence of LLMs to ensure fair and effective student evaluations.

Source journal

Clinical Teacher (MEDICINE, RESEARCH & EXPERIMENTAL)
CiteScore: 2.90
Self-citation rate: 5.60%
Articles published: 113

Journal description

The Clinical Teacher has been designed with the active, practising clinician in mind. It aims to provide a digest of current research, practice and thinking in medical education presented in a readable, stimulating and practical style. The journal includes sections for reviews of the literature relating to clinical teaching, bringing authoritative views on the latest thinking about modern teaching. There are also sections on specific teaching approaches, a digest of the latest research published in Medical Education and other teaching journals, reports of initiatives and advances in thinking and practical teaching from around the world, and expert community and discussion on challenging and controversial issues in today's clinical education.