Large language models outperform humans in identifying neuromyths but show sycophantic behavior in applied contexts
Eileen Richter, Markus Wolfgang Hermann Spitzer, Annabelle Morgan, Luisa Frede, Joshua Weidlich, Korbinian Moeller
Trends in Neuroscience and Education, Volume 39, Article 100255 (published 2025-05-06). DOI: 10.1016/j.tine.2025.100255
Abstract
Background: Neuromyths are widespread among educators, raising concerns about misconceptions of the (neural) principles underlying learning in this population. With large language models (LLMs) entering education, educators increasingly rely on them for lesson planning and professional development. If LLMs correctly identify neuromyths, they may therefore help to dispel related misconceptions.
Method: We evaluated whether LLMs correctly identify neuromyths and whether they alert educators to neuromyths in applied contexts, that is, when user questions contain related misconceptions. Additionally, we examined whether explicitly prompting LLMs to base their answers on scientific evidence or to correct unsupported assumptions reduces errors in identifying neuromyths.
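To make the prompting conditions concrete, the sketch below shows how such an evaluation could be set up. The paper's actual prompts, models, and scoring procedure are not specified in the abstract, so all wordings, the model name, and the example items are illustrative assumptions (using the OpenAI Python client only as an example interface).

```python
# Illustrative sketch only: prompts, model, and items below are hypothetical,
# not the study's materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A well-known neuromyth, phrased (a) as an isolated statement and
# (b) as an applied, user-like question that embeds the misconception.
statement = ("Individuals learn better when they receive information "
             "in their preferred learning style.")
applied_question = ("I want to tailor my lesson to my students' learning styles "
                    "(visual, auditory, kinesthetic). How should I group them?")

# Prompt variations analogous to those described in the Method section.
prefixes = {
    "baseline": "",
    "scientific_evidence": "Base your answer strictly on scientific evidence. ",
    "correct_assumptions": "Correct any unsupported assumptions in the question. ",
}

for condition, prefix in prefixes.items():
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prefix + applied_question}],
    )
    # Responses would then be coded for whether the misconception was flagged.
    print(condition, "->", response.choices[0].message.content[:200])
```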
Results: LLMs outperformed humans in identifying neuromyth statements as used in previous studies. However, when presented with applied, user-like questions containing misconceptions, they struggled to highlight or dispute these. Interestingly, explicitly asking LLMs to correct unsupported assumptions considerably increased the likelihood that misconceptions were flagged, whereas prompting the models to rely on scientific evidence had little effect.
Conclusion: While LLMs outperformed humans at identifying isolated neuromyth statements, they struggled to alert users to the same misconceptions when these were embedded in more applied, user-like questions, presumably due to LLMs' tendency toward sycophantic responses. This limitation suggests that, despite their potential, LLMs are not yet a reliable safeguard against the spread of neuromyths in educational settings. However, explicitly prompting LLMs to correct unsupported assumptions, an approach that may initially seem counterintuitive, effectively reduced sycophantic responses.