Towards Community-Based Evaluation of AI in Neurology: Development of a Headache Diagnosis Dataset for Large Language Models.

Anika Zahn, Sebastian Strauss, Dorian Zwanzig

Studies in Health Technology and Informatics, vol. 332, pp. 237-241. Published 2025-10-02. Journal Article.
DOI: 10.3233/SHTI251535 (https://doi.org/10.3233/SHTI251535)

Abstract

Diagnosing headache disorders remains a clinical challenge due to the heterogeneity of headache phenotypes and the absence of objective biomarkers. This study presents a curated dataset of 50 clinical headache case examples, comprising both real (n = 34) and synthetic (n = 16) cases, categorized across 20 diagnoses according to ICHD-3 criteria. The dataset enables the evaluation of large language models (LLMs) for diagnostic accuracy in headache medicine. Three GPT-based models were tested using different prompting strategies, with diagnostic performance assessed at both diagnosis and group levels. Top-1 accuracy ranged from 24% to 63% at the diagnosis level and reached up to 92% at the group level. The results highlight the potential of LLMs in supporting differential diagnosis of headache disorders, while also emphasizing the need for further validation with larger, more diverse datasets. Future efforts will focus on expanding real-world data through clinical collaborations and benchmarking LLMs against medical professionals to assess their utility in clinical decision-making.
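
The distinction between diagnosis-level and group-level Top-1 accuracy can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the example labels, and the assumption that a "group" is the leading chapter number of an ICHD-3 code (for example, 1.1 and 1.2 both fall under group 1, migraine) are hypothetical choices made for illustration only.

```python
from typing import Sequence


def group_of(code: str) -> str:
    """Return the leading chapter number of an ICHD-3 style code.

    Assumption for illustration: the "group" is the part before the first dot.
    """
    return code.split(".", 1)[0]


def top1_accuracy(gold: Sequence[str], predicted: Sequence[str],
                  level: str = "diagnosis") -> float:
    """Fraction of cases whose first-ranked prediction matches the gold label.

    level="diagnosis" compares full codes; level="group" compares only groups.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one case is required")
    if level not in ("diagnosis", "group"):
        raise ValueError("level must be 'diagnosis' or 'group'")

    hits = 0
    for g, p in zip(gold, predicted):
        if level == "group":
            g, p = group_of(g), group_of(p)
        hits += (g == p)
    return hits / len(gold)


# Hypothetical toy labels, not data from the study.
gold = ["1.1", "1.2", "2.1", "3.1"]
pred = ["1.1", "1.1", "2.1", "4.1"]

diagnosis_acc = top1_accuracy(gold, pred, level="diagnosis")
group_acc = top1_accuracy(gold, pred, level="group")
print(f"diagnosis-level: {diagnosis_acc:.2f}, group-level: {group_acc:.2f}")
```

In this toy example the second case is wrong at the diagnosis level (1.2 versus 1.1) but correct at the group level, which is why group-level accuracy can only equal or exceed diagnosis-level accuracy on the same predictions.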
