Evaluating Large Language Models for Burning Mouth Syndrome Diagnosis.

IF 2.5 | Zone 3, Medicine | Q2 CLINICAL NEUROLOGY
Journal of Pain Research Pub Date : 2025-03-19 eCollection Date: 2025-01-01 DOI:10.2147/JPR.S509845
Takayuki Suga, Osamu Uehara, Yoshihiro Abiko, Akira Toyofuku
Journal of Pain Research, vol. 18, pp. 1387-1405. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11930279/pdf/
Citations: 0

Abstract

Evaluating Large Language Models for Burning Mouth Syndrome Diagnosis.

Introduction: Large language models have been proposed as diagnostic aids across various medical fields, including dentistry. Burning mouth syndrome, characterized by burning sensations in the oral cavity without identifiable cause, poses diagnostic challenges. This study explores the diagnostic accuracy of large language models in identifying burning mouth syndrome, hypothesizing potential limitations.

Materials and methods: Clinical vignettes of 100 synthesized burning mouth syndrome cases were evaluated using three large language models (ChatGPT-4o, Gemini Advanced 1.5 Pro, and Claude 3.5 Sonnet). Each vignette included patient demographics, symptoms, and medical history. Large language models were prompted to provide a primary diagnosis, differential diagnoses, and their reasoning. Accuracy was determined by comparing their responses with expert evaluations.
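The evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Vignette` fields, the prompt wording, and the `primary_diagnosis` callable are all hypothetical, and exact string matching stands in for the expert evaluation used in the study.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Vignette:
    demographics: str
    symptoms: str
    medical_history: str
    expert_diagnosis: str  # reference label from expert evaluation


# Hypothetical prompt wording; the abstract does not give the actual prompt.
PROMPT_TEMPLATE = (
    "Patient demographics: {demographics}\n"
    "Symptoms: {symptoms}\n"
    "Medical history: {medical_history}\n"
    "Give a primary diagnosis, differential diagnoses, and your reasoning."
)


def build_prompt(v: Vignette) -> str:
    """Fill the prompt template with one vignette's fields."""
    return PROMPT_TEMPLATE.format(
        demographics=v.demographics,
        symptoms=v.symptoms,
        medical_history=v.medical_history,
    )


def accuracy(
    vignettes: Iterable[Vignette],
    primary_diagnosis: Callable[[str], str],
) -> float:
    """Fraction of vignettes whose primary diagnosis matches the expert label.

    `primary_diagnosis` is a stand-in for a model call that returns only the
    primary diagnosis; case-insensitive string equality replaces expert
    judgment here, which is an assumption of this sketch.
    """
    cases = list(vignettes)
    if not cases:
        raise ValueError("no vignettes to score")
    hits = sum(
        primary_diagnosis(build_prompt(v)).strip().lower()
        == v.expert_diagnosis.strip().lower()
        for v in cases
    )
    return hits / len(cases)
```

Passing a different callable for each model keeps the scoring identical across models, so any difference in accuracy comes from the responses rather than from the harness.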

Results: ChatGPT and Claude achieved an accuracy rate of 99%, while Gemini's accuracy was 89% (p < 0.001). Misdiagnoses included Persistent Idiopathic Facial Pain and combined diagnoses with inappropriate conditions. Differences were also observed in reasoning patterns and additional data requests across the large language models.
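As a rough check on the reported comparison, the sketch below runs a Pearson chi-square test of independence on the counts implied by the accuracy rates (99, 99, and 89 correct out of 100 vignettes each). The abstract does not say which test the authors used, so the choice of test is an assumption, and with so few errors an exact test may be more appropriate. The p-value uses the closed form for 2 degrees of freedom, where the chi-square survival function is exp(-x/2).

```python
import math


def chi_square_three_groups(correct, n_per_group):
    """Pearson chi-square test on a 3 x 2 (group x correct/incorrect) table.

    Assumes exactly three groups of equal size, so the test has 2 degrees
    of freedom and the p-value is exp(-statistic / 2).
    """
    if len(correct) != 3:
        raise ValueError("this sketch handles exactly three groups (df = 2)")
    incorrect = [n_per_group - c for c in correct]
    total = n_per_group * len(correct)

    statistic = 0.0
    for column in (correct, incorrect):
        # Equal group sizes give the same expected count for every group.
        expected = sum(column) / total * n_per_group
        statistic += sum((obs - expected) ** 2 / expected for obs in column)

    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Counts implied by the reported accuracies: ChatGPT, Claude, Gemini.
statistic, p_value = chi_square_three_groups([99, 99, 89], 100)
print(f"chi-square = {statistic:.2f}, p = {p_value:.5f}")
```

Under these assumptions the statistic comes out near 16.1 and the p-value near 0.0003, which is consistent with the reported p < 0.001.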

Discussion: Despite high overall accuracy, the models exhibited variations in reasoning approaches and occasional errors, underscoring the importance of clinician oversight. Limitations include the synthesized nature of vignettes, potential over-reliance on exclusionary criteria, and challenges in differentiating overlapping disorders.

Conclusion: Large language models demonstrate strong potential as supplementary diagnostic tools for burning mouth syndrome, especially in settings lacking specialist expertise. However, their reliability depends on thorough patient assessment and expert verification. Integrating large language models into routine diagnostics could enhance early detection and management, ultimately improving clinical decision-making for dentists and specialists alike.

Source journal
Journal of Pain Research (Clinical Neurology)
CiteScore: 4.50
Self-citation rate: 3.70%
Publication count: 411
Review time: 16 weeks
Journal description: Journal of Pain Research is an international, peer-reviewed, open access journal that welcomes laboratory and clinical findings in the fields of pain research and the prevention and management of pain. Original research, reviews, symposium reports, hypothesis formation and commentaries are all considered for publication. Additionally, the journal now welcomes the submission of pain-policy-related editorials and commentaries, particularly in regard to ethical, regulatory, forensic, and other legal issues in pain medicine, and to the education of pain practitioners and researchers.