Accuracy of generative artificial intelligence models in differential diagnoses of familial Mediterranean fever and deficiency of Interleukin-1 receptor antagonist

Impact factor: 4.7 | JCR: Q2 (Immunology)
Joshua Pillai, Kathryn Pillai
Journal of Translational Autoimmunity, volume 7, Article 100213. Published 2023-10-14.
DOI: 10.1016/j.jtauto.2023.100213
URL: https://www.sciencedirect.com/science/article/pii/S2589909023000266
Citations: 1

Abstract

With the rapid development of artificial intelligence, large language models (LLMs) have been applied to a wide range of natural language processing tasks. More recently, LLMs have shown unique potential in numerous applications within medicine and have been investigated in particular for their ability in clinical reasoning. Although the diagnostic accuracy of LLMs in forming differential diagnoses has been reviewed in general internal medicine, much remains unknown for autoinflammatory disorders. Forming a differential diagnosis for autoinflammatory diseases is challenging because symptoms overlap between disorders, and it is even more difficult without genetic screening. In this work, the diagnostic accuracy of the Generative Pre-Trained Transformer Model-4 (GPT-4), GPT-3.5, and Large Language Model Meta AI (LLaMa) was evaluated on clinical vignettes of Deficiency of Interleukin-1 Receptor Antagonist (DIRA) and Familial Mediterranean Fever (FMF). We then compared these models to a control group including one internal medicine physician. GPT-4 did not differ significantly from the internist in correctly identifying DIRA and FMF patients. However, the physician maintained significantly higher accuracy than GPT-3.5 and LLaMa 2 for either disease. Overall, we explore and discuss the unique potential of LLMs in diagnostics for autoimmune diseases.

Source journal
Journal of Translational Autoimmunity (Medicine: Immunology and Allergy)
CiteScore: 7.80
Self-citation rate: 2.60%
Articles published: 33
Review time: 55 days