Medical language matters: impact of clinical summary composition on a generative artificial intelligence's diagnostic accuracy.

IF 2.2 Q2 MEDICINE, GENERAL & INTERNAL
Diagnosis · Pub Date: 2024-12-12 · DOI: 10.1515/dx-2024-0167
Cassandra Skittle, Eliana Bonifacino, Casey N McQuade
Citations: 0

Abstract

Objectives: Evaluate the impact of problem representation (PR) characteristics on Generative Artificial Intelligence (GAI) diagnostic accuracy.

Methods: Internal medicine attendings and residents from two academic medical centers were given a clinical vignette and instructed to write a PR. Deductive content analysis described the characteristics comprising each PR. Individual PRs were input into ChatGPT-4 (OpenAI, September 2023), which was prompted to generate a ranked three-item differential. The ranked differential and the top-ranked diagnosis were each scored on a 3-point scale: incorrect, partially correct, or correct. Logistic regression evaluated the impact of individual PR characteristics on ChatGPT accuracy.

Results: For the three-item differential, accuracy was associated with including fewer comorbidities (OR 0.57, p=0.010), fewer past historical items (OR 0.60, p=0.019), and more physical examination items (OR 1.66, p=0.015). For ChatGPT's ability to rank the true diagnosis as the single-best diagnosis, using temporal semantic qualifiers (OR 3.447, p=0.046), using more semantic qualifiers overall (OR 1.300, p=0.005), and adhering to a typical 3-part PR format (OR 3.577, p=0.020) all correlated with diagnostic accuracy.

Conclusions: Several distinct PR factors improved ChatGPT diagnostic accuracy. These factors have previously been associated with expertise in creating PR. Future studies should explore how clinical input qualities affect GAI diagnostic accuracy prospectively.
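The odds ratios above come from the study's logistic regressions. For readers unfamiliar with the metric, a fitted logistic-regression coefficient β converts to an odds ratio via exp(β), so an OR of 1.66 means each additional physical-examination item multiplied the odds of an accurate differential by about 1.66. A minimal illustrative sketch (the coefficients below are back-derived from the reported ORs, not the study's actual model output):

```python
import math

# Illustrative coefficients, back-derived from two of the reported odds
# ratios (OR = exp(beta), hence beta = ln(OR)); not the study's fitted model.
beta_physical_exam = math.log(1.66)  # more physical examination items
beta_comorbidities = math.log(0.57)  # more comorbidities included

def odds_ratio(beta: float) -> float:
    """Convert a logistic-regression coefficient to an odds ratio."""
    return math.exp(beta)

# An OR > 1 raises the odds of an accurate differential per unit increase
# in the predictor; an OR < 1 lowers them.
print(round(odds_ratio(beta_physical_exam), 2))  # 1.66
print(round(odds_ratio(beta_comorbidities), 2))  # 0.57
```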

Source journal: Diagnosis (MEDICINE, GENERAL & INTERNAL)
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 41
Journal description: Diagnosis focuses on how diagnosis can be advanced, how it is taught, and how and why it can fail, leading to diagnostic errors. The journal welcomes both fundamental and applied works, improvement initiatives, opinions, and debates to encourage new thinking on improving this critical aspect of healthcare quality.

Topics:
- Factors that promote diagnostic quality and safety
- Clinical reasoning
- Diagnostic errors in medicine
- The factors that contribute to diagnostic error: human factors, cognitive issues, and system-related breakdowns
- Improving the value of diagnosis: eliminating waste and unnecessary testing
- How culture and removing blame promote awareness of diagnostic errors
- Training and education related to clinical reasoning and diagnostic skills
- Advances in laboratory testing and imaging that improve diagnostic capability
- Local, national and international initiatives to reduce diagnostic error