Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance.

JMIRx med. Pub Date: 2025-03-19. DOI: 10.2196/65263
Masab Mansoor, Andrew F Ibrahim, David Grindem, Asad Baig
JMIRx med, volume 6, e65263. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11939124/pdf/

Abstract

Background: Rural health care providers face unique challenges such as limited specialist access and high patient volumes, making accurate diagnostic support tools essential. Large language models like GPT-3 have demonstrated potential in clinical decision support but remain understudied in pediatric differential diagnosis.

Objective: This study aims to evaluate the diagnostic accuracy and reliability of a fine-tuned GPT-3 model compared to board-certified pediatricians in rural health care settings.

Methods: This multicenter retrospective cohort study analyzed 500 pediatric encounters (ages 0-18 years; n=261, 52.2% female) from rural health care organizations in Central Louisiana between January 2020 and December 2021. The GPT-3 model (DaVinci version) was fine-tuned using the OpenAI application programming interface and trained on 350 encounters, with 150 reserved for testing. Five board-certified pediatricians (mean experience 12 years, SD 5.8 years) provided reference standard diagnoses. Model performance was assessed using accuracy, sensitivity, specificity, and subgroup analyses.

Results: The GPT-3 model achieved an accuracy of 87.3% (131/150 cases), sensitivity of 85% (95% CI 82%-88%), and specificity of 90% (95% CI 87%-93%), comparable to pediatricians' accuracy of 91.3% (137/150 cases; P=.47). Performance was consistent across age groups (0-5 years: 54/62, 87%; 6-12 years: 47/53, 89%; 13-18 years: 30/35, 86%) and common complaints (fever: 36/39, 92%; abdominal pain: 20/23, 87%). For rare diagnoses (n=20), accuracy was slightly lower (16/20, 80%) but comparable to pediatricians (17/20, 85%; P=.62).

Conclusions: This study demonstrates that a fine-tuned GPT-3 model can provide diagnostic support comparable to pediatricians, particularly for common presentations, in rural health care. Further validation in diverse populations is necessary before clinical implementation.
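
The counts reported in the Results can be rechecked with a few lines of arithmetic. The sketch below recomputes each proportion and attaches a Wilson score interval. The abstract does not say how its confidence intervals or P values were derived, so the Wilson method, the function name `wilson_interval`, and the `reported` dictionary are assumptions chosen for illustration; this is not the authors' analysis and it does not attempt to reproduce the reported P values.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


# (correct, total) pairs copied from the Results paragraph of the abstract.
reported = {
    "GPT-3 overall": (131, 150),
    "Pediatricians overall": (137, 150),
    "GPT-3, ages 0-5": (54, 62),
    "GPT-3, ages 6-12": (47, 53),
    "GPT-3, ages 13-18": (30, 35),
    "GPT-3, fever": (36, 39),
    "GPT-3, abdominal pain": (20, 23),
    "GPT-3, rare diagnoses": (16, 20),
    "Pediatricians, rare diagnoses": (17, 20),
}

for label, (correct, total) in reported.items():
    low, high = wilson_interval(correct, total)
    print(f"{label}: {correct}/{total} = {correct / total:.1%} "
          f"(Wilson 95% CI {low:.1%} to {high:.1%})")
```

One internal consistency check falls out of the same numbers: the three age-group counts sum to 131 correct out of 150 cases, matching the overall GPT-3 accuracy reported in the abstract.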
