Exploring AI's Potential in Papilledema Diagnosis to Support Dermatological Treatment Decisions in Rural Healthcare.

Impact factor 3.3 · Zone 3 (Medicine) · JCR Q1 (MEDICINE, GENERAL & INTERNAL)
Jonathan Shapiro, Mor Atlas, Naomi Fridman, Itay Cohen, Ziad Khamaysi, Mahdi Awwad, Naomi Silverstein, Tom Kozlovsky, Idit Maharshak
Diagnostics, vol. 15, no. 19. Journal article, published 2025-10-09. DOI: 10.3390/diagnostics15192547. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12523928/pdf/
Citations: 0

Abstract

Background: Papilledema, an ophthalmic finding associated with increased intracranial pressure, is often induced by dermatological medications, including corticosteroids, isotretinoin, and tetracyclines. Early detection is crucial for preventing irreversible optic nerve damage, but access to ophthalmologic expertise is often limited in rural settings. Artificial intelligence (AI) may enable automated and accurate detection of papilledema from fundus images, thereby supporting timely diagnosis and management.

Objective: The primary objective of this study was to explore the diagnostic capability of ChatGPT-4o, a general large language model with multimodal input, in identifying papilledema from fundus photographs. For context, its performance was compared with a ResNet-based convolutional neural network (CNN) specifically fine-tuned for ophthalmic imaging, as well as with the assessments of two human ophthalmologists. The focus was on applications relevant to dermatological care in resource-limited environments.

Methods: A dataset of 1094 fundus images (295 papilledema, 799 normal) was preprocessed and partitioned into a training set and a test set. The ResNet model was fine-tuned using discriminative learning rates and a one-cycle learning rate policy. GPT-4o and two human evaluators (a senior ophthalmologist and an ophthalmology resident) independently assessed the test images. Diagnostic metrics, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and Cohen's Kappa, were calculated for each evaluator.

Results: When applied to papilledema detection, GPT-4o achieved an overall accuracy of 85.9% with substantial agreement beyond chance (Cohen's Kappa = 0.72), but its specificity (78.9%) and positive predictive value (73.7%) were lower than those of the benchmark models. For context, the ResNet model fine-tuned for ophthalmic imaging reached near-perfect accuracy (99.5%, Kappa = 0.99), and both human ophthalmologists achieved an accuracy of 96.0% (Kappa ≈ 0.92).

Conclusions: This study explored the capability of GPT-4o, a large language model with multimodal input, to detect papilledema from fundus photographs. GPT-4o achieved moderate diagnostic accuracy and substantial agreement with the ground truth, but it underperformed both a domain-specific ResNet model and human ophthalmologists. These findings underscore the distinction between generalist large language models and specialized diagnostic AI: although GPT-4o is not optimized for ophthalmic imaging, its accessibility, adaptability, and rapid evolution highlight its potential as a future adjunct in clinical screening, particularly in underserved settings. They also highlight the need for validation on external datasets and in real-world clinical environments before such tools can be broadly implemented.
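The abstract names the evaluation metrics but does not define them. Here is a minimal sketch that assumes a binary task with papilledema as the positive class. Under that assumption, all six values, including Cohen's Kappa, follow from one 2×2 confusion matrix that compares an evaluator's calls with the ground truth. `ConfusionCounts` and `diagnostic_metrics` are hypothetical names and are not code from the study. The example counts are illustrative and are not the study's data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 confusion-matrix counts for one evaluator versus the ground truth.

    Assumption: papilledema is the positive class, normal is the negative class.
    """
    tp: int  # papilledema images called papilledema
    fp: int  # normal images called papilledema
    tn: int  # normal images called normal
    fn: int  # papilledema images called normal


def _safe_div(num: float, den: float) -> float:
    # Return NaN instead of raising ZeroDivisionError when a metric is undefined,
    # e.g. PPV for an evaluator that never calls papilledema.
    return num / den if den else math.nan


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the six metrics named in the abstract from one confusion matrix."""
    n = c.tp + c.fp + c.tn + c.fn
    accuracy = _safe_div(c.tp + c.tn, n)  # observed agreement p_o

    # Chance agreement p_e: probability that evaluator and ground truth would
    # both say "papilledema" or both say "normal" if they were independent.
    pred_pos = _safe_div(c.tp + c.fp, n)  # share of images the evaluator calls positive
    true_pos = _safe_div(c.tp + c.fn, n)  # share of images that are truly positive
    p_e = pred_pos * true_pos + (1 - pred_pos) * (1 - true_pos)

    return {
        "sensitivity": _safe_div(c.tp, c.tp + c.fn),
        "specificity": _safe_div(c.tn, c.tn + c.fp),
        "ppv": _safe_div(c.tp, c.tp + c.fp),
        "npv": _safe_div(c.tn, c.tn + c.fn),
        "accuracy": accuracy,
        # Cohen's kappa = (p_o - p_e) / (1 - p_e)
        "kappa": _safe_div(accuracy - p_e, 1 - p_e),
    }


# Illustrative counts only (not the study's data).
example = diagnostic_metrics(ConfusionCounts(tp=40, fp=10, tn=90, fn=10))
for name, value in example.items():
    print(f"{name:>11}: {value:.3f}")
```

The NaN guard matters for small test sets, because an evaluator that never calls papilledema has an undefined PPV rather than a PPV of zero. PPV and NPV also depend on how common papilledema is in the evaluated images. In the full dataset it is 295 of 1094 images, about 27%. For that reason these two values may not carry over directly to a screening population with a different prevalence.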

Source journal: Diagnostics
Subject area: Biochemistry, Genetics and Molecular Biology (Clinical Biochemistry)
CiteScore: 4.70
Self-citation rate: 8.30%
Articles published: 2699
Review time: 19.64 days
About the journal: Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.