CONCORDANCE BETWEEN EXPERT GASTROENTEROLOGISTS AND ARTIFICIAL INTELLIGENCE TOOLS IN SOLVING HEPATOLOGY CLINICAL CASES

IF 4.4 · Zone 3 (Medicine) · JCR Q2: GASTROENTEROLOGY & HEPATOLOGY
Jesús Ignacio Mazadiego Cid , María del Rosario Herrero Maceda , Paloma Montserrat Diego Salazar , Rogelio Zapata Arenas , Scherezada María Isabel Mejía Loza , Juanita Pérez Escobar , María Fátima Higuera de la Tijera , Elías Artemio San Vicente Parada , Raquel Yazmín López Pérez , Felipe Zamarripa Dorsey , Yoali Maribel Velasco Santiago , Adriana López Luria , Moises Coutiño Flores , Alejandra Díaz García
Annals of Hepatology, vol. 30, Article 102032. Published 2025-09-01. Journal Article.
DOI: 10.1016/j.aohep.2025.102032
https://www.sciencedirect.com/science/article/pii/S1665268125002571
Citations: 0

Abstract

Introduction and Objectives

Evidence regarding the utility of artificial intelligence (AI) for diagnosing clinical cases in gastroenterology is limited, and it is even scarcer in hepatology.
The objective was to determine the concordance between the responses of various AI models and those of specialist physicians in the resolution of hepatology clinical cases.

Materials and Methods

This was a clinical, observational, analytical, and prospective study. The assessment instrument comprised six hepatology clinical cases, each featuring five questions. A panel of eight experts from different institutions was convened, and the kappa coefficient (κ) and Cronbach’s alpha were calculated from their individual responses. Items that failed to meet the validation threshold (≥ 80 % agreement and κ ≥ 0.6) were reviewed through iterative rounds of a modified Delphi method. Finally, κ was calculated to evaluate concordance between the responses generated by the AI models and the expert consensus.
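The concordance measure described in the methods can be illustrated with a minimal sketch. It assumes a two-rater Cohen's kappa comparing one AI model's answers with the expert consensus; the abstract does not state which kappa variant was used, and the agreement among the eight experts would need a multi-rater statistic instead. The function names and the answer lists are hypothetical and are not data from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same answer."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must answer the same non-empty item set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions,
    # summed over all answer categories.
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_o - p_e) / (1 - p_e)


def meets_threshold(rater_a, rater_b, min_agreement=0.80, min_kappa=0.6):
    """Apply the validation threshold quoted in the abstract
    (>= 80 % agreement and kappa >= 0.6)."""
    return (percent_agreement(rater_a, rater_b) >= min_agreement
            and cohen_kappa(rater_a, rater_b) >= min_kappa)


# Hypothetical multiple-choice answers, NOT data from the study.
consensus = list("ABCDABCDAB")
model = list("ABCDABCDBA")

print(f"agreement = {percent_agreement(consensus, model):.3f}")
print(f"kappa     = {cohen_kappa(consensus, model):.3f}")
print(f"passes threshold: {meets_threshold(consensus, model)}")
```

Kappa is lower than raw agreement because it subtracts the agreement expected by chance, which is why the methods set a threshold on both quantities.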

Results

The expert consensus demonstrated a high overall concordance (κ = 0.901; 95 % CI [0.860, 0.943]; z = 61.57; p < 0.001). Individual model concordance ranged from moderate to substantial, with κ values between 0.539 (Meditron-7B) and 0.784 (ChatGPT-4.0 and ChatGPT-4.0 Turbo), all statistically significant. In terms of the percentage of correct responses, the highest-performing models were ChatGPT-4.0, ChatGPT-4.0 Turbo, and DeepSeek-R1 (Figure 1).
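The labels "moderate" and "substantial" match the widely used Landis and Koch bands for κ. The abstract does not name the scale it applied, so the mapping below is an assumption. The sketch classifies only the κ values reported above, and the function name is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a Landis and Koch (1977) agreement label.

    Assumption: the abstract does not say which interpretive scale
    the authors used; these bands are the conventional ones.
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label


# Kappa values as reported in the abstract.
reported = {
    "Expert consensus": 0.901,
    "ChatGPT-4.0": 0.784,
    "ChatGPT-4.0 Turbo": 0.784,
    "Meditron-7B": 0.539,
}

for name, kappa in reported.items():
    print(f"{name}: kappa = {kappa:.3f} ({interpret_kappa(kappa)})")
```

Under these bands, the lowest model value (0.539) falls in the moderate range and the highest (0.784) in the substantial range, consistent with the wording of the results.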

Conclusions

Moderate to substantial concordance was observed between the diagnoses generated by different AI models and expert judgment in hepatology clinical cases, although concordance varied among the evaluated systems.
Source journal: Annals of Hepatology (Medicine: Gastroenterology & Hepatology)
CiteScore: 7.90
Self-citation rate: 2.60%
Articles published: 183
Review time: 4-8 weeks
About the journal: Annals of Hepatology publishes original research on the biology and diseases of the liver in both humans and experimental models. Contributions may be submitted as regular articles. The journal also publishes concise reviews of both basic and clinical topics.