CONCORDANCE BETWEEN EXPERT GASTROENTEROLOGISTS AND ARTIFICIAL INTELLIGENCE TOOLS IN SOLVING HEPATOLOGY CLINICAL CASES

IF 4.4 · Zone 3 (Medicine) · Q2 Gastroenterology & Hepatology

Annals of hepatology Pub Date : 2025-09-01 DOI:10.1016/j.aohep.2025.102032

Jesús Ignacio Mazadiego Cid , María del Rosario Herrero Maceda , Paloma Montserrat Diego Salazar , Rogelio Zapata Arenas , Scherezada María Isabel Mejía Loza , Juanita Pérez Escobar , María Fátima Higuera de la Tijera , Elías Artemio San Vicente Parada , Raquel Yazmín López Pérez , Felipe Zamarripa Dorsey , Yoali Maribel Velasco Santiago , Adriana López Luria , Moises Coutiño Flores , Alejandra Díaz García

Annals of Hepatology, Volume 30, Article 102032. Source: https://www.sciencedirect.com/science/article/pii/S1665268125002571


Abstract

Introduction and Objectives

Evidence on the utility of artificial intelligence (AI) for diagnosing clinical cases in gastroenterology is limited, and it is even scarcer in hepatology.

To determine the concordance between the responses of various AI models and those of specialist physicians in resolving hepatology clinical cases.

Materials and Methods

This was a clinical, observational, analytical, and prospective study. The assessment instrument comprised six hepatology clinical cases, each with five questions. A panel of eight experts from different institutions was convened, and the kappa coefficient (κ) and Cronbach’s alpha were calculated from their individual responses. Items that failed to meet the validation threshold (≥ 80 % agreement and κ ≥ 0.6) were reviewed through iterative rounds of a modified Delphi method. Finally, κ was calculated to evaluate concordance between the responses generated by the AI models and the expert consensus.
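The agreement statistic described above can be illustrated with a minimal sketch. It assumes the two-rater (Cohen) form of κ for comparing one AI model against the expert consensus; the abstract does not state which κ variant was used, and the function names and sample answers below are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical answers."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same answer.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # formally undefined, so this sketch reports perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def meets_validation_threshold(agreement, kappa):
    """Threshold quoted in the abstract: >= 80 % agreement and kappa >= 0.6."""
    return agreement >= 0.80 and kappa >= 0.6


# Hypothetical answers to four multiple-choice items (not study data).
consensus = ["A", "A", "B", "B"]
model = ["A", "B", "A", "B"]
print(cohen_kappa(consensus, model))
```

In this toy example the two raters agree on half of the items, which is exactly what chance would predict from their answer frequencies, so κ is 0 even though raw agreement is 50 %.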

Results

The expert consensus demonstrated high overall concordance (κ = 0.901; 95 % CI [0.860, 0.943]; z = 61.57; p < 0.001). Individual model concordance ranged from moderate to substantial, with κ values from 0.539 (Meditron-7B) to 0.784 (ChatGPT-4.0 and ChatGPT-4.0 Turbo), all statistically significant. By percentage of correct responses, the highest-performing models were ChatGPT-4.0, ChatGPT-4.0 Turbo, and Deepseek-R1 (Figure 1).
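The verbal labels attached to these κ values can be reproduced with a short sketch. It assumes the widely used Landis and Koch bands; the abstract does not name the scale it applied, although "moderate to substantial" for 0.539 to 0.784 is consistent with it. The function name is hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch (1977) agreement band."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


# Kappa values reported in the abstract.
for name, kappa in [
    ("Expert consensus", 0.901),
    ("ChatGPT-4.0 / ChatGPT-4.0 Turbo", 0.784),
    ("Meditron-7B", 0.539),
]:
    print(f"{name}: kappa = {kappa} ({landis_koch_label(kappa)})")
```

Under this assumed scale, the expert consensus falls in the top band, which matches the abstract's description of it as high.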

Conclusions

A moderate to substantial concordance was observed between diagnoses generated by different AI models and expert judgment in hepatology clinical cases, although variations were noted among the evaluated systems.


Source journal

Annals of Hepatology (Medicine: Gastroenterology & Hepatology)

CiteScore: 7.90

Self-citation rate: 2.60%

Articles published: 183

Review time: 4-8 weeks

About the journal: Annals of Hepatology publishes original research on the biology and diseases of the liver in both humans and experimental models. Contributions may be submitted as regular articles. The journal also publishes concise reviews of both basic and clinical topics.