Exploring the potential of large language models in identifying metabolic dysfunction-associated steatotic liver disease: A comparative study of non-invasive tests and artificial intelligence-generated responses

IF 6 · Zone 2 (Medicine) · Q1 Gastroenterology & Hepatology

Liver International Pub Date : 2024-11-11 DOI:10.1111/liv.16112

Wanying Wu, Yuhu Guo, Qi Li, Congzhuo Jia



Abstract

Background and Aims

This study sought to assess the capabilities of large language models (LLMs) in identifying clinically significant metabolic dysfunction-associated steatotic liver disease (MASLD).

Methods

We included participants from the 2017–2018 National Health and Nutrition Examination Survey (NHANES). The validity and reliability of MASLD diagnosis by GPT-3.5 and GPT-4 were quantitatively examined and compared with those of the Fatty Liver Index (FLI) and United States FLI (USFLI). Receiver operating characteristic (ROC) curve analysis was performed to assess the diagnostic accuracy of the different scoring systems. Additionally, GPT-4V's potential for clinical diagnosis was evaluated using ultrasound images from MASLD patients, so that LLM capabilities were assessed on both textual and visual data.
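The abstract does not give the FLI formula or say which software produced the ROC curves. As a minimal sketch of the two quantities named above, the following assumes the commonly cited FLI coefficients from Bedogni et al. (2006), with triglycerides in mg/dL, GGT in U/L and waist circumference in cm, and computes AUROC by pairwise comparison of cases and non-cases. The function names are illustrative and are not taken from the paper.

```python
import math


def fatty_liver_index(tg_mg_dl, bmi, ggt_u_l, waist_cm):
    """Fatty Liver Index on a 0-100 scale (coefficients assumed from Bedogni et al., 2006)."""
    z = (0.953 * math.log(tg_mg_dl)
         + 0.139 * bmi
         + 0.718 * math.log(ggt_u_l)
         + 0.053 * waist_cm
         - 15.745)
    # Logistic transform of the linear predictor, scaled to 0-100.
    return 100.0 / (1.0 + math.exp(-z))


def auroc(scores, labels):
    """AUROC as the probability that a random case outscores a random non-case.

    labels: 1 for disease present, 0 for absent. Ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in sample size, which is fine for illustration. On a cohort the size of NHANES, a rank-based implementation from an established statistics library would be the more practical choice.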

Results

GPT-4 performed comparably to FLI and USFLI in diagnosing MASLD, with AUROC values of .831 (95% CI .796–.867), .817 (95% CI .797–.837) and .827 (95% CI .807–.848), respectively. Based on clinician evaluation, GPT-4 showed a trend toward greater accuracy, clinical relevance and efficiency than GPT-3.5. Pearson's r between GPT-4 and FLI was .718, and between GPT-4 and USFLI was .695, indicating robust and moderate correlations, respectively. GPT-4V showed potential in recognizing features of hepatic ultrasound images but had limited interpretive accuracy in diagnosing MASLD compared with skilled radiologists.
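For readers unfamiliar with the correlation statistic reported here, Pearson's r between two sets of paired scores can be sketched as follows. This is a generic textbook implementation, not the authors' analysis code, and `pearson_r` is an illustrative name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

In this setting, `x` would hold one score per participant from GPT-4 and `y` the corresponding FLI or USFLI value. A value near 1 means the two scores rank and space participants similarly.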

Conclusions

GPT-4 achieved performance comparable to traditional risk scores in diagnosing MASLD while offering greater convenience, versatility and user-friendly outputs. The integration of GPT-4V highlights the capacity of LLMs to handle both textual and visual medical data, reinforcing their broad utility in healthcare practice.


Source journal

Liver International (Medicine: Gastroenterology & Hepatology)

CiteScore

13.90

Self-citation rate

4.50%

Articles published

348

Review time

2 months

About the journal: Liver International promotes all aspects of the science of hepatology from basic research to applied clinical studies. Providing an international forum for the publication of high-quality original research in hepatology, it is an essential resource for everyone working on normal and abnormal structure and function in the liver and its constituent cells, including clinicians and basic scientists involved in the multi-disciplinary field of hepatology. The journal welcomes articles from all fields of hepatology, which may be published as original articles, brief definitive reports, reviews, mini-reviews, images in hepatology and letters to the Editor.