Evaluating generative AI models for explainable pathological feature extraction in lung adenocarcinoma grading assessment and prognostic model construction.

IF 12.5 | Zone 2 (Medicine) | Q1 SURGERY

International journal of surgery Pub Date : 2025-05-28 DOI:10.1097/JS9.0000000000002507

Junyi Shen, Suyin Feng, Pengpeng Zhang, Chang Qi, Zaoqu Liu, Yuying Feng, Chunrong Dong, Zhenyu Xie, Wenyi Gan, Lingxuan Zhu, Weiming Mou, Dongqiang Zeng, Bufu Tang, Mingjia Xiao, Guangdi Chu, Quan Cheng, Jian Zhang, Shengkun Peng, Yifeng Bai, Hank Z H Wong, Aimin Jiang, Peng Luo, Anqi Lin


Citations: 0

Abstract

Background: Given the increasing prevalence of generative AI (GenAI) models, a systematic evaluation of their performance in lung adenocarcinoma histopathological assessment is crucial. This study aimed to evaluate and compare three vision-capable GenAI models (GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Pro) for lung adenocarcinoma histological pattern recognition and grading, and to explore prognostic prediction models built on GenAI-extracted features.

Materials and methods: In this retrospective study, we analyzed 310 diagnostic slides from The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) database to evaluate the GenAI models and to develop and internally validate machine learning-based prognostic models. For independent external validation, we used 95 and 87 slides obtained from different institutions. The primary endpoints were GenAI grading accuracy (area under the receiver operating characteristic curve, AUC) and stability (intraclass correlation coefficient, ICC). Secondary endpoints were the development and assessment of machine learning-based prognostic models using GenAI-extracted features from the TCGA-LUAD dataset, evaluated by the concordance index (C-index).
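The two less familiar endpoints above can be made concrete with a short sketch. The abstract does not say which C-index estimator or which ICC form was used, so the code below assumes two common choices: Harrell's C-index for survival predictions and the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rating) for repeated scores of the same slides. The function names `harrell_c_index` and `icc_2_1` are hypothetical, the survival data are toy values, and the ICC demo uses the classic 6-target, 4-judge example table from Shrout and Fleiss (1979).

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: share of comparable pairs whose predicted risks are correctly ordered.

    Assumption: a pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time. Tied risks count
    as half-concordant; pairs with tied times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # higher risk failed earlier: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores: half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    Assumption: `ratings` is a list of rows (one per slide), each holding k
    repeated scores, e.g. k repeated runs of the same GenAI model.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[c] for row in ratings) / n for c in range(k)]
    # Mean squares for targets (rows), raters (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[r][c] - row_means[r] - col_means[c] + grand) ** 2
        for r in range(n)
        for c in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Toy survival data (hypothetical): follow-up months, event flag, model risk score.
c = harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.3, 0.4, 0.2])
print(f"C-index = {c:.2f}")

# Shrout & Fleiss (1979) example: 6 targets scored by 4 judges.
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(f"ICC(2,1) = {icc_2_1(sf):.2f}")
```

In the toy data, four of the five comparable pairs are ordered correctly, giving a C-index of 0.80; the Shrout and Fleiss table yields an ICC(2,1) of about 0.29. A C-index of 0.5 corresponds to random ordering, which puts the reported 0.715 in context.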

Results: Among the evaluated models, Claude-3.5-Sonnet demonstrated the best overall performance, achieving high grading accuracy (average AUC = 0.823) with moderate stability (ICC = 0.585). The optimal machine learning-based prognostic model, developed using features extracted by Claude-3.5-Sonnet and integrating clinical variables, performed well in both internal and external validation, with an average C-index of 0.715. Meta-analysis showed that this prognostic model effectively stratified patients into risk groups, with the high-risk group showing significantly worse outcomes (hazard ratio = 5.16, 95% confidence interval = 3.09-8.62).
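The reported hazard ratio and its 95% confidence interval are consistent on the log scale: the geometric mean of 3.09 and 8.62 is about 5.16. Assuming the interval was built as exp(log HR ± 1.96 × SE), the sketch below recovers the standard error and a Wald test from the published numbers alone. The abstract does not name the pooling method, so `pool_fixed_effect` shows inverse-variance fixed-effect pooling of log hazard ratios only as one common option; all function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_hr_se_from_ci(lower, upper, z=Z95):
    """SE of log(HR), assuming the CI is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_test(hr, se):
    """Wald z statistic and two-sided normal p-value for H0: HR = 1."""
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def pool_fixed_effect(log_hrs, ses):
    """Inverse-variance fixed-effect pooling (an assumed method, not confirmed by the source)."""
    weights = [1 / s ** 2 for s in ses]
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return math.exp(pooled), pooled_se


# Values reported in the abstract: HR = 5.16, 95% CI 3.09-8.62.
se = log_hr_se_from_ci(3.09, 8.62)
z, p = wald_test(5.16, se)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.1e}")
```

With the reported interval this gives an SE of roughly 0.26 on the log scale and a z statistic of roughly 6.3, so the risk separation is far beyond conventional significance thresholds.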

Conclusion: GenAI models demonstrated significant potential in lung adenocarcinoma pathology, with Claude-3.5-Sonnet exhibiting superior performance in grading prediction and robust prognostic capabilities. These findings indicate promising applications of AI in lung adenocarcinoma diagnosis and clinical management.

Source journal: International journal of surgery (SURGERY)

CiteScore: 17.70

Self-citation rate: 3.30%

Articles published:

Review time: 6-12 weeks

Journal description: The International Journal of Surgery (IJS) has a broad scope, encompassing all surgical specialties. Its primary objective is to facilitate the exchange of crucial ideas and lines of thought between and across these specialties. By doing so, the journal aims to counter the growing trend of increasing sub-specialization, which can result in "tunnel-vision" and the isolation of significant surgical advancements within specific specialties.