Leveraging large language and vision models for knowledge extraction from large-scale image–text colonoscopy records

IF 26.8 | CAS Zone 1 (Medicine) | JCR Q1, ENGINEERING, BIOMEDICAL

Nature Biomedical Engineering | Pub Date: 2025-09-16 | DOI: 10.1038/s41551-025-01500-x

Shuo Wang, Yan Zhu, Zhiwei Yang, Xiaoyuan Luo, Yizhe Zhang, Peiyao Fu, Haoran Wang, Manning Wang, Zhijian Song, Quanlin Li, Pinghong Zhou, Yike Guo

Citations: 0

Abstract

The development of artificial intelligence systems for colonoscopy analysis often necessitates expert-annotated image datasets. However, limitations in dataset size and diversity impede model performance and generalization. Image–text colonoscopy records from routine clinical practice, comprising millions of images and text reports, serve as a valuable data source, although annotating them is labour intensive. Here we leverage recent advancements in large language and vision models and propose EndoKED, a data mining paradigm for deep knowledge extraction and distillation. EndoKED automates the transformation of raw colonoscopy records into image datasets with pixel-level annotation. We apply EndoKED to multicentre datasets of raw colonoscopy records (~1 million images), showing its superior performance in detecting polyps at the report and image levels, as well as annotating polyps at the pixel level. The state-of-the-art performance and generalization ability of polyp segmentation models are achieved through EndoKED pretraining. Furthermore, the EndoKED vision backbone enables data-efficient learning for optical biopsy, achieving expert-level performance in internal, external and prospective validation datasets.
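The abstract does not say how report-level findings become image-level labels. One standard way to frame that step is multiple-instance learning (MIL): each record is a "bag" of images, a report with no polyp marks every image in the bag negative, and a report that mentions a polyp implies at least one positive image. The following is a minimal sketch of that bag-labeling rule only, under that assumption. `Record`, `image_pseudo_labels`, the per-image scores and the 0.5 threshold are all hypothetical and are not taken from EndoKED.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical report-level label, e.g. as extracted from the text report.
    report_has_polyp: bool
    # Hypothetical per-image polyp probabilities from some image classifier.
    image_scores: list[float]


def image_pseudo_labels(record: Record, threshold: float = 0.5) -> list[int]:
    """Derive image-level pseudo-labels from one report-level label (MIL assumption)."""
    if not record.report_has_polyp:
        # Negative bag: no polyp in the report, so every image is labelled negative.
        return [0] * len(record.image_scores)
    # Positive bag: label images whose score clears the (assumed) threshold.
    labels = [1 if s >= threshold else 0 for s in record.image_scores]
    if not any(labels) and record.image_scores:
        # At least one image must be positive, so keep the top-scoring image.
        best = max(range(len(record.image_scores)), key=record.image_scores.__getitem__)
        labels[best] = 1
    return labels


if __name__ == "__main__":
    print(image_pseudo_labels(Record(True, [0.1, 0.3, 0.2])))   # [0, 1, 0]
    print(image_pseudo_labels(Record(False, [0.9, 0.8])))       # [0, 0]
```

Forcing the top-scoring image positive in a positive bag is the usual MIL constraint; a real pipeline would also need to handle reports that describe several polyps and images whose scores are ambiguous.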


Source journal

Nature Biomedical Engineering | Subject area: Medicine, Medicine (miscellaneous)

CiteScore

45.30

Self-citation rate

1.10%

Articles published

138

About the journal: Nature Biomedical Engineering is an online-only monthly journal launched in January 2017. It aims to publish original research, reviews and commentary on applied biomedicine and health technology. Its audience includes life scientists who develop experimental or computational systems and methods to improve our understanding of human physiology; biomedical researchers and engineers who design or optimize therapies, assays, devices or procedures for diagnosing or treating disease; and clinicians who use research outputs to evaluate patient health or administer therapy across clinical settings and healthcare contexts.