Leveraging large language and vision models for knowledge extraction from large-scale image–text colonoscopy records

IF 26.8 · Zone 1 (Medicine) · Q1 ENGINEERING, BIOMEDICAL
Shuo Wang, Yan Zhu, Zhiwei Yang, Xiaoyuan Luo, Yizhe Zhang, Peiyao Fu, Haoran Wang, Manning Wang, Zhijian Song, Quanlin Li, Pinghong Zhou, Yike Guo
Nature Biomedical Engineering · Journal Article · Published 2025-09-16 · DOI: 10.1038/s41551-025-01500-x (https://doi.org/10.1038/s41551-025-01500-x)

Abstract

The development of artificial intelligence systems for colonoscopy analysis often necessitates expert-annotated image datasets. However, limitations in dataset size and diversity impede model performance and generalization. Image–text colonoscopy records from routine clinical practice, comprising millions of images and text reports, serve as a valuable data source, although annotating them is labour intensive. Here we leverage recent advancements in large language and vision models and propose EndoKED, a data mining paradigm for deep knowledge extraction and distillation. EndoKED automates the transformation of raw colonoscopy records into image datasets with pixel-level annotation. We apply EndoKED to multicentre datasets of raw colonoscopy records (~1 million images), showing its superior performance in detecting polyps at the report and image levels, as well as annotating polyps at the pixel level. The state-of-the-art performance and generalization ability of polyp segmentation models are achieved through EndoKED pretraining. Furthermore, the EndoKED vision backbone enables data-efficient learning for optical biopsy, achieving expert-level performance in internal, external and prospective validation datasets.
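The abstract describes labels at three granularities: report, image and pixel. As a rough illustration of how a report-level finding can constrain image-level labels, the sketch below assumes a multiple-instance view that the abstract does not state: a report with no polyp implies every image from that examination is negative, while a report with a polyp only implies that at least one image shows one. The names `ColonoscopyRecord`, `has_polyp` and `weak_image_labels` are hypothetical and are not taken from EndoKED; this is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ColonoscopyRecord:
    """One raw record: a free-text report plus the images from the same exam.

    Hypothetical structure for illustration only.
    """
    record_id: str
    report_text: str
    image_ids: List[str]
    # Report-level label (assumed to come from a language model or a human
    # reading the report); None means the report has not been labelled yet.
    has_polyp: Optional[bool] = None


def weak_image_labels(record: ColonoscopyRecord) -> Dict[str, Optional[int]]:
    """Derive per-image weak labels from a report-level label.

    Assumption (multiple-instance view, not stated in the abstract):
      - negative report -> every image is negative (label 0)
      - positive report -> at least one image is positive, but which one is
        unknown, so every image stays undetermined (label None)
    """
    if record.has_polyp is None:
        raise ValueError(f"record {record.record_id} has no report-level label")
    if not record.has_polyp:
        return {image_id: 0 for image_id in record.image_ids}
    return {image_id: None for image_id in record.image_ids}
```

Under this assumption, only negative reports yield image labels directly; images from positive reports still need an image-level model, and a separate step for pixel-level masks, to be resolved.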


Source journal: Nature Biomedical Engineering (Medicine: Medicine (miscellaneous))
CiteScore: 45.30
Self-citation rate: 1.10%
Articles published: 138
About the journal: Nature Biomedical Engineering is an online-only monthly journal launched in January 2017. It publishes original research, reviews and commentary on applied biomedicine and health technology. Its audience includes life scientists who develop experimental or computational systems and methods to improve the understanding of human physiology; biomedical researchers and engineers who design or optimize therapies, assays, devices or procedures for diagnosing or treating disease; and clinicians who use research outputs to evaluate patient health or administer therapy in clinical and healthcare settings.