Lulin Yuan, Yifeng Zheng, Weiqiang Liu, Hong Zhao, Wenjie Zhang, Baoya Wei, Liming Chen
SLDPC: Slide-Level Dual-Prompt Collaboration for few-shot whole slide image classification
Computerized Medical Imaging and Graphics, Volume 131, Article 102768
DOI: 10.1016/j.compmedimag.2026.102768
Publication date: 2026-05-01 (Epub 2026-04-21)
Citations: 0
Abstract
Digital pathology standardizes diagnostic workflows through the digitization of conventional slides and the integration of algorithmic analysis. Few-shot Weakly Supervised Whole Slide Image (WSI) Classification (FSWC) represents a critical challenge in digital pathology. Conventional Multiple Instance Learning (MIL) methods rely on large volumes of annotated data and are susceptible to distribution shifts. Vision-Language Model (VLM)-based prompt learning methods enable parameter-efficient few-shot learning but are limited to patch-level feature aggregation, failing to model slide-level diagnostic information. As slide-level information is crucial for understanding tissue architecture and lesion distribution, we propose a Slide-Level Dual-Prompt Collaboration (SLDPC) framework for the FSWC task. Specifically, SLDPC leverages the representation learning capability of a slide-level VLM to perform prompt tuning directly at the slide level. A base prompt P is first obtained through continuous prompt initialization training and subsequently cloned to derive a parallel prompt P′. In addition, a bidirectional InfoNCE loss is employed to enhance feature-level alignment. During inference, a weighted fusion mechanism is introduced to combine both prompts and achieve efficient adaptation of slide-level multimodal representations. Experimental evaluation on four datasets validates the superiority of SLDPC. The results demonstrate that slide-level prompt learning effectively addresses FSWC challenges and improves model performance.
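The abstract names two generic components — a bidirectional (symmetric) InfoNCE alignment loss and a weighted fusion of the two prompt banks at inference. SLDPC's actual architecture is not specified here, so the following is only an illustrative NumPy sketch of those two ideas; the function names, the temperature `tau`, and the fusion weight `alpha` are assumptions, not details from the paper.

```python
import numpy as np

def l2norm(x):
    """Normalize each row to unit length (cosine-similarity preparation)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def bidirectional_infonce(slide_emb, prompt_emb, tau=0.07):
    """Symmetric InfoNCE: average of slide->prompt and prompt->slide
    cross-entropy, where matching pairs sit on the diagonal of the
    similarity matrix."""
    sim = l2norm(slide_emb) @ l2norm(prompt_emb).T / tau

    def ce(logits):
        # Numerically stable log-softmax over each row, then take the
        # negative log-probability of the diagonal (matching) entries.
        logits = logits - logits.max(axis=1, keepdims=True)
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (ce(sim) + ce(sim.T))

def fused_logits(slide_emb, base_prompts, parallel_prompts, alpha=0.5):
    """Weighted fusion at inference: combine class logits produced by the
    base prompt bank P and the cloned parallel bank P'."""
    s = l2norm(slide_emb)
    logits_base = s @ l2norm(base_prompts).T
    logits_parallel = s @ l2norm(parallel_prompts).T
    return alpha * logits_base + (1 - alpha) * logits_parallel
```

As a usage sketch, the loss is minimized during prompt tuning while the embeddings stay fixed except for the prompt vectors; at inference, `fused_logits` scores each slide against both prompt banks and the class with the highest fused logit is predicted. When the two banks coincide, the fusion reduces to the single-bank logits for any `alpha`.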
Journal introduction:
The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.