CLAV: clustering latent vector aggregation for whole slide image retrieval leveraging foundation models

IF 7.6 · Zone 1, Computer Science · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems · Pub Date: 2025-09-12 · DOI: 10.1016/j.knosys.2025.114423

Alejandro Golfe, Pablo Meseguer, Valery Naranjo, Adrián Colomer

Knowledge-Based Systems, volume 329, Article 114423. Journal Article. Source: https://www.sciencedirect.com/science/article/pii/S0950705125014625

Citations: 0

Abstract

Content-Based Image Retrieval (CBIR) plays a key role in cancer diagnosis: it assists pathologists by retrieving similar images from previous records, which is especially useful when the diagnosis of a case is uncertain. The retrieved cases serve as reference points that support decision-making. Foundation models have become increasingly important in the medical field because they generalize across tasks and datasets, helping pathologists improve the accuracy and efficiency of diagnosis. In this article, a foundation model pre-trained on histopathology data is used as a feature extractor without any task-specific training, in contrast to existing models that require extensive training to learn significant data representations. The proposed method, Clustering Latent Vector Aggregation (CLAV), condenses the significant feature vectors into a single representative vector for each Whole Slide Image (WSI). Storing a single feature vector per slide reduces the size of the memory bank, which makes querying and retrieving similar WSIs more efficient. The experimental results demonstrate that the proposed method improves performance in CBIR tasks. This article highlights the potential of foundation models to achieve better retrieval metrics than state-of-the-art methods specifically trained for CBIR.
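
The pipeline outlined in the abstract (patch features from a frozen extractor, clustering, aggregation into one vector per slide, lookup in a memory bank) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the abstract does not name the clustering algorithm, the aggregation rule, or the similarity measure, so this sketch uses plain k-means, an unweighted mean of cluster centroids, and cosine similarity. The patch embeddings are synthetic stand-ins for foundation-model features, and all function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def l2_normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        # Assign every patch vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return centroids


def slide_vector(patch_embeddings, k=2):
    """Condense all patch embeddings of one slide into a single unit vector.

    Assumption: the slide vector is the unweighted mean of the cluster
    centroids. The paper may aggregate differently.
    """
    return l2_normalize(mean_vector(kmeans(patch_embeddings, k)))


def retrieve(query_vec, memory_bank, top_k=1):
    """Rank stored slides by cosine similarity to the query slide vector."""
    scored = [(sum(q * m for q, m in zip(query_vec, vec)), slide_id)
              for slide_id, vec in memory_bank.items()]
    scored.sort(reverse=True)
    return [slide_id for _, slide_id in scored[:top_k]]


def fake_patches(centers, n_per_center, rng, noise=0.05):
    """Synthetic stand-in for foundation-model patch embeddings."""
    return [[c + rng.gauss(0.0, noise) for c in center]
            for center in centers for _ in range(n_per_center)]


rng = random.Random(42)
type_a = [[1, 0, 0, 0], [0, 1, 0, 0]]  # hypothetical tissue patterns, slide type A
type_b = [[0, 0, 1, 0], [0, 0, 0, 1]]  # hypothetical tissue patterns, slide type B

# The memory bank stores one vector per slide instead of one per patch.
memory_bank = {
    "A_slide": slide_vector(fake_patches(type_a, 30, rng)),
    "B_slide": slide_vector(fake_patches(type_b, 30, rng)),
}

query = slide_vector(fake_patches(type_a, 30, rng))
print(retrieve(query, memory_bank))
```

In this toy setup each slide contributes 60 patch vectors but only one entry to the memory bank, so a query costs one similarity computation per stored slide rather than one per stored patch. That is the efficiency argument the abstract makes for a single representative vector.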


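Source journal

Knowledge-Based Systems (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months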

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.