Transformer-based multiple instance learning network with 2D positional encoding for histopathology image classification

IF 4.6 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Complex & Intelligent Systems Pub Date : 2025-03-21 DOI:10.1007/s40747-025-01779-y

Bin Yang, Lei Ding, Jianqiang Li, Yong Li, Guangzhi Qu, Jingyi Wang, Qiang Wang, Bo Liu


Citations: 0

Abstract

Digital medical imaging, particularly pathology imaging, is essential for cancer diagnosis, but pathology images are difficult to use directly for model training because of their extremely high resolution. Although weakly supervised learning has reduced the need for manual annotations, many multiple instance learning (MIL) methods struggle to capture crucial spatial relationships in histopathological images. Existing methods that incorporate positional information often overlook nuanced spatial correlations or use positional encoding strategies that do not fully capture the spatial structure of pathology images. To address this issue, we propose a new framework named TMIL (Transformer-based Multiple Instance Learning Network with 2D positional encoding), which leverages multiple instance learning for weakly supervised classification of histopathological images. TMIL incorporates a Transformer-based 2D positional encoding module to model positional information and explore correlations between instances. Furthermore, TMIL divides histopathological images into pseudo-bags and trains patch-level feature vectors with deep metric learning to enhance classification performance. The proposed approach is evaluated on a public colorectal adenoma dataset. The experimental results show that TMIL outperforms existing MIL methods, achieving an AUC of 97.28% and an ACC of 95.19%. These findings suggest that TMIL’s integration of deep metric learning and positional encoding offers a promising approach for improving the efficiency and accuracy of pathology image analysis in cancer diagnosis.
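The abstract names two mechanisms, a 2D positional encoding over patches and a split of each slide into pseudo-bags, without giving their exact form. The sketch below is only an illustration under stated assumptions: it uses the sinusoidal scheme from the original Transformer applied separately to a patch's row and column index, and a random partition of patch indices into pseudo-bags. The function names (`sincos_1d`, `positional_encoding_2d`, `add_position`, `split_into_pseudo_bags`) are hypothetical and are not taken from the paper, and TMIL's actual encoding and bag construction may differ. It uses only the Python standard library to stay self-contained.

```python
import math
import random


def sincos_1d(pos, dim):
    """Sinusoidal encoding of one integer position into `dim` values (dim even)."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc.append(math.sin(pos * freq))
        enc.append(math.cos(pos * freq))
    return enc


def positional_encoding_2d(row, col, dim):
    """Assumed 2D scheme: first half encodes the row, second half the column."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return sincos_1d(row, half) + sincos_1d(col, half)


def add_position(features, coords):
    """Add the 2D encoding to each patch feature vector (element-wise)."""
    dim = len(features[0])
    return [
        [f + p for f, p in zip(feat, positional_encoding_2d(r, c, dim))]
        for feat, (r, c) in zip(features, coords)
    ]


def split_into_pseudo_bags(num_patches, num_bags, seed=0):
    """Assumed strategy: randomly partition patch indices into `num_bags` groups.

    In a weakly supervised setting each pseudo-bag would inherit the
    slide-level label of the image it came from.
    """
    indices = list(range(num_patches))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_bags] for i in range(num_bags)]


if __name__ == "__main__":
    # Four patches on a 2x2 grid, each with an 8-dimensional feature vector.
    coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
    features = [[0.0] * 8 for _ in coords]
    encoded = add_position(features, coords)
    for xy, vec in zip(coords, encoded):
        print(xy, [round(v, 3) for v in vec])
    print(split_into_pseudo_bags(num_patches=10, num_bags=3))
```

Encoding the row and column separately lets patches that share a row or a column share half of their positional code, which is one simple way to preserve the 2D layout that a flattened 1D index would lose. In a real pipeline the feature vectors would come from a patch encoder and the position-augmented vectors would be fed to a Transformer encoder.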




Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Articles published: 297

About the journal: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools, and techniques for cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on aims to expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.