Clément Grisi, Kimmo Kartasalo, Martin Eklund, Lars Egevad, Jeroen van der Laak, Geert Litjens
Title: Hierarchical Vision Transformers for prostate biopsy grading: Towards bridging the generalization gap
Journal: Medical Image Analysis, volume 105, Article 103663
Publication date: 2025-07-07
DOI: 10.1016/j.media.2025.103663
URL: https://www.sciencedirect.com/science/article/pii/S1361841525002105
Citations: 0
Abstract
Practical deployment of Vision Transformers in computational pathology has largely been constrained by the sheer size of whole-slide images (WSIs). Transformers faced a similar limitation when applied to long documents, and Hierarchical Transformers were introduced to circumvent it. This work explores the capabilities of Hierarchical Vision Transformers for prostate cancer grading in WSIs and presents a novel technique for combining attention scores across hierarchical transformers. Our best-performing model matches state-of-the-art algorithms, with a quadratic kappa of 0.916 on the Prostate cANcer graDe Assessment (PANDA) test set. It generalizes better when evaluated in more diverse clinical settings, achieving a quadratic kappa of 0.877 and outperforming existing solutions. These results demonstrate our approach's robustness and practical applicability, paving the way for its broader adoption in computational pathology and possibly other medical imaging tasks. Our code is publicly available at https://github.com/computationalpathologygroup/hvit.
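The abstract reports its results as quadratic kappa scores. For readers unfamiliar with the metric, the following is a minimal sketch of how quadratically weighted Cohen's kappa is commonly computed. It is not taken from the authors' repository; the function name `quadratic_weighted_kappa`, its parameters, and the toy labels are assumptions made for illustration, and labels are assumed to be integers in the range 0 to `num_classes - 1`.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Cohen's kappa with quadratic disagreement weights (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed matrix: observed[i, j] counts cases with reference i and prediction j.
    observed = np.zeros((num_classes, num_classes), dtype=float)
    np.add.at(observed, (y_true, y_pred), 1.0)

    # Expected counts if reference and prediction were independent,
    # built from the row and column marginals of the observed matrix.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    # Quadratic weights: the penalty grows with the squared distance between grades.
    idx = np.arange(num_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_classes - 1) ** 2

    # Note: the denominator is zero when every label falls in a single class;
    # this sketch does not handle that degenerate case.
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    reference = [0, 1, 2, 3, 4, 5, 2, 3]
    predicted = [0, 1, 2, 3, 5, 5, 1, 3]
    print(quadratic_weighted_kappa(reference, predicted, num_classes=6))
```

A score of 1 indicates perfect agreement and 0 indicates chance-level agreement. Because of the quadratic weights, a prediction that is several grades away from the reference is penalized more heavily than one that is off by a single grade.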
About the journal:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.