Masked pre-training of transformers for histology image analysis

JCR: Q2 (Medicine)

Journal of Pathology Informatics, Volume 15, Article 100386. Pub Date: 2024-05-31. DOI: 10.1016/j.jpi.2024.100386

Shuai Jiang, Liesbeth Hondelink, Arief A. Suriawinata, Saeed Hassanpour


Citations: 0

Abstract

In digital pathology, whole-slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction. Vision transformer (ViT) models have recently emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches. However, due to the large number of model parameters and limited labeled data, applying transformer models to WSIs remains challenging. In this study, we propose a pretext task to train the transformer model in a self-supervised manner. Our model, MaskHIT, uses the transformer output to reconstruct masked patches, with reconstruction quality measured by a contrastive loss. We pre-trained the MaskHIT model using over 7000 WSIs from TCGA and extensively evaluated its performance in multiple experiments, covering survival prediction, cancer subtype classification, and grade prediction tasks. Our experiments demonstrate that the pre-training procedure enables context-aware understanding of WSIs, facilitates the learning of representative histological features based on patch positions and visual patterns, and is essential for the ViT model to achieve optimal results on WSI-level tasks. The pre-trained MaskHIT surpasses various multiple instance learning approaches by 3% and 2% on survival prediction and cancer subtype classification tasks, respectively, and also outperforms recent state-of-the-art transformer-based methods. Finally, a comparison of the attention maps generated by the MaskHIT model with pathologists' annotations indicates that the model can accurately identify clinically relevant histological structures on the whole slide for each task.
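
The pretext task described in the abstract (reconstructing masked patches and scoring the reconstruction with a contrastive loss) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_contrastive_loss`, the InfoNCE-style formulation, the `temperature` value, and the use of plain NumPy arrays in place of real transformer outputs are all assumptions made for illustration only.

```python
import numpy as np


def masked_contrastive_loss(predicted, target, mask, temperature=0.1):
    """InfoNCE-style loss computed over masked patch positions only.

    predicted:   (n_patches, dim) array standing in for the transformer output
    target:      (n_patches, dim) array of original patch features
    mask:        (n_patches,) boolean array, True where a patch was masked
    temperature: softmax temperature (hypothetical default, not from the paper)
    """
    # Keep only the masked positions.
    p = predicted[mask]
    t = target[mask]

    # L2-normalise so that dot products are cosine similarities.
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)

    # Similarity of every prediction to every masked target.
    logits = (p @ t.T) / temperature

    # Numerically stable log-softmax over the candidate targets.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching target for prediction i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


# Toy usage with random features (assumed shapes: 16 patches, 32-d features).
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 32))
mask = np.zeros(16, dtype=bool)
mask[:8] = True  # pretend the first 8 patches were masked

perfect = masked_contrastive_loss(features, features, mask)
random_guess = masked_contrastive_loss(rng.normal(size=(16, 32)), features, mask)
print(perfect, random_guess)
```

In this sketch, each prediction at a masked position is rewarded for being more similar to its own original patch feature than to the features of the other masked patches, so a perfect reconstruction yields a lower loss than an unrelated one.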




Source journal

Journal of Pathology Informatics (Medicine: Pathology and Forensic Medicine)

CiteScore: 3.70

Self-citation rate: 0.00%

Review time: 18 weeks

About the journal: The Journal of Pathology Informatics (JPI) is an open access peer-reviewed journal dedicated to the advancement of pathology informatics. This is the official journal of the Association for Pathology Informatics (API). The journal aims to publish broadly about pathology informatics and freely disseminate all articles worldwide. This journal is of interest to pathologists, informaticians, academics, researchers, health IT specialists, information officers, IT staff, vendors, and anyone with an interest in informatics. We encourage submissions from anyone with an interest in the field of pathology informatics. We publish all types of papers related to pathology informatics including original research articles, technical notes, reviews, viewpoints, commentaries, editorials, symposia, meeting abstracts, book reviews, and correspondence to the editors. All submissions are subject to rigorous peer review by the well-regarded editorial board and by expert referees in appropriate specialties.