Measuring and predicting where and when pathologists focus their visual attention while grading whole slide images of cancer

IF 11.8, Zone 1 (Medicine), JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Medical Image Analysis | Pub Date: 2025-08-28 | DOI: 10.1016/j.media.2025.103752

Souradeep Chakraborty, Ruoyu Xue, Rajarsi Gupta, Oksana Yaskiv, Constantin Friedman, Natallia Sheuka, Dana Perez, Paul Friedman, Won-Tak Choi, Waqas Mahmud, Beatrice Knudsen, Gregory Zelinsky, Joel Saltz, Dimitris Samaras

Medical Image Analysis, Volume 107, Article 103752. Journal Article. https://www.sciencedirect.com/science/article/pii/S1361841525002993

Citations: 0

Abstract

The ability to predict the attention of expert pathologists could lead to decision support systems for better pathology training. We developed methods to predict the spatio-temporal (“where” and “when”) movements of pathologists’ attention as they grade whole slide images (WSIs) of prostate cancer. We characterize a pathologist’s attention trajectory by their x, y, and m (magnification) movements of a viewport as they navigate WSIs using a digital microscope. This information was obtained from 43 pathologists across 123 WSIs, and we consider the task of predicting the pathologist attention scanpaths constructed from the viewport centers. We introduce a fixation extraction algorithm that simplifies an attention trajectory by extracting “fixations” in the pathologist’s viewing while preserving semantic information, and we use these pre-processed data to train and test a two-stage model to predict the dynamic (scanpath) allocation of attention during WSI reading via intermediate attention heatmap prediction. In the first stage, a transformer-based sub-network predicts the attention heatmaps (static attention) across different magnifications. In the second stage, we predict the attention scanpath by sequentially modeling the next fixation points in an autoregressive manner using a transformer-based approach, starting at the WSI center and leveraging multi-magnification feature representations from the first stage. Experimental results show that our scanpath prediction model outperforms chance and baseline models. Tools developed from this model could assist pathology trainees in learning to allocate their attention during WSI reading like an expert.
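The abstract does not specify how the fixation extraction algorithm works, so the following is only a minimal sketch of the general idea, assuming a simple dwell-based rule: consecutive viewport samples are merged into one "fixation" while the viewport center stays near where the group started and the magnification is unchanged. The names `Sample`, `Fixation`, `extract_fixations`, and the thresholds `max_shift` and `min_duration` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass
class Sample:
    t: float  # timestamp in seconds
    x: float  # viewport center x (assumed: pixels at a fixed reference level)
    y: float  # viewport center y
    m: float  # magnification


@dataclass
class Fixation:
    x: float
    y: float
    m: float
    start: float
    duration: float


def _summarize(group: List[Sample], min_duration: float) -> Optional[Fixation]:
    """Turn a run of samples into a fixation if it lasted long enough."""
    if len(group) < 2:
        return None
    duration = group[-1].t - group[0].t
    if duration < min_duration:
        return None
    n = len(group)
    return Fixation(
        x=sum(s.x for s in group) / n,
        y=sum(s.y for s in group) / n,
        m=group[0].m,
        start=group[0].t,
        duration=duration,
    )


def extract_fixations(
    samples: List[Sample],
    max_shift: float = 50.0,    # hypothetical threshold, same units as x and y
    min_duration: float = 0.5,  # hypothetical threshold, in seconds
) -> List[Fixation]:
    """Group time-ordered viewport samples into dwell-based fixations."""
    fixations: List[Fixation] = []
    group: List[Sample] = []
    for s in samples:
        if group:
            anchor = group[0]
            moved = hypot(s.x - anchor.x, s.y - anchor.y) > max_shift
            zoomed = s.m != anchor.m
            if moved or zoomed:
                # The current dwell has ended; keep it only if it qualifies.
                fix = _summarize(group, min_duration)
                if fix is not None:
                    fixations.append(fix)
                group = []
        group.append(s)
    # Handle the final run of samples.
    fix = _summarize(group, min_duration)
    if fix is not None:
        fixations.append(fix)
    return fixations
```

Under these assumptions, the resulting sequence of `(x, y, m)` fixations is the kind of simplified scanpath that a second-stage autoregressive model could be trained to predict; the published method may differ substantially, for example in how it preserves semantic information.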


Source journal

Medical Image Analysis (Engineering & Technology: Biomedical Engineering)

CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months

About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.