PitVis-2023 challenge: Workflow recognition in videos of endoscopic pituitary surgery

Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Sophia Bano

Medical Image Analysis, Volume 107, Article 103716. Published 2025-07-23. DOI: 10.1016/j.media.2025.103716
Abstract
The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery, including which surgical steps are performed and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery or during live surgery. The Pituitary Vision (PitVis) 2023 Challenge tasked the community with step and instrument recognition in videos of endoscopic pituitary surgery. This is a particularly challenging task compared to other minimally invasive surgeries due to the smaller working space, which limits and distorts vision, and the higher frequency of instrument and step switching, which requires more precise model predictions. Participants were provided with 25 videos, with results presented at the MICCAI-2023 conference as part of the Endoscopic Vision 2023 Challenge in Vancouver, Canada, on 08-Oct-2023. There were 18 submissions from 9 teams across 6 countries, using a variety of deep learning models. The top-performing model for step recognition utilised a transformer-based architecture, uniquely employing an autoregressive decoder with a positional-encoding input. The top-performing model for instrument recognition utilised a spatial encoder followed by a temporal encoder, uniquely employing a 2-layer temporal architecture. In both cases, these models outperformed purely spatial models, illustrating the importance of sequential and temporal information. PitVis-2023 therefore demonstrates that state-of-the-art computer vision models in minimally invasive surgery are transferable to a new dataset. Benchmark results are provided in the paper, and the dataset is publicly available at: https://doi.org/10.5522/04/26531686.
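The spatial-then-temporal decomposition described above can be sketched in miniature. The following is a hypothetical NumPy stand-in, not the winning teams' actual models: `spatial_encoder` is a random-projection placeholder for a per-frame backbone, `temporal_encoder` mimics a 2-layer temporal stage with causal moving averages, and `classify` is an untrained linear head. All names, sizes, and the class count of 7 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def spatial_encoder(frames):
    # Placeholder for a per-frame CNN/ViT backbone: maps each frame
    # (H, W, 3) to a 64-d feature vector via a fixed random projection.
    T = frames.shape[0]
    flat = frames.reshape(T, -1)
    W = rng.standard_normal((flat.shape[1], 64)) * 0.01
    return np.tanh(flat @ W)           # (T, 64) per-frame features

def temporal_encoder(feats, num_layers=2, window=5):
    # Placeholder for the 2-layer temporal stage: each layer mixes
    # information across time with a causal moving average, so a
    # frame's feature depends on the preceding frames as well.
    x = feats
    for _ in range(num_layers):
        out = np.empty_like(x)
        for t in range(x.shape[0]):
            out[t] = x[max(0, t - window + 1): t + 1].mean(axis=0)
        x = out
    return x                            # (T, 64) temporally mixed

def classify(feats, num_classes=7):
    # Untrained linear head over temporal features -> per-frame labels.
    W = rng.standard_normal((feats.shape[1], num_classes))
    return (feats @ W).argmax(axis=1)   # (T,) predicted class ids

frames = rng.random((16, 32, 32, 3))    # 16 dummy video frames
preds = classify(temporal_encoder(spatial_encoder(frames)))
print(preds.shape)                       # (16,): one prediction per frame
```

Because the temporal stage pools features over a causal window, per-frame predictions become smoother across time, which is the property the abstract credits for outperforming purely spatial models.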
Journal description:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.