Attention in surgical phase recognition for endoscopic pituitary surgery: Insights from real-world data
Ángela González-Cebrián, Sara Bordonaba, Javier Pascau, Igor Paredes, Alfonso Lagares, Paula de Toledo
Computers in Biology and Medicine, volume 191, Article 110222. Published 2025-04-17. DOI: 10.1016/j.compbiomed.2025.110222
https://www.sciencedirect.com/science/article/pii/S0010482525005736
Abstract
Background and objective
Surgical Phase Recognition systems support the automated documentation of a procedure and provide the surgical team with real-time feedback, potentially improving surgical outcomes and reducing adverse events. The objective of this work is to develop a model for endoscopic pituitary surgery, a challenging procedure for phase recognition due to the high variability in the order of surgical phases.
Methods
A dataset of 69 pituitary endoscopic videos was collected and labelled by two surgeons using seven different phases. The proposed architecture comprises a Convolutional Neural Network to identify spatial features in individual frames, and a Segment Attentive Hierarchical Consistency Network (which combines Temporal Convolutional Networks with attention mechanisms) to learn temporal relationships between frames and segments at different temporal scales. Finally, predictions are refined with an adaptive mode window.
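The final refinement step can be illustrated with a sliding mode filter over per-frame phase predictions. This is only a minimal sketch: the abstract does not describe how the window adapts, so the example assumes a fixed window length, and the function name `mode_filter`, the `window` parameter, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter


def mode_filter(labels, window=5):
    """Replace each label with the most frequent label in a centered window.

    Assumption: fixed window length. The adaptive rule used in the paper
    is not given in the abstract and is not reproduced here.
    """
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        # Clip the window at the start and end of the sequence.
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        if counts[current] == best:
            # Tie-break (assumption): keep the original prediction
            # when it is already among the most frequent labels.
            smoothed.append(current)
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Hypothetical per-frame phase predictions with one isolated outlier (3).
raw = [0, 0, 0, 3, 0, 0, 1, 1, 1, 1]
print(mode_filter(raw, window=5))
```

A filter of this kind removes short, isolated misclassifications while leaving sustained phase changes in place; a longer window smooths more aggressively but delays the detected transition.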
Results
We have built and made publicly available the largest pituitary endoscopic surgery database to date, named PituPhase. Our model achieves 73 % accuracy (75 % using a 10 s relaxed boundary). This result is comparable to other state-of-the-art methods in this surgical domain despite the challenges of the dataset (only 10 % of the videos are complete and only 3 % present all phases in the same order, versus 90 % and 50 % respectively in other studies).
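The difference between strict and relaxed-boundary accuracy can be sketched as follows. The abstract does not define the relaxed metric, so this example assumes one simple interpretation: a frame counts as correct if its predicted phase matches the ground-truth phase of any frame within a tolerance around it. The function name `relaxed_accuracy` and the `tolerance` parameter (in frames) are hypothetical; a 10 s boundary would correspond to 10 frames only at an assumed rate of one frame per second.

```python
def relaxed_accuracy(pred, truth, tolerance=0):
    """Fraction of frames whose prediction matches the ground truth
    at some frame within +/- `tolerance` frames.

    Assumption: this is one possible reading of a "relaxed boundary";
    the paper's exact definition is not stated in the abstract.
    With tolerance=0 this reduces to ordinary frame-wise accuracy.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    if not truth:
        raise ValueError("sequences must not be empty")
    correct = 0
    for i, p in enumerate(pred):
        # Clip the tolerance window at the sequence boundaries.
        lo = max(0, i - tolerance)
        hi = min(len(truth), i + tolerance + 1)
        if p in truth[lo:hi]:
            correct += 1
    return correct / len(pred)


# Hypothetical labels: the predicted transition is one frame early,
# and the last frame is a genuine error.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 1, 2]
print(relaxed_accuracy(pred, truth, tolerance=0))  # 0.75
print(relaxed_accuracy(pred, truth, tolerance=1))  # 0.875
```

In this toy case the relaxed score forgives the early transition but still penalises the error far from any boundary, which is why relaxed figures are typically only slightly higher than strict ones.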
Conclusions
Attention mechanisms in combination with Temporal Convolutional Networks and adaptive mode windows improve the performance of Surgical Phase Recognition systems and are robust to missing video sections and high variability in phase order.
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.