{"title":"Towards Graph Representation Learning Based Surgical Workflow Anticipation","authors":"Xiatian Zhang, N. A. Moubayed, Hubert P. H. Shum","doi":"10.1109/BHI56158.2022.9926801","DOIUrl":null,"url":null,"abstract":"Surgical workflow anticipation can give predictions on what steps to conduct or what instruments to use next, which is an essential part of the computer-assisted intervention system for surgery, e.g. workflow reasoning in robotic surgery. However, current approaches are limited to their insufficient expressive power for relationships between instruments. Hence, we propose a graph representation learning framework to comprehensively represent instrument motions in the surgical workflow anticipation problem. In our proposed graph representation, we maps the bounding box information of instruments to the graph nodes in the consecutive frames and build inter-frame/inter-instrument graph edges to represent the trajectory and interaction of the instruments over time. This design enhances the ability of our network on modeling both the spatial and temporal patterns of surgical instruments and their interactions. In addition, we design a multi-horizon learning strategy to balance the understanding of various horizons indifferent anticipation tasks, which significantly improves the model performance in anticipation with various horizons. Experiments on the Cholec80 dataset demonstrate the performance of our proposed method can exceed the state-of-the-art method based on richer backbones, especially in instrument anticipation (1.27 v.s. 1.48 for inMAE; 1.48 v.s. 2.68 for eMAE). To the best of our knowledge, we are the first to introduce a spatial-temporal graph representation into surgical workflow anticipation.","PeriodicalId":347210,"journal":{"name":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BHI56158.2022.9926801","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Surgical workflow anticipation predicts which steps to conduct or which instruments to use next, and is an essential component of computer-assisted intervention systems for surgery, e.g., workflow reasoning in robotic surgery. However, current approaches are limited by their insufficient expressive power for the relationships between instruments. Hence, we propose a graph representation learning framework to comprehensively represent instrument motions in the surgical workflow anticipation problem. In our proposed graph representation, we map the bounding box information of instruments to graph nodes in consecutive frames and build inter-frame/inter-instrument graph edges to represent the trajectories and interactions of the instruments over time. This design enhances our network's ability to model both the spatial and temporal patterns of surgical instruments and their interactions. In addition, we design a multi-horizon learning strategy to balance the understanding of various horizons in different anticipation tasks, which significantly improves model performance in anticipation with various horizons. Experiments on the Cholec80 dataset demonstrate that our proposed method outperforms the state-of-the-art method based on richer backbones, especially in instrument anticipation (1.27 vs. 1.48 for inMAE; 1.48 vs. 2.68 for eMAE). To the best of our knowledge, we are the first to introduce a spatial-temporal graph representation into surgical workflow anticipation.
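
The abstract describes a graph whose nodes are instrument bounding boxes and whose edges are of two kinds: inter-instrument (within a frame) and inter-frame (the same instrument across consecutive frames). Below is a minimal sketch of one way such a spatio-temporal graph could be assembled from per-frame detections. It is not the authors' implementation, and all names (`Node`, `build_graph`, the `(x, y, w, h)` box format) are hypothetical.

```python
# A minimal sketch (not the paper's code) of a spatio-temporal instrument
# graph: bounding boxes become nodes; edges connect instruments within a
# frame (inter-instrument) and the same instrument across consecutive
# frames (inter-frame).
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Node:
    frame: int        # frame index t
    instrument: int   # instrument identity/class id
    box: tuple        # (x, y, w, h) bounding box, assumed normalized

def build_graph(frames):
    """frames: list of per-frame lists of (instrument_id, box) detections."""
    nodes, edges = [], []
    index = {}  # (frame, instrument) -> position of that node in `nodes`
    for t, detections in enumerate(frames):
        for inst, box in detections:
            index[(t, inst)] = len(nodes)
            nodes.append(Node(t, inst, box))
        # inter-instrument edges: all pairs of instruments in this frame
        for (a, _), (b, _) in combinations(detections, 2):
            edges.append((index[(t, a)], index[(t, b)]))
        # inter-frame edges: same instrument seen in the previous frame
        if t > 0:
            for inst, _ in detections:
                if (t - 1, inst) in index:
                    edges.append((index[(t - 1, inst)], index[(t, inst)]))
    return nodes, edges
```

The inter-frame edges encode each instrument's trajectory over time, while the inter-instrument edges encode interactions, matching the two edge types named in the abstract.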
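The reported metrics, inMAE and eMAE, are not defined in the abstract. The sketch below assumes, as is common in the surgical anticipation literature, that inMAE averages the absolute error over frames whose ground-truth remaining time lies inside the anticipation horizon h, and that eMAE restricts the average to frames shortly before the event occurs; the 0.1·h "end" window is an assumption, not something the abstract confirms.

```python
# Hedged sketch of the two reported metrics; definitions are assumptions.
import numpy as np

def in_mae(pred, gt, h):
    """MAE over in-horizon frames: 0 < ground-truth remaining time < h."""
    mask = (gt > 0) & (gt < h)
    return np.abs(pred[mask] - gt[mask]).mean()

def e_mae(pred, gt, h, end_frac=0.1):
    """MAE over 'end' frames within the last end_frac * h before the event
    (the end_frac = 0.1 default is an assumed convention)."""
    mask = (gt > 0) & (gt < end_frac * h)
    return np.abs(pred[mask] - gt[mask]).mean()
```

Under these assumed definitions, lower is better for both metrics, which is consistent with the abstract's comparison (1.27 vs. 1.48 inMAE; 1.48 vs. 2.68 eMAE).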