A new dataset and versatile multi-task surgical workflow analysis framework for thoracoscopic mitral valvuloplasty

IF 11.8, Zone 1 (Medicine), JCR Q1: Computer Science, Artificial Intelligence

Medical Image Analysis, Volume 105, Article 103724. Pub Date: 2025-07-21. DOI: 10.1016/j.media.2025.103724

Meng Lan, Weixin Si, Xinjian Yan, Xiaomeng Li


Citations: 0

Abstract


A new dataset and versatile multi-task surgical workflow analysis framework for thoracoscopic mitral valvuloplasty

Surgical Workflow Analysis (SWA) on videos is critical for AI-assisted intelligent surgery. Existing SWA methods focus primarily on laparoscopic surgery, while complex thoracoscopy-assisted cardiac surgery remains largely unexplored. In this paper, we introduce TMVP-SurgVideo, the first SWA video dataset for thoracoscopic cardiac mitral valvuloplasty (TMVP). TMVP-SurgVideo comprises 57 independent long-form surgical videos and over 429K annotated frames, covering four key tasks: phase recognition, instrument recognition, phase anticipation, and instrument anticipation. To build a comprehensive SWA system for TMVP and overcome the limitations of current SWA methods, we propose SurgFormer, the first query-based Transformer framework that simultaneously performs recognition and anticipation of surgical phases and instruments. SurgFormer uses four low-dimensional learnable task embeddings, each of which independently decodes a representation embedding used to predict one of the four tasks. During decoding, an information interaction module operates on the task embeddings. It consists of an intra-frame task-level information interaction layer, which lets the tasks within each frame exchange information, and an inter-frame temporal correlation learning layer, which learns the temporal correlation of each task across frames. In addition, SurgFormer’s architecture allows it to perform both offline and online inference using a dynamic memory bank, without modifying the model. SurgFormer is evaluated on TMVP-SurgVideo and the existing Cholec80 dataset to demonstrate its effectiveness for SWA. The dataset and the code are available at https://github.com/xmed-lab/SurgFormer.
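The interaction between task embeddings and the memory bank described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (see the repository linked above for that). It is a pure-Python approximation under several assumptions: the names `TASKS`, `attend`, `intra_frame`, and `MemoryBank` are hypothetical, attention uses no learned projections (keys double as values), and the memory bank is modeled as a fixed-length queue of past frames.

```python
import math
from collections import deque

# The four tasks named in the abstract; identifiers are illustrative only.
TASKS = ("phase_recognition", "instrument_recognition",
         "phase_anticipation", "instrument_anticipation")

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(query, keys):
    """Scaled dot-product attention of one query over a list of vectors.

    Assumption: no learned projections, so the keys are also the values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(d)]

def intra_frame(task_embs):
    """Intra-frame step: every task embedding attends to all tasks of the frame."""
    return [attend(q, task_embs) for q in task_embs]

class MemoryBank:
    """Toy stand-in for a dynamic memory bank used in online inference."""

    def __init__(self, max_frames):
        # Oldest frames are dropped automatically once the bank is full.
        self.frames = deque(maxlen=max_frames)

    def step(self, task_embs):
        mixed = intra_frame(task_embs)
        self.frames.append(mixed)
        # Inter-frame step: each task attends only to its own history.
        # Only past and current frames are stored, so this is causal.
        return [attend(mixed[t], [f[t] for f in self.frames])
                for t in range(len(mixed))]

# Streaming usage: one frame at a time, four 2-dimensional task embeddings.
bank = MemoryBank(max_frames=3)
frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
for _ in range(5):
    outputs = bank.step(frame)
```

In this sketch, online inference simply calls `step` once per incoming frame. An offline variant could pass `max_frames=None` to keep the full history, although how the actual model handles offline inference is not specified in the abstract.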


Source journal

Medical Image Analysis (Engineering and Technology - Engineering, Biomedical)

CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months

About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.