Dynamic surgery video summarization with balancing informativeness and diversity.

IF 9.8 | Region 1 (Medicine) | JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

IEEE Transactions on Medical Imaging Pub Date : 2025-09-03 DOI:10.1109/tmi.2025.3603170

Hao Wang, Yiyang Su, Min Xue, Xuefei Song, Cheng Yang, Lei Shi, Xianqun Fan, Shuai Ding


Citations: 0

Abstract


Dynamic surgery video summarization with balancing informativeness and diversity.

Surgery video summarization can help medical professionals quickly gain insight into the surgical process for surgical education and skill evaluation. However, existing methods cannot summarize information efficiently enough to satisfy medical professionals, because it is challenging to summarize a video while balancing information richness and diversity. In this paper, we propose a dynamic surgery video summarization framework (DSVS). We first use a multitask learning network to perceive and comprehend surgical action triplet components and phases. An information contribution module then measures frame-level importance using the predicted triplets. A two-stage strategy involving phase recognition and change-point detection is further applied to divide each phase of the surgical videos into shots. Finally, a multi-objective zero-one programming model is formulated to select the optimal subset of shots by simultaneously maximizing intra-shot information contribution and minimizing inter-shot information similarity. Experimental results on two surgical video datasets show that the framework can generate summaries that encompass crucial and diverse content. Clinical validations indicate that the framework is capable of summarizing the information expected by surgeons. The source code can be found at https://github.com/syypretend/DSVS.
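
The shot-selection step can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general shape of a zero-one selection problem that rewards per-shot information contribution and penalizes pairwise similarity between chosen shots. The abstract does not say how the two objectives are combined or which constraints apply, so the weighted-sum scalarization (`trade_off`), the duration budget, the function name `select_shots`, and all numbers below are assumptions made for illustration. The search is exhaustive, which is only practical for a handful of shots.

```python
from itertools import combinations


def select_shots(contribution, similarity, durations, budget, trade_off=1.0):
    """Pick the subset of shots with the best informativeness/diversity score.

    Each shot i has a binary decision (selected or not). The score of a
    subset is: sum of contributions - trade_off * sum of pairwise similarities.
    Subsets whose total duration exceeds `budget` are rejected.
    (Weighted-sum scoring and the duration budget are illustrative assumptions.)
    """
    n = len(contribution)
    best_score, best_subset = float("-inf"), ()
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            # Hypothetical length constraint on the summary.
            if sum(durations[i] for i in subset) > budget:
                continue
            info = sum(contribution[i] for i in subset)
            redundancy = sum(similarity[i][j] for i, j in combinations(subset, 2))
            score = info - trade_off * redundancy
            if score > best_score:
                best_score, best_subset = score, subset
    return list(best_subset), best_score


# Toy data: 4 shots; shots 0 and 1 are informative but nearly identical.
contribution = [0.9, 0.8, 0.3, 0.7]
similarity = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.1, 0.3],
    [0.1, 0.1, 0.0, 0.1],
    [0.2, 0.3, 0.1, 0.0],
]
durations = [10, 10, 5, 10]  # seconds per shot (made-up values)

selected, score = select_shots(contribution, similarity, durations, budget=20)
print(selected)  # [0, 3]
```

In this toy instance the two most informative shots (0 and 1) are not chosen together because their high mutual similarity outweighs the extra contribution, so the sketch pairs shot 0 with the less similar shot 3. A real instance with many shots would need an integer-programming solver rather than enumeration.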


Source journal

IEEE Transactions on Medical Imaging (Medicine: Imaging Science & Photographic Technology)

CiteScore: 21.80

Self-citation rate: 5.70%

Articles published: 637

Review time: 5.6 months

Journal description: The IEEE Transactions on Medical Imaging (T-MI) is a journal that welcomes the submission of manuscripts focusing on various aspects of medical imaging. The journal encourages the exploration of body structure, morphology, and function through different imaging techniques, including ultrasound, X-rays, magnetic resonance, radionuclides, microwaves, and optical methods. It also promotes contributions related to cell and molecular imaging, as well as all forms of microscopy. T-MI publishes original research papers that cover a wide range of topics, including but not limited to novel acquisition techniques, medical image processing and analysis, visualization and performance, pattern recognition, machine learning, and other related methods. The journal particularly encourages highly technical studies that offer new perspectives. By emphasizing the unification of medicine, biology, and imaging, T-MI seeks to bridge the gap between instrumentation, hardware, software, mathematics, physics, biology, and medicine by introducing new analysis methods. While the journal welcomes strong application papers that describe novel methods, it directs papers that focus solely on important applications using medically adopted or well-established methods without significant innovation in methodology to other journals. T-MI is indexed in PubMed® and MEDLINE®, which are products of the United States National Library of Medicine.