Dynamic surgery video summarization with balancing informativeness and diversity.
Authors: Hao Wang, Yiyang Su, Min Xue, Xuefei Song, Cheng Yang, Lei Shi, Xianqun Fan, Shuai Ding
Journal: IEEE Transactions on Medical Imaging (JCR Q1, Computer Science, Interdisciplinary Applications; impact factor 9.8)
Publication date: 2025-09-03
DOI: 10.1109/tmi.2025.3603170 (https://doi.org/10.1109/tmi.2025.3603170)
Surgery video summarization can help medical professionals quickly gain insight into the surgical process for surgical education and skill evaluation. However, existing methods cannot efficiently summarize the information that medical professionals need, because it is challenging to summarize a video while balancing information richness and diversity. In this paper, we propose a dynamic surgery video summarization framework (DSVS). We first use a multitask learning network to perceive and comprehend surgical action triplet components and phases. An information contribution module then measures frame-level importance using the predicted triplets. A two-stage strategy involving phase recognition and change-point detection is further applied to divide each phase of the surgical videos into shots. Finally, a multi-objective zero-one programming model is formulated to select the optimal subset of shots by simultaneously maximizing intra-shot information contribution and minimizing inter-shot information similarity. Experimental results on two surgical video datasets show that the framework can generate summaries that encompass crucial and diverse content. Clinical validations indicate that the framework is capable of summarizing the information expected by surgeons. The source code can be found at https://github.com/syypretend/DSVS.
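To make the final selection step more concrete, the following is a minimal sketch of a zero-one shot-selection problem. It is not the authors' formulation or solver, which the abstract does not specify. It assumes the two objectives are combined into a single score with a hypothetical trade-off weight `lam`, that a hypothetical duration `budget` constrains the summary, and that the instance is small enough to enumerate every binary vector. The function name `select_shots` and all numbers are illustrative.

```python
from itertools import product


def select_shots(contrib, sim, lengths, budget, lam=1.0):
    """Exhaustively solve a tiny zero-one shot-selection problem.

    contrib[i] : information contribution of shot i (illustrative scores)
    sim[i][j]  : similarity between shots i and j (symmetric, illustrative)
    lengths[i] : duration of shot i
    budget     : maximum total duration of the summary (assumed constraint)
    lam        : assumed weight trading contribution against similarity
    """
    n = len(contrib)
    best_score, best_x = float("-inf"), (0,) * n
    # Enumerate every binary decision vector x in {0, 1}^n.
    for x in product((0, 1), repeat=n):
        if sum(l * xi for l, xi in zip(lengths, x)) > budget:
            continue  # violates the duration budget
        info = sum(c * xi for c, xi in zip(contrib, x))
        redundancy = sum(
            sim[i][j]
            for i in range(n)
            for j in range(i + 1, n)
            if x[i] and x[j]
        )
        score = info - lam * redundancy
        if score > best_score:
            best_score, best_x = score, x
    return [i for i, xi in enumerate(best_x) if xi], best_score


# Toy instance: shots 0 and 1 are informative but nearly identical.
contrib = [0.9, 0.8, 0.5, 0.4]
sim = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
lengths = [10, 10, 10, 10]

selected, score = select_shots(contrib, sim, lengths, budget=20, lam=1.0)
print(selected, round(score, 3))
```

With these toy numbers the sketch picks shots 0 and 2 rather than the two highest-scoring shots 0 and 1, because the similarity penalty outweighs the extra contribution of shot 1. Setting `lam=0.0` removes the penalty and the selection reverts to shots 0 and 1. Exhaustive enumeration grows as 2^n, so a real instance would need an integer-programming solver or a heuristic.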
About the journal:
The IEEE Transactions on Medical Imaging (T-MI) is a journal that welcomes the submission of manuscripts focusing on various aspects of medical imaging. The journal encourages the exploration of body structure, morphology, and function through different imaging techniques, including ultrasound, X-rays, magnetic resonance, radionuclides, microwaves, and optical methods. It also promotes contributions related to cell and molecular imaging, as well as all forms of microscopy.
T-MI publishes original research papers that cover a wide range of topics, including but not limited to novel acquisition techniques, medical image processing and analysis, visualization and performance, pattern recognition, machine learning, and other related methods. The journal particularly encourages highly technical studies that offer new perspectives. By emphasizing the unification of medicine, biology, and imaging, T-MI seeks to bridge the gap between instrumentation, hardware, software, mathematics, physics, biology, and medicine by introducing new analysis methods.
While the journal welcomes strong application papers that describe novel methods, it directs papers that focus solely on important applications using medically adopted or well-established methods without significant innovation in methodology to other journals. T-MI is indexed in PubMed® and MEDLINE®, which are products of the United States National Library of Medicine.