Gui-Bin Bian, Yaqin Peng, Zhen Li, Qiang Ye, Ruichen Ma
{"title":"一种用于手术动作识别的时空动态融合网络","authors":"Gui-Bin Bian, Yaqin Peng, Zhen Li, Qiang Ye, Ruichen Ma","doi":"10.1016/j.neucom.2025.130808","DOIUrl":null,"url":null,"abstract":"<div><div>Surgical action recognition is crucial and tough in intelligent surgical robots, enabling these systems to accurately identify and interpret the ongoing actions within a surgical procedure. By recognizing the action state in real-time, the robot can provide immediate feedback and make adjustments to ensure the precision and safety of the surgery. However, it faces several challenges, such as the temporal complexity of surgical actions, the fine operation steps and subtle changes in surgical movements. Thus, a dynamic attention mechanism has been proposed to capture the temporal correlation between the current frame and the previous frames from video sequence. Furthermore, a spatiotemporal dynamic fusion network comprising two specialized modules has been proposed. The first module, Double Bi-level Routing Attention (DBRA), is designed to extract the most pertinent spatial and temporal features. While the other module, CNN-LSTM, is dedicated to delivering comprehensive spatiotemporal information. Experiments have been conducted on both a neurosurgical dataset Neuro67 and a public dataset Suturing to demonstrate the performance of the proposed method. The results indicate that the proposed method has achieved superior performance on the hard issue, achieving AP on Neuro67 and ACC on Suturing of 76.9% and 85.5%, leading by 6.7% and 1.2% respectively, with the modules effectively focusing on dependency relationships both within regions of a frame and across frames in video sequence.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"649 ","pages":"Article 130808"},"PeriodicalIF":6.5000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A spatiotemporal dynamic fusion network for surgical action recognition\",\"authors\":\"Gui-Bin Bian, Yaqin Peng, Zhen Li, Qiang Ye, Ruichen Ma\",\"doi\":\"10.1016/j.neucom.2025.130808\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Surgical action recognition is crucial and tough in intelligent surgical robots, enabling these systems to accurately identify and interpret the ongoing actions within a surgical procedure. By recognizing the action state in real-time, the robot can provide immediate feedback and make adjustments to ensure the precision and safety of the surgery. However, it faces several challenges, such as the temporal complexity of surgical actions, the fine operation steps and subtle changes in surgical movements. Thus, a dynamic attention mechanism has been proposed to capture the temporal correlation between the current frame and the previous frames from video sequence. Furthermore, a spatiotemporal dynamic fusion network comprising two specialized modules has been proposed. The first module, Double Bi-level Routing Attention (DBRA), is designed to extract the most pertinent spatial and temporal features. While the other module, CNN-LSTM, is dedicated to delivering comprehensive spatiotemporal information. Experiments have been conducted on both a neurosurgical dataset Neuro67 and a public dataset Suturing to demonstrate the performance of the proposed method. 
The results indicate that the proposed method has achieved superior performance on the hard issue, achieving AP on Neuro67 and ACC on Suturing of 76.9% and 85.5%, leading by 6.7% and 1.2% respectively, with the modules effectively focusing on dependency relationships both within regions of a frame and across frames in video sequence.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"649 \",\"pages\":\"Article 130808\"},\"PeriodicalIF\":6.5000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225014808\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225014808","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A spatiotemporal dynamic fusion network for surgical action recognition
Surgical action recognition is a crucial yet challenging task for intelligent surgical robots, enabling these systems to accurately identify and interpret the actions under way during a surgical procedure. By recognizing the action state in real time, the robot can provide immediate feedback and make adjustments to ensure the precision and safety of the surgery. The task nevertheless faces several challenges, including the temporal complexity of surgical actions, the fine-grained operation steps, and the subtle changes in surgical movements. To address these, a dynamic attention mechanism is proposed to capture the temporal correlation between the current frame and the preceding frames of a video sequence. Building on it, a spatiotemporal dynamic fusion network comprising two specialized modules is proposed: a Double Bi-level Routing Attention (DBRA) module, designed to extract the most pertinent spatial and temporal features, and a CNN-LSTM module, dedicated to delivering comprehensive spatiotemporal information. Experiments conducted on a neurosurgical dataset (Neuro67) and a public dataset (Suturing) demonstrate the performance of the proposed method. The results indicate that it achieves superior performance on this hard task, reaching 76.9% AP on Neuro67 and 85.5% ACC on Suturing, leading the next-best methods by 6.7% and 1.2% respectively, with the modules effectively attending to dependency relationships both within regions of a frame and across frames of a video sequence.
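To make the described pipeline concrete, below is a minimal, illustrative sketch of two ingredients the abstract names: a dynamic attention step that weights preceding frames against the current one, and a CNN-LSTM stage that fuses per-frame features over time. This is not the authors' code; the tiny backbone, the layer sizes, and the exact form of the attention (scaled dot-product keyed on the last frame) are assumptions made for the example, and the DBRA module is omitted.

```python
# Illustrative sketch only (not the paper's released implementation):
# per-frame CNN features -> dynamic temporal attention -> LSTM -> classifier.
import torch
import torch.nn as nn


class DynamicTemporalAttention(nn.Module):
    """Weights earlier frames by their similarity to the current frame (assumed form)."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, dim); the last frame is treated as "current".
        q = self.query(feats[:, -1:])                 # (B, 1, D)
        k = self.key(feats)                           # (B, T, D)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
        context = attn @ feats                        # (B, 1, D) temporal context
        # Concatenate the shared context onto every frame's features.
        return torch.cat([feats, context.expand_as(feats)], dim=-1)  # (B, T, 2D)


class CNNLSTMRecognizer(nn.Module):
    """Per-frame CNN + temporal attention + LSTM action classifier (illustrative)."""

    def __init__(self, num_classes: int, dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(                     # tiny stand-in backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
        self.attn = DynamicTemporalAttention(dim)
        self.lstm = nn.LSTM(2 * dim, dim, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, channels, height, width)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)  # (B, T, D)
        fused, _ = self.lstm(self.attn(feats))               # (B, T, D)
        return self.head(fused[:, -1])                       # per-clip logits


if __name__ == "__main__":
    model = CNNLSTMRecognizer(num_classes=10)
    logits = model(torch.randn(2, 8, 3, 64, 64))  # 2 clips of 8 RGB frames
    print(logits.shape)  # torch.Size([2, 10])
```

In the actual network, a real backbone would replace the stand-in CNN, and the spatial/temporal feature selection would come from the paper's DBRA module rather than this simplified attention.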
Journal overview:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing, covering neurocomputing theory, practice, and applications.