{"title":"STANet: A Surgical Gesture Recognition Method Based on Spatiotemporal Fusion","authors":"Boqiang Jia, Wenjie Wang, Xin Tian, Xiaohua Wang","doi":"10.1111/nyas.70053","DOIUrl":null,"url":null,"abstract":"In robotic surgery, surgical gesture recognition has great importance in surgical quality evaluation and intelligent recognition assistance. Currently, deep learning models, such as recurrent neural networks and temporal convolutional networks, are mainly used to model action sequences and capture the temporal dependencies between them. However, some of these methods ignore the fusion of spatial and temporal features, and hence cannot effectively capture long‐term relationships and efficiently model action sequences. To overcome these limitations, we propose a spatiotemporal adaptive network (STANet) to fuse spatiotemporal features. Specifically, we designed a temporal module and a spatial module to extract respective features. Subsequently, these features were fused and further refined through temporal modeling using a temporal adaptive convolution strategy. This approach integrates both long‐term and short‐term characteristics of surgical gesture sequences. The organic combination of temporal and spatial modules was inserted into the backbone network to form the STANet, which efficiently modeled the action sequences. Our approach has been validated on the publicly available surgical gesture datasets JIGSAWS and RARP‐45, achieving very good results. Compared to other reported benchmark models, our model demonstrates exceptional performance. It can be used in surgical robots, visual feedback systems, and computer‐assisted surgery.","PeriodicalId":8250,"journal":{"name":"Annals of the New York Academy of Sciences","volume":"1 1","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of the New York Academy of Sciences","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1111/nyas.70053","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
In robotic surgery, surgical gesture recognition is of great importance for surgical quality evaluation and intelligent surgical assistance. Currently, deep learning models such as recurrent neural networks and temporal convolutional networks are mainly used to model action sequences and capture the temporal dependencies between them. However, some of these methods ignore the fusion of spatial and temporal features and therefore cannot effectively capture long-term relationships or efficiently model action sequences. To overcome these limitations, we propose a spatiotemporal adaptive network (STANet) that fuses spatiotemporal features. Specifically, we design a temporal module and a spatial module to extract the respective features. These features are then fused and further refined through temporal modeling with a temporal adaptive convolution strategy, which integrates both the long-term and short-term characteristics of surgical gesture sequences. The combined temporal and spatial modules are inserted into the backbone network to form STANet, which models action sequences efficiently. Our approach is validated on the publicly available surgical gesture datasets JIGSAWS and RARP-45, where it achieves strong results and outperforms previously reported benchmark models. It can be applied in surgical robots, visual feedback systems, and computer-assisted surgery.
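To make the fusion idea concrete, below is a minimal PyTorch sketch of a spatiotemporal block in the spirit of the abstract: a spatial module and a temporal module whose outputs are fused and then refined by a temporal convolution whose kernel adapts to the input sequence. All module names (SpatioTemporalBlock, AdaptiveTemporalConv), layer choices, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of spatiotemporal fusion with an adaptive temporal
# convolution; illustrative only, not the published STANet architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveTemporalConv(nn.Module):
    """Depthwise temporal conv whose kernel is predicted from the input
    (one kernel per channel per clip), so temporal filtering adapts to
    each gesture sequence."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # Predict a temporal kernel per channel from globally pooled features.
        self.kernel_net = nn.Linear(channels, channels * kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T) -- per-frame feature channels over time
        b, c, t = x.shape
        ctx = x.mean(dim=-1)                        # (B, C) temporal context
        k = self.kernel_net(ctx)                    # (B, C * K)
        k = k.view(b * c, 1, self.kernel_size)
        k = F.softmax(k, dim=-1)                    # normalized adaptive kernel
        # Grouped conv applies each clip's own kernels depthwise.
        x = x.reshape(1, b * c, t)
        out = F.conv1d(x, k, groups=b * c, padding=self.kernel_size // 2)
        return out.view(b, c, t)


class SpatioTemporalBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Spatial module: per-frame (pointwise) mixing of feature channels.
        self.spatial = nn.Conv1d(channels, channels, kernel_size=1)
        # Temporal module: dilated conv for longer-range dependencies.
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3,
                                  padding=2, dilation=2)
        self.adaptive = AdaptiveTemporalConv(channels)
        self.norm = nn.BatchNorm1d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = self.spatial(x) + self.temporal(x)  # fuse spatial + temporal
        out = self.adaptive(F.relu(self.norm(fused)))
        return out + x                              # residual connection


if __name__ == "__main__":
    feats = torch.randn(2, 64, 128)   # (batch, channels, frames)
    block = SpatioTemporalBlock(64)
    print(block(feats).shape)         # torch.Size([2, 64, 128])
```

Under these assumptions, a stack of such blocks could be inserted into a backbone so that each stage jointly refines spatial and temporal structure, with the adaptive kernel letting the same block respond to both short, rapid gestures and longer sustained ones.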
Journal Introduction:
Published on behalf of the New York Academy of Sciences, Annals of the New York Academy of Sciences provides multidisciplinary perspectives on research of current scientific interest with far-reaching implications for the wider scientific community and society at large. Each special issue assembles the best thinking of key contributors to a field of investigation at a time when emerging developments offer the promise of new insight. Individually themed, Annals special issues stimulate new ways to think about science by providing a neutral forum for discourse—within and across many institutions and fields.