STANet: A Surgical Gesture Recognition Method Based on Spatiotemporal Fusion

IF 4.8 | CAS Tier 3, Multidisciplinary | JCR Q1, MULTIDISCIPLINARY SCIENCES
Boqiang Jia, Wenjie Wang, Xin Tian, Xiaohua Wang
{"title":"基于时空融合的外科手势识别方法","authors":"Boqiang Jia, Wenjie Wang, Xin Tian, Xiaohua Wang","doi":"10.1111/nyas.70053","DOIUrl":null,"url":null,"abstract":"In robotic surgery, surgical gesture recognition has great importance in surgical quality evaluation and intelligent recognition assistance. Currently, deep learning models, such as recurrent neural networks and temporal convolutional networks, are mainly used to model action sequences and capture the temporal dependencies between them. However, some of these methods ignore the fusion of spatial and temporal features, and hence cannot effectively capture long‐term relationships and efficiently model action sequences. To overcome these limitations, we propose a spatiotemporal adaptive network (STANet) to fuse spatiotemporal features. Specifically, we designed a temporal module and a spatial module to extract respective features. Subsequently, these features were fused and further refined through temporal modeling using a temporal adaptive convolution strategy. This approach integrates both long‐term and short‐term characteristics of surgical gesture sequences. The organic combination of temporal and spatial modules was inserted into the backbone network to form the STANet, which efficiently modeled the action sequences. Our approach has been validated on the publicly available surgical gesture datasets JIGSAWS and RARP‐45, achieving very good results. Compared to other reported benchmark models, our model demonstrates exceptional performance. It can be used in surgical robots, visual feedback systems, and computer‐assisted surgery.","PeriodicalId":8250,"journal":{"name":"Annals of the New York Academy of Sciences","volume":"1 1","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"STANet: A Surgical Gesture Recognition Method Based on Spatiotemporal Fusion\",\"authors\":\"Boqiang Jia, Wenjie Wang, Xin Tian, Xiaohua Wang\",\"doi\":\"10.1111/nyas.70053\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In robotic surgery, surgical gesture recognition has great importance in surgical quality evaluation and intelligent recognition assistance. Currently, deep learning models, such as recurrent neural networks and temporal convolutional networks, are mainly used to model action sequences and capture the temporal dependencies between them. However, some of these methods ignore the fusion of spatial and temporal features, and hence cannot effectively capture long‐term relationships and efficiently model action sequences. To overcome these limitations, we propose a spatiotemporal adaptive network (STANet) to fuse spatiotemporal features. Specifically, we designed a temporal module and a spatial module to extract respective features. Subsequently, these features were fused and further refined through temporal modeling using a temporal adaptive convolution strategy. This approach integrates both long‐term and short‐term characteristics of surgical gesture sequences. The organic combination of temporal and spatial modules was inserted into the backbone network to form the STANet, which efficiently modeled the action sequences. Our approach has been validated on the publicly available surgical gesture datasets JIGSAWS and RARP‐45, achieving very good results. Compared to other reported benchmark models, our model demonstrates exceptional performance. 
It can be used in surgical robots, visual feedback systems, and computer‐assisted surgery.\",\"PeriodicalId\":8250,\"journal\":{\"name\":\"Annals of the New York Academy of Sciences\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2025-09-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annals of the New York Academy of Sciences\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1111/nyas.70053\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of the New York Academy of Sciences","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1111/nyas.70053","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

In robotic surgery, surgical gesture recognition is important for surgical quality evaluation and intelligent recognition assistance. Current deep learning models, such as recurrent neural networks and temporal convolutional networks, are mainly used to model action sequences and capture the temporal dependencies between them. However, some of these methods ignore the fusion of spatial and temporal features and therefore cannot effectively capture long-term relationships or efficiently model action sequences. To overcome these limitations, we propose a spatiotemporal adaptive network (STANet) that fuses spatiotemporal features. Specifically, we design a temporal module and a spatial module to extract their respective features. These features are then fused and further refined through temporal modeling with a temporal adaptive convolution strategy, which integrates both the long-term and short-term characteristics of surgical gesture sequences. The combined temporal and spatial modules are inserted into the backbone network to form STANet, which models action sequences efficiently. We validate our approach on the publicly available surgical gesture datasets JIGSAWS and RARP-45, where it outperforms previously reported benchmark models. It can be used in surgical robots, visual feedback systems, and computer-assisted surgery.
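The abstract describes the architecture only at a high level and gives no implementation details. Purely as an illustration of how a per-frame spatial module, a temporal adaptive convolution, and a fusion step of the kind described might be wired together, here is a minimal PyTorch sketch. Every name (`SpatialModule`, `TemporalAdaptiveConv`, `STABlock`), layer choice, and dimension below is an assumption made for exposition, not the authors' code.

```python
# Illustrative sketch only -- NOT the authors' STANet implementation.
# All module names, shapes, and design choices are assumptions.
import torch
import torch.nn as nn


class SpatialModule(nn.Module):
    """Per-frame spatial feature extractor (assumed: a small conv stack)."""
    def __init__(self, in_ch: int, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse H x W into a feature vector
        )

    def forward(self, x):  # x: (B*T, C, H, W)
        return self.net(x).flatten(1)  # (B*T, dim)


class TemporalAdaptiveConv(nn.Module):
    """Assumed stand-in for the paper's 'temporal adaptive convolution':
    two dilated 1D convs capture short- and long-term context, and a
    learned gate mixes them per time step."""
    def __init__(self, dim: int):
        super().__init__()
        self.short = nn.Conv1d(dim, dim, 3, padding=1, dilation=1)
        self.long = nn.Conv1d(dim, dim, 3, padding=4, dilation=4)
        self.gate = nn.Conv1d(dim, dim, 1)

    def forward(self, x):  # x: (B, dim, T)
        g = torch.sigmoid(self.gate(x))
        return g * self.short(x) + (1 - g) * self.long(x)


class STABlock(nn.Module):
    """Fuses spatial and temporal features, then refines temporally."""
    def __init__(self, in_ch: int = 3, dim: int = 64, n_classes: int = 10):
        super().__init__()
        self.spatial = SpatialModule(in_ch, dim)
        self.temporal = TemporalAdaptiveConv(dim)
        self.fuse = nn.Linear(2 * dim, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, video):  # video: (B, T, C, H, W)
        b, t = video.shape[:2]
        s = self.spatial(video.flatten(0, 1)).view(b, t, -1)   # (B, T, dim)
        tm = self.temporal(s.transpose(1, 2)).transpose(1, 2)  # (B, T, dim)
        fused = self.fuse(torch.cat([s, tm], dim=-1))          # fuse both streams
        out = self.temporal(fused.transpose(1, 2)).transpose(1, 2)
        return self.head(out)  # per-frame gesture logits


if __name__ == "__main__":
    model = STABlock()
    clip = torch.randn(2, 16, 3, 64, 64)  # 2 clips of 16 frames each
    print(model(clip).shape)  # torch.Size([2, 16, 10])
```

Feeding a clip of shape (batch, frames, channels, height, width) yields per-frame gesture logits. In a realistic system the toy spatial stack would be replaced by a pretrained backbone, in line with the abstract's mention of inserting the combined modules into a backbone network.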
Source journal
Annals of the New York Academy of Sciences (Multidisciplinary Sciences)
CiteScore: 11.00
Self-citation rate: 1.90%
Annual publications: 193
Review time: 2-4 weeks
Journal description: Published on behalf of the New York Academy of Sciences, Annals of the New York Academy of Sciences provides multidisciplinary perspectives on research of current scientific interest with far-reaching implications for the wider scientific community and society at large. Each special issue assembles the best thinking of key contributors to a field of investigation at a time when emerging developments offer the promise of new insight. Individually themed, Annals special issues stimulate new ways to think about science by providing a neutral forum for discourse within and across many institutions and fields.