Shujun Ju, Penglin Jiang, Yutong Jin, Yaoyu Fu, Xiandi Wang, Xiaomei Tan, Ying Han, Rong Yin, Dan Pu, Kang Li
{"title":"自动手势识别和评估在腹腔镜手术训练中的peg转移任务。","authors":"Shujun Ju, Penglin Jiang, Yutong Jin, Yaoyu Fu, Xiandi Wang, Xiaomei Tan, Ying Han, Rong Yin, Dan Pu, Kang Li","doi":"10.1007/s00464-025-11730-4","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Laparoscopic surgery training is gaining increasing importance. To release doctors from the burden of manually annotating videos, we proposed an automatic surgical gesture recognition model based on the Fundamentals of Laparoscopic Surgery (FLS) and the Chinese Laparoscopic Skills Testing and Assessment (CLSTA) tools. Furthermore, statistical analysis was conducted based on a gesture vocabulary that had been designed to examine differences between groups at different levels.</p><p><strong>Methods: </strong>Based on the CLSTA, the training process of peg transfer can be represented by a standard sequence of seven surgical gestures defined in our gesture vocabulary. The dataset used for model training and testing included eighty videos recorded at 30 fps. All videos were rated by senior medical professionals from our medical training center. The dataset was processed using cross-validation to ensure robust model performance. The model applied is 3D ResNet-18, a convolutional neural network (CNN). An LSTM neural network was utilized to refine the output sequence.</p><p><strong>Results: </strong>The overall accuracy for the recognition model was 83.8% and the F1 score was 84%. The LSTM network improved model performance to 85.84% accuracy and an 85% F1 score. Every operative process starts with Gesture 1 (G1) and ends with G5, with wrong placement is labeled as G6. The average training time is 92 s (SD = 36). Variance was observed between groups for G1, G3, and G6, indicating that trainees may benefit from focusing their efforts on these relevant operations, while assisting doctors also in more effectively analyzing the training outcome.</p><p><strong>Conclusion: </strong>An automatic surgical gesture recognition model was developed for the peg transfer task. We also defined a gesture vocabulary along with the artificial intelligence model to sequentially describe the training operation. This provides an opportunity for artificial intelligence-enabled objective and automatic evaluation based on CLSTA in the clinic implementation.</p>","PeriodicalId":22174,"journal":{"name":"Surgical Endoscopy And Other Interventional Techniques","volume":" ","pages":"3749-3759"},"PeriodicalIF":2.4000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Automatic gesture recognition and evaluation in peg transfer tasks of laparoscopic surgery training.\",\"authors\":\"Shujun Ju, Penglin Jiang, Yutong Jin, Yaoyu Fu, Xiandi Wang, Xiaomei Tan, Ying Han, Rong Yin, Dan Pu, Kang Li\",\"doi\":\"10.1007/s00464-025-11730-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Laparoscopic surgery training is gaining increasing importance. To release doctors from the burden of manually annotating videos, we proposed an automatic surgical gesture recognition model based on the Fundamentals of Laparoscopic Surgery (FLS) and the Chinese Laparoscopic Skills Testing and Assessment (CLSTA) tools. 
Furthermore, statistical analysis was conducted based on a gesture vocabulary that had been designed to examine differences between groups at different levels.</p><p><strong>Methods: </strong>Based on the CLSTA, the training process of peg transfer can be represented by a standard sequence of seven surgical gestures defined in our gesture vocabulary. The dataset used for model training and testing included eighty videos recorded at 30 fps. All videos were rated by senior medical professionals from our medical training center. The dataset was processed using cross-validation to ensure robust model performance. The model applied is 3D ResNet-18, a convolutional neural network (CNN). An LSTM neural network was utilized to refine the output sequence.</p><p><strong>Results: </strong>The overall accuracy for the recognition model was 83.8% and the F1 score was 84%. The LSTM network improved model performance to 85.84% accuracy and an 85% F1 score. Every operative process starts with Gesture 1 (G1) and ends with G5, with wrong placement is labeled as G6. The average training time is 92 s (SD = 36). Variance was observed between groups for G1, G3, and G6, indicating that trainees may benefit from focusing their efforts on these relevant operations, while assisting doctors also in more effectively analyzing the training outcome.</p><p><strong>Conclusion: </strong>An automatic surgical gesture recognition model was developed for the peg transfer task. We also defined a gesture vocabulary along with the artificial intelligence model to sequentially describe the training operation. This provides an opportunity for artificial intelligence-enabled objective and automatic evaluation based on CLSTA in the clinic implementation.</p>\",\"PeriodicalId\":22174,\"journal\":{\"name\":\"Surgical Endoscopy And Other Interventional Techniques\",\"volume\":\" \",\"pages\":\"3749-3759\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2025-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Surgical Endoscopy And Other Interventional Techniques\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s00464-025-11730-4\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/5/2 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"SURGERY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Surgical Endoscopy And Other Interventional Techniques","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00464-025-11730-4","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/5/2 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"SURGERY","Score":null,"Total":0}
引用次数: 0
Automatic gesture recognition and evaluation in peg transfer tasks of laparoscopic surgery training.
Background: Laparoscopic surgery training is gaining increasing importance. To relieve doctors of the burden of manually annotating videos, we proposed an automatic surgical gesture recognition model based on the Fundamentals of Laparoscopic Surgery (FLS) and the Chinese Laparoscopic Skills Testing and Assessment (CLSTA) tools. Furthermore, statistical analysis was conducted on a gesture vocabulary designed to examine differences between groups at different skill levels.
Methods: Based on the CLSTA, the training process of peg transfer can be represented by a standard sequence of seven surgical gestures defined in our gesture vocabulary. The dataset used for model training and testing comprised eighty videos recorded at 30 fps, all rated by senior medical professionals from our medical training center. The dataset was processed using cross-validation to ensure robust model performance. The recognition model is a 3D ResNet-18, a convolutional neural network (CNN); an LSTM neural network was then used to refine the output sequence.
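A minimal sketch of the pipeline this describes, using torchvision's r3d_18 as the 3D ResNet-18 backbone and an LSTM to refine the sequence of per-clip gesture predictions. The clip length, LSTM hidden size, and seven-class output are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

NUM_GESTURES = 7  # G1-G7 per the gesture vocabulary (assumed labeling)

class GestureRecognizer(nn.Module):
    def __init__(self, num_gestures=NUM_GESTURES, lstm_hidden=256):
        super().__init__()
        backbone = r3d_18(weights=None)      # 3D ResNet-18 (CNN)
        feat_dim = backbone.fc.in_features   # 512 for r3d_18
        backbone.fc = nn.Identity()          # keep clip-level features
        self.backbone = backbone
        # A bidirectional LSTM refines the sequence of clip features.
        self.lstm = nn.LSTM(feat_dim, lstm_hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * lstm_hidden, num_gestures)

    def forward(self, clips):
        # clips: (batch, num_clips, 3, T, H, W), e.g. 16-frame clips at 30 fps
        b, n = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))  # (b*n, 512)
        feats = feats.view(b, n, -1)                # (b, n, 512)
        refined, _ = self.lstm(feats)               # (b, n, 2*hidden)
        return self.head(refined)                   # per-clip gesture logits

model = GestureRecognizer()
dummy = torch.randn(1, 8, 3, 16, 112, 112)  # one video split into 8 clips
print(model(dummy).shape)                   # torch.Size([1, 8, 7])
```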
Results: The overall accuracy of the recognition model was 83.8% and the F1 score was 84%; the LSTM network improved performance to 85.84% accuracy and an 85% F1 score. Every operative process starts with Gesture 1 (G1) and ends with G5, with wrong placement labeled as G6. The average training time was 92 s (SD = 36). Between-group differences were observed for G1, G3, and G6, indicating that trainees may benefit from focusing their practice on these operations, while also helping doctors analyze training outcomes more effectively.
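A minimal sketch of the kind of between-group comparison these results imply: testing whether per-gesture durations differ across skill-level groups. The group names, duration values, and choice of a Kruskal-Wallis test are hypothetical; the abstract reports group differences for G1, G3, and G6 but does not specify the statistical test:

```python
from scipy.stats import kruskal

# Hypothetical per-trainee G1 durations (seconds) by skill level.
durations_g1 = {
    "novice":       [14.2, 12.8, 15.1, 13.9],
    "intermediate": [10.4,  9.7, 11.2, 10.9],
    "expert":       [ 7.1,  6.8,  7.9,  7.4],
}

# Kruskal-Wallis H-test across the three groups.
stat, p = kruskal(*durations_g1.values())
print(f"G1 durations: H = {stat:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Groups differ on G1 -> a candidate focus for targeted practice.")
```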
Conclusion: An automatic surgical gesture recognition model was developed for the peg transfer task. We also defined a gesture vocabulary alongside the artificial intelligence model to sequentially describe the training operation. This provides an opportunity for artificial intelligence-enabled objective and automatic evaluation based on the CLSTA in clinical implementation.
Journal overview:
Uniquely positioned at the interface between various medical and surgical disciplines, Surgical Endoscopy serves as a focal point for the international surgical community to exchange information on practice, theory, and research.
Topics covered in the journal include:
-Surgical aspects of interventional endoscopy, ultrasound, and other techniques in the fields of gastroenterology, obstetrics, gynecology, and urology
-Gastroenterologic surgery
-Thoracic surgery
-Traumatic surgery
-Orthopedic surgery
-Pediatric surgery