Automatic gesture recognition and evaluation in peg transfer tasks of laparoscopic surgery training
Shujun Ju, Penglin Jiang, Yutong Jin, Yaoyu Fu, Xiandi Wang, Xiaomei Tan, Ying Han, Rong Yin, Dan Pu, Kang Li
Surgical Endoscopy And Other Interventional Techniques, 2025, pages 3749-3759. https://doi.org/10.1007/s00464-025-11730-4
Abstract
Background: Laparoscopic surgery training is gaining increasing importance. To relieve doctors of the burden of manually annotating videos, we proposed an automatic surgical gesture recognition model based on the Fundamentals of Laparoscopic Surgery (FLS) and the Chinese Laparoscopic Skills Testing and Assessment (CLSTA) tools. Furthermore, statistical analysis based on the designed gesture vocabulary was conducted to examine differences between groups at different skill levels.
Methods: Based on the CLSTA, the peg transfer training process can be represented by a standard sequence of seven surgical gestures defined in our gesture vocabulary. The dataset used for model training and testing comprised eighty videos recorded at 30 fps, all rated by senior medical professionals from our medical training center. The dataset was processed using cross-validation to ensure robust model performance. The recognition model is a 3D ResNet-18, a convolutional neural network (CNN); an LSTM neural network was utilized to refine its output sequence.
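A minimal sketch of the two-stage pipeline described above, assuming PyTorch and torchvision. The seven-class output follows the paper's gesture vocabulary; the clip length, hidden size, and other hyperparameters are illustrative assumptions, not values reported in the paper.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

NUM_GESTURES = 7  # G1..G7, per the CLSTA-based gesture vocabulary

class GestureRecognizer(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        backbone = r3d_18(weights=None)   # 3D ResNet-18 video CNN
        backbone.fc = nn.Identity()       # expose the 512-d clip features
        self.backbone = backbone
        # LSTM refines the per-clip predictions over time (assumed sizes)
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_GESTURES)

    def forward(self, clips):
        # clips: (batch, num_clips, 3, frames, H, W) -- consecutive windows of a video
        b, n = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))  # (b*n, 512)
        feats = feats.view(b, n, -1)                # (b, n, 512)
        refined, _ = self.lstm(feats)               # temporal smoothing
        return self.head(refined)                   # (b, n, NUM_GESTURES) logits

# usage: logits = GestureRecognizer()(torch.randn(1, 8, 3, 16, 112, 112))
```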
Results: The overall accuracy of the recognition model was 83.8%, with an F1 score of 84%. The LSTM network improved performance to 85.84% accuracy and an 85% F1 score. Every operative process starts with Gesture 1 (G1) and ends with G5; wrong placements are labeled G6. The average training time was 92 s (SD = 36). Between-group differences were observed for G1, G3, and G6, indicating that trainees may benefit from focusing their efforts on these operations, while also helping doctors analyze training outcomes more effectively.
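The abstract reports between-group differences for G1, G3, and G6 but does not name the statistical test used. As one plausible reading, a one-way ANOVA over per-trainee gesture durations would produce this kind of finding; the sketch below uses hypothetical toy durations purely to illustrate the shape of such an analysis.

```python
from scipy.stats import f_oneway

# Hypothetical G1 durations (seconds) per skill-level group -- placeholder
# numbers for illustration only, not data from the paper.
durations = {
    "novice":       [14.2, 16.8, 13.5, 15.1],
    "intermediate": [11.0, 10.4, 12.3, 11.7],
    "expert":       [8.9, 9.6, 8.2, 9.1],
}

f_stat, p_value = f_oneway(*durations.values())
print(f"G1 duration ANOVA: F={f_stat:.2f}, p={p_value:.4f}")
```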
Conclusion: An automatic surgical gesture recognition model was developed for the peg transfer task. We also defined a gesture vocabulary that, together with the artificial intelligence model, sequentially describes the training operation. This provides an opportunity for artificial intelligence-enabled objective and automatic evaluation based on the CLSTA in clinical implementation.
About the journal:
Uniquely positioned at the interface between various medical and surgical disciplines, Surgical Endoscopy serves as a focal point for the international surgical community to exchange information on practice, theory, and research.
Topics covered in the journal include:
- Surgical aspects of interventional endoscopy, ultrasound, and other techniques in the fields of gastroenterology, obstetrics, gynecology, and urology
- Gastroenterologic surgery
- Thoracic surgery
- Traumatic surgery
- Orthopedic surgery
- Pediatric surgery