{"title":"以自我为中心的视频分析,通过深度学习自动评估开放手术技能。","authors":"Athanasios Gazis, Dimitrios Schizas, Stylianos Kykalos, Pantelis Karaiskos, Constantinos Loukas","doi":"10.1007/s11548-025-03518-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>While significant progress has been made in skill assessment for minimally invasive procedures, objective evaluation methods for open surgery remain limited. This paper presents a deep learning framework for assessing technical surgical skills using egocentric video data from open surgery training.</p><p><strong>Methods: </strong>Our dataset includes 201 videos and corresponding hand kinematics data from three fundamental training task-knot tying (KT), continuous suturing (CS), and interrupted suturing (IS)-performed by 20 participants. Each video was annotated by two experts using a modified OSATS scale (KT: five criteria, total score range: 5-25; CS/IS: seven criteria, total score range: 7-35). We evaluate three temporal architectures (LSTM, TCN, and Transformer), each using ResNet50 as the backbone for spatial feature extraction, and assess them under various training strategies: single-task learning, feature concatenation, pretraining, and multi-task learning with integrated kinematic data. Performance metrics included mean absolute error (MAE) and Spearman correlation coefficient ( <math><mi>ρ</mi></math> ), both with respect to total score prediction.</p><p><strong>Results: </strong>The Transformer-based models consistently outperformed LSTM and TCN across all tasks. The multi-task Transformer incorporating prediction of task completion time ( <math><msub><mtext>Transf-MT</mtext> <mtext>T+S</mtext></msub> </math> ) achieved the lowest MAE (KT: 1.92, CS: 2.81, and IS: 2.89) and <math><mi>ρ</mi></math> = 0.84- <math><mo>-</mo></math> 0.90. It also demonstrated promising capabilities for early skill assessment by predicting the total score from partial observations-particularly for simpler tasks. Additionally, we show that models trained on consensus expert ratings outperform those trained on individual annotations, highlighting the value of multi-rater ground truth.</p><p><strong>Conclusion: </strong>This research provides a foundation for objective, automated assessment of open surgical skills, with potential to improve the efficiency and standardization of surgical training.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Egocentric video analysis for automated assessment of open surgical skills via deep learning.\",\"authors\":\"Athanasios Gazis, Dimitrios Schizas, Stylianos Kykalos, Pantelis Karaiskos, Constantinos Loukas\",\"doi\":\"10.1007/s11548-025-03518-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>While significant progress has been made in skill assessment for minimally invasive procedures, objective evaluation methods for open surgery remain limited. 
This paper presents a deep learning framework for assessing technical surgical skills using egocentric video data from open surgery training.</p><p><strong>Methods: </strong>Our dataset includes 201 videos and corresponding hand kinematics data from three fundamental training task-knot tying (KT), continuous suturing (CS), and interrupted suturing (IS)-performed by 20 participants. Each video was annotated by two experts using a modified OSATS scale (KT: five criteria, total score range: 5-25; CS/IS: seven criteria, total score range: 7-35). We evaluate three temporal architectures (LSTM, TCN, and Transformer), each using ResNet50 as the backbone for spatial feature extraction, and assess them under various training strategies: single-task learning, feature concatenation, pretraining, and multi-task learning with integrated kinematic data. Performance metrics included mean absolute error (MAE) and Spearman correlation coefficient ( <math><mi>ρ</mi></math> ), both with respect to total score prediction.</p><p><strong>Results: </strong>The Transformer-based models consistently outperformed LSTM and TCN across all tasks. The multi-task Transformer incorporating prediction of task completion time ( <math><msub><mtext>Transf-MT</mtext> <mtext>T+S</mtext></msub> </math> ) achieved the lowest MAE (KT: 1.92, CS: 2.81, and IS: 2.89) and <math><mi>ρ</mi></math> = 0.84- <math><mo>-</mo></math> 0.90. It also demonstrated promising capabilities for early skill assessment by predicting the total score from partial observations-particularly for simpler tasks. Additionally, we show that models trained on consensus expert ratings outperform those trained on individual annotations, highlighting the value of multi-rater ground truth.</p><p><strong>Conclusion: </strong>This research provides a foundation for objective, automated assessment of open surgical skills, with potential to improve the efficiency and standardization of surgical training.</p>\",\"PeriodicalId\":51251,\"journal\":{\"name\":\"International Journal of Computer Assisted Radiology and Surgery\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Computer Assisted Radiology and Surgery\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1007/s11548-025-03518-7\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03518-7","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Egocentric video analysis for automated assessment of open surgical skills via deep learning.
Purpose: While significant progress has been made in skill assessment for minimally invasive procedures, objective evaluation methods for open surgery remain limited. This paper presents a deep learning framework for assessing technical surgical skills using egocentric video data from open surgery training.
Methods: Our dataset includes 201 videos and corresponding hand kinematics data from three fundamental training tasks, knot tying (KT), continuous suturing (CS), and interrupted suturing (IS), performed by 20 participants. Each video was annotated by two experts using a modified OSATS scale (KT: five criteria, total score range: 5-25; CS/IS: seven criteria, total score range: 7-35). We evaluated three temporal architectures (LSTM, TCN, and Transformer), each using ResNet50 as the backbone for spatial feature extraction, and assessed them under various training strategies: single-task learning, feature concatenation, pretraining, and multi-task learning with integrated kinematic data. Performance metrics included mean absolute error (MAE) and Spearman correlation coefficient (ρ), both with respect to total score prediction.
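To make the pipeline concrete, below is a minimal PyTorch sketch of the kind of model the Methods describe: per-frame ResNet50 features pooled over time by a Transformer encoder and regressed to an OSATS total score. This is not the authors' implementation; all layer sizes, the 224x224 input resolution, and the mean-pooling choice are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): ResNet50 spatial
# features per frame, a Transformer encoder over time, and a regression
# head predicting the OSATS total score.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class VideoScoreRegressor(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        backbone = resnet50(weights=None)
        # Drop the final FC layer; the backbone then emits 2048-d features.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.project = nn.Linear(2048, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)  # total-score regression

    def forward(self, frames):
        # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).flatten(1)  # (b*t, 2048)
        feats = self.project(feats).view(b, t, -1)              # (b, t, d_model)
        encoded = self.temporal(feats)                          # (b, t, d_model)
        return self.head(encoded.mean(dim=1)).squeeze(-1)       # (b,)
```

Swapping the `temporal` module for an LSTM or TCN yields the other two architectures the paper compares, with the spatial backbone unchanged.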
Results: The Transformer-based models consistently outperformed LSTM and TCN across all tasks. The multi-task Transformer incorporating prediction of task completion time (Transf-MT_{T+S}) achieved the lowest MAE (KT: 1.92, CS: 2.81, and IS: 2.89) and ρ = 0.84-0.90. It also demonstrated promising capabilities for early skill assessment by predicting the total score from partial observations, particularly for simpler tasks. Additionally, we show that models trained on consensus expert ratings outperform those trained on individual annotations, highlighting the value of multi-rater ground truth.
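The multi-task idea behind Transf-MT_{T+S} can be sketched as a shared encoder with two regression heads, one for the OSATS total score and one for task completion time, plus the two metrics reported above. The loss weighting and head structure here are assumptions for illustration; the paper does not specify them in the abstract.

```python
# Hedged sketch of the multi-task heads and evaluation metrics. The shared
# pooled encoding comes from a temporal encoder such as the one above;
# time_weight=0.5 is an assumed value, not one reported in the paper.
import torch
import torch.nn as nn
from scipy.stats import spearmanr

class MultiTaskHeads(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.score_head = nn.Linear(d_model, 1)  # OSATS total score
        self.time_head = nn.Linear(d_model, 1)   # task completion time

    def forward(self, pooled):
        # pooled: (batch, d_model) from the shared temporal encoder
        return (self.score_head(pooled).squeeze(-1),
                self.time_head(pooled).squeeze(-1))

def multitask_loss(pred_score, pred_time, true_score, true_time,
                   time_weight=0.5):
    mse = nn.functional.mse_loss
    return mse(pred_score, true_score) + time_weight * mse(pred_time, true_time)

def evaluate(pred_scores, true_scores):
    """MAE and Spearman rho over total-score predictions (CPU tensors)."""
    mae = (pred_scores - true_scores).abs().mean().item()
    rho, _ = spearmanr(pred_scores.detach().numpy(),
                       true_scores.detach().numpy())
    return mae, rho
```

Jointly supervising completion time acts as an auxiliary signal; the abstract's results suggest it is what lifts the Transformer variant to the lowest MAE across all three tasks.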
Conclusion: This research provides a foundation for objective, automated assessment of open surgical skills, with potential to improve the efficiency and standardization of surgical training.
Journal introduction:
The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.