{"title":"以自我为中心的视频分析,通过深度学习自动评估开放手术技能。","authors":"Athanasios Gazis, Dimitrios Schizas, Stylianos Kykalos, Pantelis Karaiskos, Constantinos Loukas","doi":"10.1007/s11548-025-03518-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>While significant progress has been made in skill assessment for minimally invasive procedures, objective evaluation methods for open surgery remain limited. This paper presents a deep learning framework for assessing technical surgical skills using egocentric video data from open surgery training.</p><p><strong>Methods: </strong>Our dataset includes 201 videos and corresponding hand kinematics data from three fundamental training task-knot tying (KT), continuous suturing (CS), and interrupted suturing (IS)-performed by 20 participants. Each video was annotated by two experts using a modified OSATS scale (KT: five criteria, total score range: 5-25; CS/IS: seven criteria, total score range: 7-35). We evaluate three temporal architectures (LSTM, TCN, and Transformer), each using ResNet50 as the backbone for spatial feature extraction, and assess them under various training strategies: single-task learning, feature concatenation, pretraining, and multi-task learning with integrated kinematic data. Performance metrics included mean absolute error (MAE) and Spearman correlation coefficient ( <math><mi>ρ</mi></math> ), both with respect to total score prediction.</p><p><strong>Results: </strong>The Transformer-based models consistently outperformed LSTM and TCN across all tasks. The multi-task Transformer incorporating prediction of task completion time ( <math><msub><mtext>Transf-MT</mtext> <mtext>T+S</mtext></msub> </math> ) achieved the lowest MAE (KT: 1.92, CS: 2.81, and IS: 2.89) and <math><mi>ρ</mi></math> = 0.84- <math><mo>-</mo></math> 0.90. It also demonstrated promising capabilities for early skill assessment by predicting the total score from partial observations-particularly for simpler tasks. Additionally, we show that models trained on consensus expert ratings outperform those trained on individual annotations, highlighting the value of multi-rater ground truth.</p><p><strong>Conclusion: </strong>This research provides a foundation for objective, automated assessment of open surgical skills, with potential to improve the efficiency and standardization of surgical training.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Egocentric video analysis for automated assessment of open surgical skills via deep learning.\",\"authors\":\"Athanasios Gazis, Dimitrios Schizas, Stylianos Kykalos, Pantelis Karaiskos, Constantinos Loukas\",\"doi\":\"10.1007/s11548-025-03518-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>While significant progress has been made in skill assessment for minimally invasive procedures, objective evaluation methods for open surgery remain limited. 
This paper presents a deep learning framework for assessing technical surgical skills using egocentric video data from open surgery training.</p><p><strong>Methods: </strong>Our dataset includes 201 videos and corresponding hand kinematics data from three fundamental training task-knot tying (KT), continuous suturing (CS), and interrupted suturing (IS)-performed by 20 participants. Each video was annotated by two experts using a modified OSATS scale (KT: five criteria, total score range: 5-25; CS/IS: seven criteria, total score range: 7-35). We evaluate three temporal architectures (LSTM, TCN, and Transformer), each using ResNet50 as the backbone for spatial feature extraction, and assess them under various training strategies: single-task learning, feature concatenation, pretraining, and multi-task learning with integrated kinematic data. Performance metrics included mean absolute error (MAE) and Spearman correlation coefficient ( <math><mi>ρ</mi></math> ), both with respect to total score prediction.</p><p><strong>Results: </strong>The Transformer-based models consistently outperformed LSTM and TCN across all tasks. The multi-task Transformer incorporating prediction of task completion time ( <math><msub><mtext>Transf-MT</mtext> <mtext>T+S</mtext></msub> </math> ) achieved the lowest MAE (KT: 1.92, CS: 2.81, and IS: 2.89) and <math><mi>ρ</mi></math> = 0.84- <math><mo>-</mo></math> 0.90. It also demonstrated promising capabilities for early skill assessment by predicting the total score from partial observations-particularly for simpler tasks. Additionally, we show that models trained on consensus expert ratings outperform those trained on individual annotations, highlighting the value of multi-rater ground truth.</p><p><strong>Conclusion: </strong>This research provides a foundation for objective, automated assessment of open surgical skills, with potential to improve the efficiency and standardization of surgical training.</p>\",\"PeriodicalId\":51251,\"journal\":{\"name\":\"International Journal of Computer Assisted Radiology and Surgery\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Computer Assisted Radiology and Surgery\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1007/s11548-025-03518-7\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03518-7","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Egocentric video analysis for automated assessment of open surgical skills via deep learning.
Purpose: While significant progress has been made in skill assessment for minimally invasive procedures, objective evaluation methods for open surgery remain limited. This paper presents a deep learning framework for assessing technical surgical skills using egocentric video data from open surgery training.
Methods: Our dataset includes 201 videos and corresponding hand kinematics data from three fundamental training tasks, knot tying (KT), continuous suturing (CS), and interrupted suturing (IS), performed by 20 participants. Each video was annotated by two experts using a modified OSATS scale (KT: five criteria, total score range: 5-25; CS/IS: seven criteria, total score range: 7-35). We evaluated three temporal architectures (LSTM, TCN, and Transformer), each using ResNet50 as the backbone for spatial feature extraction, and assessed them under various training strategies: single-task learning, feature concatenation, pretraining, and multi-task learning with integrated kinematic data. Performance metrics included mean absolute error (MAE) and Spearman correlation coefficient (ρ), both with respect to total score prediction.
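To make the pipeline concrete, below is a minimal PyTorch sketch of the kind of model the Methods describe: per-frame ResNet50 features pooled over time by a Transformer encoder and regressed to an OSATS total score. This is not the authors' implementation; all layer sizes, the 224x224 input resolution, and the mean-pooling choice are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): ResNet50 spatial
# features per frame, a Transformer encoder over time, and a regression
# head predicting the OSATS total score.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class VideoScoreRegressor(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        backbone = resnet50(weights=None)
        # Drop the final FC layer; the backbone then emits 2048-d features.
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        self.project = nn.Linear(2048, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)  # total-score regression

    def forward(self, frames):
        # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).flatten(1)  # (b*t, 2048)
        feats = self.project(feats).view(b, t, -1)              # (b, t, d_model)
        encoded = self.temporal(feats)                          # (b, t, d_model)
        return self.head(encoded.mean(dim=1)).squeeze(-1)       # (b,)
```

Swapping the `temporal` module for an LSTM or TCN yields the other two architectures the paper compares, with the spatial backbone unchanged.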
Results: The Transformer-based models consistently outperformed LSTM and TCN across all tasks. The multi-task Transformer incorporating prediction of task completion time (Transf-MT_{T+S}) achieved the lowest MAE (KT: 1.92, CS: 2.81, and IS: 2.89) and ρ = 0.84-0.90. It also demonstrated promising capabilities for early skill assessment by predicting the total score from partial observations, particularly for simpler tasks. Additionally, we show that models trained on consensus expert ratings outperform those trained on individual annotations, highlighting the value of multi-rater ground truth.
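The multi-task idea behind Transf-MT_{T+S} can be sketched as a shared encoder with two regression heads, one for the OSATS total score and one for task completion time, plus the two metrics reported above. The loss weighting and head structure here are assumptions for illustration; the paper does not specify them in the abstract.

```python
# Hedged sketch of the multi-task heads and evaluation metrics. The shared
# pooled encoding comes from a temporal encoder such as the one above;
# time_weight=0.5 is an assumed value, not one reported in the paper.
import torch
import torch.nn as nn
from scipy.stats import spearmanr

class MultiTaskHeads(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.score_head = nn.Linear(d_model, 1)  # OSATS total score
        self.time_head = nn.Linear(d_model, 1)   # task completion time

    def forward(self, pooled):
        # pooled: (batch, d_model) from the shared temporal encoder
        return (self.score_head(pooled).squeeze(-1),
                self.time_head(pooled).squeeze(-1))

def multitask_loss(pred_score, pred_time, true_score, true_time,
                   time_weight=0.5):
    mse = nn.functional.mse_loss
    return mse(pred_score, true_score) + time_weight * mse(pred_time, true_time)

def evaluate(pred_scores, true_scores):
    """MAE and Spearman rho over total-score predictions (CPU tensors)."""
    mae = (pred_scores - true_scores).abs().mean().item()
    rho, _ = spearmanr(pred_scores.detach().numpy(),
                       true_scores.detach().numpy())
    return mae, rho
```

Jointly supervising completion time acts as an auxiliary signal; the abstract's results suggest it is what lifts the Transformer variant to the lowest MAE across all three tasks.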
Conclusion: This research provides a foundation for objective, automated assessment of open surgical skills, with potential to improve the efficiency and standardization of surgical training.
Journal introduction:
The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.