Multivideo Models for Classifying Hand Impairment After Stroke Using Egocentric Video

IF 5.2 | CAS Tier 2 (Medicine) | JCR Q2 (ENGINEERING, BIOMEDICAL)
Anne Mei;Meng-Fen Tsai;José Zariffa
{"title":"基于自我中心视频的脑卒中后手部损伤多视频分类模型。","authors":"Anne Mei;Meng-Fen Tsai;José Zariffa","doi":"10.1109/TNSRE.2025.3596488","DOIUrl":null,"url":null,"abstract":"Objectives: After stroke, hand function assessments are used as outcome measures to evaluate new rehabilitation therapies, but do not reflect true performance in natural environments. Wearable (egocentric) cameras provide a way to capture hand function information during activities of daily living (ADLs). However, while clinical assessments involve observing multiple functional tasks, existing deep learning methods developed to analyze hands in egocentric video are only capable of considering single ADLs. This study presents a novel multi-video architecture that processes multiple task videos to make improved estimations about hand impairment. Methods: An egocentric video dataset of ADLs performed by stroke survivors in a home simulation lab was used to develop single and multi-input video models for binary impairment classification. Using SlowFast as a base feature extractor, late fusion (majority voting, fully-connected network) and intermediate fusion (concatenation, Markov chain) were investigated for building multi-video architectures. Results: Through evaluation with Leave-One-Participant-Out-Cross-Validation, using intermediate concatenation fusion to build multi-video models was found to achieve the best performance out of the fusion techniques. The resulting multi-video model for cropped inputs achieved an F1-score of <inline-formula> <tex-math>$0.778\\pm 0.129$ </tex-math></inline-formula> and significantly outperformed its single-video counterpart (F1-score of <inline-formula> <tex-math>$0.696\\pm 0.102$ </tex-math></inline-formula>). Similarly, the multi-video model for full-frame inputs (F1-score of <inline-formula> <tex-math>$0.796\\pm 0.102$ </tex-math></inline-formula>) significantly outperformed its single-video counterpart (F1-score of <inline-formula> <tex-math>$0.708\\pm 0.099$ </tex-math></inline-formula>). Conclusion: Multi-video architectures are beneficial for estimating hand impairment from egocentric video after stroke. Significance: The proposed deep learning solution is the first of its kind in multi-video analysis, and opens the door to further applications in automating other multi-observation assessments for clinical use.","PeriodicalId":13419,"journal":{"name":"IEEE Transactions on Neural Systems and Rehabilitation Engineering","volume":"33 ","pages":"3303-3313"},"PeriodicalIF":5.2000,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11115139","citationCount":"0","resultStr":"{\"title\":\"Multivideo Models for Classifying Hand Impairment After Stroke Using Egocentric Video\",\"authors\":\"Anne Mei;Meng-Fen Tsai;José Zariffa\",\"doi\":\"10.1109/TNSRE.2025.3596488\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Objectives: After stroke, hand function assessments are used as outcome measures to evaluate new rehabilitation therapies, but do not reflect true performance in natural environments. Wearable (egocentric) cameras provide a way to capture hand function information during activities of daily living (ADLs). However, while clinical assessments involve observing multiple functional tasks, existing deep learning methods developed to analyze hands in egocentric video are only capable of considering single ADLs. 
This study presents a novel multi-video architecture that processes multiple task videos to make improved estimations about hand impairment. Methods: An egocentric video dataset of ADLs performed by stroke survivors in a home simulation lab was used to develop single and multi-input video models for binary impairment classification. Using SlowFast as a base feature extractor, late fusion (majority voting, fully-connected network) and intermediate fusion (concatenation, Markov chain) were investigated for building multi-video architectures. Results: Through evaluation with Leave-One-Participant-Out-Cross-Validation, using intermediate concatenation fusion to build multi-video models was found to achieve the best performance out of the fusion techniques. The resulting multi-video model for cropped inputs achieved an F1-score of <inline-formula> <tex-math>$0.778\\\\pm 0.129$ </tex-math></inline-formula> and significantly outperformed its single-video counterpart (F1-score of <inline-formula> <tex-math>$0.696\\\\pm 0.102$ </tex-math></inline-formula>). Similarly, the multi-video model for full-frame inputs (F1-score of <inline-formula> <tex-math>$0.796\\\\pm 0.102$ </tex-math></inline-formula>) significantly outperformed its single-video counterpart (F1-score of <inline-formula> <tex-math>$0.708\\\\pm 0.099$ </tex-math></inline-formula>). Conclusion: Multi-video architectures are beneficial for estimating hand impairment from egocentric video after stroke. Significance: The proposed deep learning solution is the first of its kind in multi-video analysis, and opens the door to further applications in automating other multi-observation assessments for clinical use.\",\"PeriodicalId\":13419,\"journal\":{\"name\":\"IEEE Transactions on Neural Systems and Rehabilitation Engineering\",\"volume\":\"33 \",\"pages\":\"3303-3313\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2025-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11115139\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Neural Systems and Rehabilitation Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11115139/\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Neural Systems and Rehabilitation Engineering","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/11115139/","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Citations: 0

Abstract

Objectives: After stroke, hand function assessments are used as outcome measures to evaluate new rehabilitation therapies, but they do not reflect true performance in natural environments. Wearable (egocentric) cameras provide a way to capture hand function information during activities of daily living (ADLs). However, while clinical assessments involve observing multiple functional tasks, existing deep learning methods developed to analyze hands in egocentric video can only consider single ADLs. This study presents a novel multi-video architecture that processes multiple task videos to make improved estimations about hand impairment.

Methods: An egocentric video dataset of ADLs performed by stroke survivors in a home simulation lab was used to develop single- and multi-input video models for binary impairment classification. Using SlowFast as a base feature extractor, late fusion (majority voting, fully-connected network) and intermediate fusion (concatenation, Markov chain) were investigated for building multi-video architectures.

Results: In evaluation with Leave-One-Participant-Out Cross-Validation, intermediate concatenation fusion achieved the best performance of the fusion techniques. The resulting multi-video model for cropped inputs achieved an F1-score of 0.778 ± 0.129 and significantly outperformed its single-video counterpart (F1-score of 0.696 ± 0.102). Similarly, the multi-video model for full-frame inputs (F1-score of 0.796 ± 0.102) significantly outperformed its single-video counterpart (F1-score of 0.708 ± 0.099).

Conclusion: Multi-video architectures are beneficial for estimating hand impairment from egocentric video after stroke.

Significance: The proposed deep learning solution is the first of its kind in multi-video analysis, and opens the door to further applications in automating other multi-observation assessments for clinical use.
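To make the two fusion strategies concrete, below is a minimal PyTorch sketch of intermediate concatenation fusion and a majority-voting late-fusion baseline as described in the abstract. This is not the authors' implementation: the names (MultiVideoClassifier, DummyBackbone, feature_dim, num_tasks) are illustrative assumptions, and a stand-in backbone replaces the pretrained SlowFast extractor so the example is self-contained and runnable.

```python
# Sketch of the fusion strategies named in the abstract, under assumed names.
import torch
import torch.nn as nn


class DummyBackbone(nn.Module):
    """Stand-in for a pretrained video encoder (e.g. SlowFast) that maps one
    clip to one embedding. Only here so the example runs on its own."""

    def __init__(self, feature_dim: int = 400):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)    # collapse T, H, W
        self.proj = nn.Linear(3, feature_dim)  # 3 RGB channels -> embedding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) -> (B, feature_dim)
        return self.proj(self.pool(x).flatten(1))


class MultiVideoClassifier(nn.Module):
    """Binary impairment classifier over several ADL task videos.

    Each task video is encoded independently by a shared backbone; the
    per-video embeddings are concatenated (intermediate concatenation
    fusion) and passed through a small fully-connected head.
    """

    def __init__(self, backbone: nn.Module, feature_dim: int, num_tasks: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(feature_dim * num_tasks, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 1),  # one logit: impaired vs. not impaired
        )

    def forward(self, videos: list[torch.Tensor]) -> torch.Tensor:
        feats = [self.backbone(v) for v in videos]  # num_tasks x (B, feature_dim)
        fused = torch.cat(feats, dim=1)             # (B, feature_dim * num_tasks)
        return self.head(fused).squeeze(-1)         # (B,) logits


def majority_vote(per_video_logits: torch.Tensor) -> torch.Tensor:
    """Late-fusion baseline: threshold each video's logit, then take a vote."""
    votes = (per_video_logits > 0).float()  # (B, num_tasks) in {0, 1}
    return (votes.mean(dim=1) > 0.5).long() # (B,) class labels


# Usage with random data: 3 task videos, batch of 2, 8 frames of 64x64 RGB.
model = MultiVideoClassifier(DummyBackbone(), feature_dim=400, num_tasks=3)
clips = [torch.randn(2, 3, 8, 64, 64) for _ in range(3)]
logits = model(clips)  # shape: (2,)
```

One plausible reading of the reported results: concatenation fusion lets the classification head weigh evidence across all tasks jointly before deciding, whereas late fusion commits to a per-video decision first, which may explain the advantage the abstract reports for the intermediate approach.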
Source journal: IEEE Transactions on Neural Systems and Rehabilitation Engineering
CiteScore: 8.60
Self-citation rate: 8.20%
Annual articles: 479
Review time: 6-12 weeks
Aims & scope: Rehabilitative and neural aspects of biomedical engineering, including functional electrical stimulation, acoustic dynamics, human performance measurement and analysis, nerve stimulation, electromyography, motor control and stimulation; and hardware and software applications for rehabilitation engineering and assistive devices.