Machine learning for automating subjective clinical assessment of gait impairment in people with acquired brain injury - a comparison of an image extraction and classification system to expert scoring.

Impact factor 5.2; Zone 2, Medicine; JCR Q1, Engineering, Biomedical

Journal of NeuroEngineering and Rehabilitation Pub Date : 2024-07-23 DOI:10.1186/s12984-024-01406-w

Ashleigh Mobbs, Michelle Kahn, Gavin Williams, Benjamin F Mentiplay, Yong-Hao Pua, Ross A Clark

Journal of NeuroEngineering and Rehabilitation 21(1):124. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11264460/pdf/

Citations: 0

Abstract

Background: Walking impairment is a common disability after acquired brain injury (ABI), and visually evident arm movement abnormality has been identified as negatively affecting multiple psychological factors. The International Classification of Functioning, Disability and Health (ICF) qualifiers scale has been used to subjectively assess arm movement abnormality, showing strong intra-rater and test-retest reliability but only moderate inter-rater reliability. This limits its clinical utility as a measurement tool. To automate the analysis and overcome these errors, the primary aim of this study was to evaluate the ability of a novel two-level machine learning model to assess arm movement abnormality during walking in people with ABI.

Methods: Frontal plane gait videos were used to train four networks with 50%, 75%, 90%, and 100% of participants (ABI: n = 42, healthy controls: n = 34) to automatically identify anatomical landmarks using DeepLabCut™ and calculate two-dimensional kinematic joint angles. Assessment scores from three experienced neurorehabilitation clinicians were used with these joint angles to train random forest networks with nested cross-validation to predict assessor scores for all videos. Agreement between the predictions for unseen participants (i.e. test group participants that were not used to train the model) and each individual assessor's scores was assessed using quadratic weighted kappa. One-sample t-tests (to determine over- or under-prediction against clinician ratings) and one-way ANOVA (to determine differences between networks) were applied to the four networks.
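As an illustration of the agreement statistic named in the Methods, the following is a minimal sketch of quadratic weighted kappa between two sets of ordinal scores. It is not the authors' code. The function name, the example scores, and the choice of five categories (scores 0 to 4, matching the generic ICF qualifier levels) are assumptions made for this sketch.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_categories=5):
    """Quadratic weighted kappa for two lists of integer scores.

    Scores are assumed to lie in 0..n_categories-1 (assumption: a
    five-level 0-4 scale). Returns 1.0 for perfect agreement and
    values near 0 for chance-level agreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts, rater A
    hist_b = Counter(rater_b)                  # marginal counts, rater B
    max_dist = (n_categories - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_dist   # quadratic disagreement weight
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters used one identical category; kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for six videos (illustrative only, not study data).
model_scores = [0, 1, 2, 2, 3, 4]
clinician_scores = [0, 1, 2, 3, 3, 4]
kappa = quadratic_weighted_kappa(model_scores, clinician_scores)
```

Because the weights grow with the squared distance between scores, a prediction that is off by one level is penalised far less than one that is off by three levels, which suits an ordinal clinical scale.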

Results: The machine learning predictions showed agreement similar to that of experienced human assessors, with no statistically significant (p < 0.05) difference for any match contingency. There was no statistically significant difference between the predictions from the four networks (F = 0.119; p = 0.949). The four networks did, however, under-predict scores, with small effect sizes (p range = 0.007 to 0.040; Cohen's d range = 0.156 to 0.217).
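To make the under-prediction result concrete, here is a minimal sketch of a one-sample t statistic and Cohen's d computed on the differences between predicted and clinician scores, tested against zero. It is not the authors' analysis. The function name and the example differences are hypothetical, and the assumption that d is the mean difference divided by the sample standard deviation of the differences is ours, since the abstract does not state the formula used.

```python
import math
import statistics


def one_sample_t_and_d(differences, mu0=0.0):
    """Return (t statistic, Cohen's d) for a one-sample test against mu0.

    `differences` are prediction minus clinician score, so a negative
    mean indicates under-prediction.
    """
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two differences")
    mean_diff = statistics.fmean(differences)
    sd_diff = statistics.stdev(differences)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    cohens_d = (mean_diff - mu0) / sd_diff   # standardised mean difference
    t_stat = cohens_d * math.sqrt(n)         # t with n - 1 degrees of freedom
    return t_stat, cohens_d


# Hypothetical differences for eight videos (illustrative only, not study data).
diffs = [-1, 0, 0, -1, 0, 1, -1, 0]
t_stat, cohens_d = one_sample_t_and_d(diffs)
```

The relation t = d × √n shows why a small effect size can still reach statistical significance when many videos are scored. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_1samp`.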

Conclusions: This study demonstrated that machine learning can perform similarly to experienced clinicians when subjectively assessing arm movement abnormality in people with ABI. The relatively small sample size may have resulted in under-prediction of some scores, albeit with small effect sizes. Studies with larger sample sizes are needed that objectively and automatically assess dynamic movement in both local and telerehabilitation assessments, for example using smartphones and edge-based machine learning, to reduce measurement error and healthcare access inequality.


Source journal

Journal of NeuroEngineering and Rehabilitation (Engineering and Technology: Biomedical Engineering)

CiteScore: 9.60

Self-citation rate: 3.90%

Articles published: 122

Review time: 24 months

About the journal: Journal of NeuroEngineering and Rehabilitation considers manuscripts on all aspects of research that result from cross-fertilization of the fields of neuroscience, biomedical engineering, and physical medicine & rehabilitation.