中风康复:亚秒动作识别的基准数据集。

Aakash Kaku, Kangning Liu, Avinash Parnandi, Haresh Rengaraj Rajamohan, Kannan Venkataramanan, Anita Venkatesan, Audre Wirtanen, Natasha Pandit, Heidi Schambra, Carlos Fernandez-Granda
{"title":"中风康复:亚秒动作识别的基准数据集。","authors":"Aakash Kaku,&nbsp;Kangning Liu,&nbsp;Avinash Parnandi,&nbsp;Haresh Rengaraj Rajamohan,&nbsp;Kannan Venkataramanan,&nbsp;Anita Venkatesan,&nbsp;Audre Wirtanen,&nbsp;Natasha Pandit,&nbsp;Heidi Schambra,&nbsp;Carlos Fernandez-Granda","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Automatic action identification from video and kinematic data is an important machine learning problem with applications ranging from robotics to smart health. Most existing works focus on identifying coarse actions such as running, climbing, or cutting vegetables, which have relatively long durations and a complex series of motions. This is an important limitation for applications that require identification of more elemental motions at high temporal resolution. For example, in the rehabilitation of arm impairment after stroke, quantifying the training dose (number of repetitions) requires differentiating motions with sub-second durations. Our goal is to bridge this gap. To this end, we introduce a large-scale, multimodal dataset, StrokeRehab, as a new action-recognition benchmark that includes elemental short-duration actions labeled at a high temporal resolution. StrokeRehab consists of high-quality inertial measurement unit sensor and video data of 51 stroke-impaired patients and 20 healthy subjects performing activities of daily living like feeding, brushing teeth, etc. Because it contains data from both healthy and impaired individuals, StrokeRehab can be used to study the influence of distribution shift in action-recognition tasks. When evaluated on StrokeRehab, current state-of-the-art models for action segmentation produce noisy predictions, which reduces their accuracy in identifying the corresponding sequence of actions. To address this, we propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques, which is based on a sequence-to-sequence model that directly predicts the sequence of actions. This approach outperforms current state-of-the-art methods on StrokeRehab, as well as on the standard benchmark datasets 50Salads, Breakfast, and Jigsaws.</p>","PeriodicalId":72099,"journal":{"name":"Advances in neural information processing systems","volume":"35 ","pages":"1671-1684"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10530637/pdf/nihms-1931424.pdf","citationCount":"0","resultStr":"{\"title\":\"StrokeRehab: A Benchmark Dataset for Sub-second Action Identification.\",\"authors\":\"Aakash Kaku,&nbsp;Kangning Liu,&nbsp;Avinash Parnandi,&nbsp;Haresh Rengaraj Rajamohan,&nbsp;Kannan Venkataramanan,&nbsp;Anita Venkatesan,&nbsp;Audre Wirtanen,&nbsp;Natasha Pandit,&nbsp;Heidi Schambra,&nbsp;Carlos Fernandez-Granda\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Automatic action identification from video and kinematic data is an important machine learning problem with applications ranging from robotics to smart health. Most existing works focus on identifying coarse actions such as running, climbing, or cutting vegetables, which have relatively long durations and a complex series of motions. This is an important limitation for applications that require identification of more elemental motions at high temporal resolution. For example, in the rehabilitation of arm impairment after stroke, quantifying the training dose (number of repetitions) requires differentiating motions with sub-second durations. Our goal is to bridge this gap. To this end, we introduce a large-scale, multimodal dataset, StrokeRehab, as a new action-recognition benchmark that includes elemental short-duration actions labeled at a high temporal resolution. StrokeRehab consists of high-quality inertial measurement unit sensor and video data of 51 stroke-impaired patients and 20 healthy subjects performing activities of daily living like feeding, brushing teeth, etc. Because it contains data from both healthy and impaired individuals, StrokeRehab can be used to study the influence of distribution shift in action-recognition tasks. When evaluated on StrokeRehab, current state-of-the-art models for action segmentation produce noisy predictions, which reduces their accuracy in identifying the corresponding sequence of actions. To address this, we propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques, which is based on a sequence-to-sequence model that directly predicts the sequence of actions. This approach outperforms current state-of-the-art methods on StrokeRehab, as well as on the standard benchmark datasets 50Salads, Breakfast, and Jigsaws.</p>\",\"PeriodicalId\":72099,\"journal\":{\"name\":\"Advances in neural information processing systems\",\"volume\":\"35 \",\"pages\":\"1671-1684\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10530637/pdf/nihms-1931424.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advances in neural information processing systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in neural information processing systems","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

从视频和运动学数据中自动识别动作是一个重要的机器学习问题,其应用范围从机器人到智能健康。大多数现有的工作都集中在识别粗糙的动作,如跑步、攀爬或切菜,这些动作持续时间相对较长,动作系列复杂。对于需要以高时间分辨率识别更多元素运动的应用来说,这是一个重要的限制。例如,在中风后手臂损伤的康复中,量化训练剂量(重复次数)需要区分亚秒持续时间的运动。我们的目标是弥合这一差距。为此,我们引入了一个大规模的多模式数据集StrokeRehab,作为一个新的动作识别基准,它包括以高时间分辨率标记的基本短时间动作。StrokeRehab由高质量的惯性测量单元传感器和51名中风受损患者和20名健康受试者的视频数据组成,这些受试者正在进行日常生活活动,如喂食、刷牙等。由于它包含健康和受损个体的数据,StrokeReab可用于研究动作识别任务中分布变化的影响。当在StrokeRehab上进行评估时,当前最先进的动作分割模型会产生噪声预测,这降低了它们识别相应动作序列的准确性。为了解决这一问题,我们受语音识别技术的启发,提出了一种新的高分辨率动作识别方法,该方法基于直接预测动作序列的序列到序列模型。这种方法在StrokeRehab以及标准基准数据集50Salad、Breakfast和Jigsaw上的性能优于当前最先进的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
StrokeRehab: A Benchmark Dataset for Sub-second Action Identification.

Automatic action identification from video and kinematic data is an important machine learning problem with applications ranging from robotics to smart health. Most existing works focus on identifying coarse actions such as running, climbing, or cutting vegetables, which have relatively long durations and a complex series of motions. This is an important limitation for applications that require identification of more elemental motions at high temporal resolution. For example, in the rehabilitation of arm impairment after stroke, quantifying the training dose (number of repetitions) requires differentiating motions with sub-second durations. Our goal is to bridge this gap. To this end, we introduce a large-scale, multimodal dataset, StrokeRehab, as a new action-recognition benchmark that includes elemental short-duration actions labeled at a high temporal resolution. StrokeRehab consists of high-quality inertial measurement unit sensor and video data of 51 stroke-impaired patients and 20 healthy subjects performing activities of daily living like feeding, brushing teeth, etc. Because it contains data from both healthy and impaired individuals, StrokeRehab can be used to study the influence of distribution shift in action-recognition tasks. When evaluated on StrokeRehab, current state-of-the-art models for action segmentation produce noisy predictions, which reduces their accuracy in identifying the corresponding sequence of actions. To address this, we propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques, which is based on a sequence-to-sequence model that directly predicts the sequence of actions. This approach outperforms current state-of-the-art methods on StrokeRehab, as well as on the standard benchmark datasets 50Salads, Breakfast, and Jigsaws.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信