{"title":"Convolutional 3D Attention Network for Video Based Freezing of Gait Recognition","authors":"Renfei Sun, Zhiyong Wang, K. E. Martens, S. Lewis","doi":"10.1109/DICTA.2018.8615791","DOIUrl":null,"url":null,"abstract":"Freezing of gait (FoG) is defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk. It is a typical symptom of Parkinson's disease (PD) and has a significant impact on the life quality of PD patients. Generally trained experts need to review the gait of a patient for clinical diagnosis, which is time consuming and subjective. Nowadays, automatic FoG identification from videos provides a promising solution to address these issues by formulating FoG identification as a human action recognition task. However, most existing human action recognition algorithms are limited in this task as FoG is very subtle and can be easily overlooked when being interfered with by irrelevant motion. In this paper, we propose a novel action recognition algorithm, namely convolutional 3D attention network (C3DAN), to address this issue by learning an informative region for more effective recognition. The network consists of two main parts: Spatial Attention Network (SAN) and 3-dimensional convolutional network (C3D). SAN aims to generate an attention region from coarse to fine, while C3D extracts discriminative features. Our proposed approach is able to localize attention region without manual annotation and to extract discriminative features in an end-to-end way. We evaluate our proposed C3DAN method on a video dataset collected from 45 PD patients in a clinical setting for the quantification of FoG in PD. We obtained sensitivity of 68.2%, specificity of 80.8% and accuracy of 79.3%, which outperformed several state-of-the-art human action recognition methods. To the best of our knowledge, our work is one of the first studies detecting FoG from clinical videos.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Digital Image Computing: Techniques and Applications (DICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DICTA.2018.8615791","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 19
Abstract
Freezing of gait (FoG) is defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk. It is a typical symptom of Parkinson's disease (PD) and has a significant impact on the quality of life of PD patients. Typically, trained experts must review a patient's gait for clinical diagnosis, which is time consuming and subjective. Automatic FoG identification from videos offers a promising way to address these issues by formulating FoG identification as a human action recognition task. However, most existing human action recognition algorithms perform poorly on this task, as FoG is very subtle and easily overshadowed by irrelevant motion. In this paper, we propose a novel action recognition algorithm, the convolutional 3D attention network (C3DAN), which addresses this issue by learning an informative region for more effective recognition. The network consists of two main parts: a Spatial Attention Network (SAN) and a 3-dimensional convolutional network (C3D). The SAN generates an attention region in a coarse-to-fine manner, while the C3D extracts discriminative features from it. Our approach localizes the attention region without manual annotation and extracts discriminative features in an end-to-end manner. We evaluate the proposed C3DAN method on a video dataset collected from 45 PD patients in a clinical setting for the quantification of FoG in PD. We obtained a sensitivity of 68.2%, a specificity of 80.8%, and an accuracy of 79.3%, outperforming several state-of-the-art human action recognition methods. To the best of our knowledge, our work is one of the first studies to detect FoG from clinical videos.
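
To make the two-stage design concrete, below is a minimal PyTorch sketch of how a SAN-style localizer could feed a 3D convolutional classifier. Everything here is an illustrative assumption rather than the authors' implementation: the backbone sizes, the (cx, cy, scale) parameterization of the attention region, and the spatial-transformer-style differentiable crop are stand-ins, and the paper's coarse-to-fine refinement is reduced to a single crop step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionNetwork(nn.Module):
    """Stand-in for the paper's SAN: predicts a square attention region
    (center x, center y, scale) from a video clip. The localization head
    and backbone here are assumptions, not the published architecture."""
    def __init__(self):
        super().__init__()
        # Lightweight 3D conv backbone that summarizes the whole clip.
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # Regress (cx, cy, scale), each squashed into [0, 1].
        self.loc = nn.Linear(16, 3)

    def forward(self, clip):                       # clip: (B, 3, T, H, W)
        return torch.sigmoid(self.loc(self.features(clip).flatten(1)))

def crop_region(clip, theta, out_size=112):
    """Differentiably crops every frame around the predicted region using
    an affine grid, so gradients flow back into the SAN (a common
    spatial-transformer-style choice; assumed here, not stated in the abstract)."""
    B, C, T, H, W = clip.shape
    cx, cy, s = theta[:, 0], theta[:, 1], theta[:, 2].clamp(min=0.1)
    # Affine matrix mapping the output grid to the attended sub-window
    # in normalized [-1, 1] coordinates.
    mat = torch.zeros(B, 2, 3, device=clip.device)
    mat[:, 0, 0] = s
    mat[:, 1, 1] = s
    mat[:, 0, 2] = cx * 2 - 1
    mat[:, 1, 2] = cy * 2 - 1
    frames = clip.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W)
    grid = F.affine_grid(mat.repeat_interleave(T, dim=0),
                         (B * T, C, out_size, out_size), align_corners=False)
    out = F.grid_sample(frames, grid, align_corners=False)
    return out.reshape(B, T, C, out_size, out_size).permute(0, 2, 1, 3, 4)

class C3DAN(nn.Module):
    """SAN proposes a region; a small 3D CNN (stand-in for C3D)
    classifies the attended clip as FoG vs. non-FoG."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.san = SpatialAttentionNetwork()
        self.c3d = nn.Sequential(
            nn.Conv3d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(128, num_classes),
        )

    def forward(self, clip):
        theta = self.san(clip)           # predict attention region
        attended = crop_region(clip, theta)
        return self.c3d(attended)        # logits over FoG / non-FoG
```

The differentiable crop is the key design point in this sketch: because `grid_sample` is differentiable with respect to the predicted region parameters, the classification loss alone can train the localizer, with no manual region annotations, which is consistent with the end-to-end property the abstract claims.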