{"title":"Convolutional 3D Attention Network for Video Based Freezing of Gait Recognition","authors":"Renfei Sun, Zhiyong Wang, K. E. Martens, S. Lewis","doi":"10.1109/DICTA.2018.8615791","DOIUrl":null,"url":null,"abstract":"Freezing of gait (FoG) is defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk. It is a typical symptom of Parkinson's disease (PD) and has a significant impact on the life quality of PD patients. Generally trained experts need to review the gait of a patient for clinical diagnosis, which is time consuming and subjective. Nowadays, automatic FoG identification from videos provides a promising solution to address these issues by formulating FoG identification as a human action recognition task. However, most existing human action recognition algorithms are limited in this task as FoG is very subtle and can be easily overlooked when being interfered with by irrelevant motion. In this paper, we propose a novel action recognition algorithm, namely convolutional 3D attention network (C3DAN), to address this issue by learning an informative region for more effective recognition. The network consists of two main parts: Spatial Attention Network (SAN) and 3-dimensional convolutional network (C3D). SAN aims to generate an attention region from coarse to fine, while C3D extracts discriminative features. Our proposed approach is able to localize attention region without manual annotation and to extract discriminative features in an end-to-end way. We evaluate our proposed C3DAN method on a video dataset collected from 45 PD patients in a clinical setting for the quantification of FoG in PD. We obtained sensitivity of 68.2%, specificity of 80.8% and accuracy of 79.3%, which outperformed several state-of-the-art human action recognition methods. To the best of our knowledge, our work is one of the first studies detecting FoG from clinical videos.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Digital Image Computing: Techniques and Applications (DICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DICTA.2018.8615791","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 19
Abstract
Freezing of gait (FoG) is defined as a brief, episodic absence or marked reduction of forward progression of the feet despite the intention to walk. It is a typical symptom of Parkinson's disease (PD) and has a significant impact on the quality of life of PD patients. Typically, trained experts must review a patient's gait for clinical diagnosis, which is time consuming and subjective. Automatic FoG identification from videos offers a promising way to address these issues by formulating FoG identification as a human action recognition task. However, most existing human action recognition algorithms perform poorly on this task, as FoG is very subtle and easily overshadowed by irrelevant motion. In this paper, we propose a novel action recognition algorithm, the convolutional 3D attention network (C3DAN), which addresses this issue by learning an informative region for more effective recognition. The network consists of two main parts: a Spatial Attention Network (SAN) and a 3-dimensional convolutional network (C3D). The SAN generates an attention region in a coarse-to-fine manner, while the C3D extracts discriminative features from it. Our approach localizes the attention region without manual annotation and extracts discriminative features in an end-to-end manner. We evaluate the proposed C3DAN method on a video dataset collected from 45 PD patients in a clinical setting for the quantification of FoG in PD. We obtained a sensitivity of 68.2%, a specificity of 80.8%, and an accuracy of 79.3%, outperforming several state-of-the-art human action recognition methods. To the best of our knowledge, our work is one of the first studies to detect FoG from clinical videos.
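
To make the two-stage design concrete, below is a minimal PyTorch sketch of how a SAN-style localizer could feed a 3D convolutional classifier. Everything here is an illustrative assumption rather than the authors' implementation: the backbone sizes, the (cx, cy, scale) parameterization of the attention region, and the spatial-transformer-style differentiable crop are stand-ins, and the paper's coarse-to-fine refinement is reduced to a single crop step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionNetwork(nn.Module):
    """Stand-in for the paper's SAN: predicts a square attention region
    (center x, center y, scale) from a video clip. The localization head
    and backbone here are assumptions, not the published architecture."""
    def __init__(self):
        super().__init__()
        # Lightweight 3D conv backbone that summarizes the whole clip.
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # Regress (cx, cy, scale), each squashed into [0, 1].
        self.loc = nn.Linear(16, 3)

    def forward(self, clip):                       # clip: (B, 3, T, H, W)
        return torch.sigmoid(self.loc(self.features(clip).flatten(1)))

def crop_region(clip, theta, out_size=112):
    """Differentiably crops every frame around the predicted region using
    an affine grid, so gradients flow back into the SAN (a common
    spatial-transformer-style choice; assumed here, not stated in the abstract)."""
    B, C, T, H, W = clip.shape
    cx, cy, s = theta[:, 0], theta[:, 1], theta[:, 2].clamp(min=0.1)
    # Affine matrix mapping the output grid to the attended sub-window
    # in normalized [-1, 1] coordinates.
    mat = torch.zeros(B, 2, 3, device=clip.device)
    mat[:, 0, 0] = s
    mat[:, 1, 1] = s
    mat[:, 0, 2] = cx * 2 - 1
    mat[:, 1, 2] = cy * 2 - 1
    frames = clip.permute(0, 2, 1, 3, 4).reshape(B * T, C, H, W)
    grid = F.affine_grid(mat.repeat_interleave(T, dim=0),
                         (B * T, C, out_size, out_size), align_corners=False)
    out = F.grid_sample(frames, grid, align_corners=False)
    return out.reshape(B, T, C, out_size, out_size).permute(0, 2, 1, 3, 4)

class C3DAN(nn.Module):
    """SAN proposes a region; a small 3D CNN (stand-in for C3D)
    classifies the attended clip as FoG vs. non-FoG."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.san = SpatialAttentionNetwork()
        self.c3d = nn.Sequential(
            nn.Conv3d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(128, num_classes),
        )

    def forward(self, clip):
        theta = self.san(clip)           # predict attention region
        attended = crop_region(clip, theta)
        return self.c3d(attended)        # logits over FoG / non-FoG
```

The differentiable crop is the key design point in this sketch: because `grid_sample` is differentiable with respect to the predicted region parameters, the classification loss alone can train the localizer, with no manual region annotations, which is consistent with the end-to-end property the abstract claims.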