Title: CNN-LSTM Model for Recognizing Video-Recorded Actions Performed in a Traditional Chinese Exercise
Authors: Jing Chen; Jiping Wang; Qun Yuan; Zhao Yang
Journal: IEEE Journal of Translational Engineering in Health and Medicine (JTEHM), vol. 11, pp. 351-359 (Q2, Engineering, Biomedical)
DOI: 10.1109/JTEHM.2023.3282245
Published: 2023-06-02
URL: https://ieeexplore.ieee.org/document/10143200/
Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10143200
Citations: 1
Abstract
Identifying human actions from video data is an important problem in the field of intelligent rehabilitation assessment. Motion feature extraction and pattern recognition are the two key procedures for achieving this goal. Traditional action recognition models are usually based on geometric features manually extracted from video frames, which adapt poorly to complex scenarios and cannot achieve high-precision, robust recognition. We investigate a motion recognition model and apply it to recognize the sequence of complicated actions of a traditional Chinese exercise (i.e., Baduanjin). We first developed a combined convolutional neural network (CNN) and long short-term memory (LSTM) model for recognizing the sequence of actions captured in video frames, and applied it to recognize the actions of Baduanjin. This method was then compared with a traditional action recognition model based on geometric motion features, in which OpenPose is used to identify the joint positions in the skeletons. Its high recognition accuracy was verified on a testing video dataset containing clips from 18 different practitioners. The CNN-LSTM recognition model achieved 96.43% accuracy on the testing set, whereas the manually extracted features in the traditional action recognition model achieved only 66.07% classification accuracy on the same dataset. The abstract image features extracted by the CNN module are more effective at improving the classification accuracy of the LSTM model. The proposed CNN-LSTM-based method can be a useful tool for recognizing complicated actions.
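The architecture described in the abstract — a per-frame CNN feature extractor feeding an LSTM that classifies the frame sequence — can be sketched as below. This is a minimal illustrative implementation in PyTorch, not the authors' model: the layer sizes, feature dimensions, and the choice of 8 output classes (one per Baduanjin section) are all assumptions for the example.

```python
# Hypothetical CNN-LSTM sketch for video action recognition.
# Layer sizes and num_classes are illustrative, not the paper's values.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=8, feat_dim=128, hidden_dim=64):
        super().__init__()
        # Per-frame CNN: extracts an abstract feature vector from each frame
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # LSTM: models temporal dependencies across the frame features
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):  # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        # Fold time into the batch so the CNN processes every frame at once
        feats = self.cnn(x.view(b * t, c, h, w)).view(b, t, -1)
        out, _ = self.lstm(feats)     # out: (batch, time, hidden_dim)
        return self.fc(out[:, -1])    # classify from the last time step

model = CNNLSTM()
logits = model(torch.randn(2, 16, 3, 64, 64))  # 2 clips, 16 frames each
print(logits.shape)  # torch.Size([2, 8])
```

Folding the time axis into the batch dimension is the standard way to apply a 2-D CNN independently to every frame before the LSTM aggregates the sequence.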
About the journal:
The IEEE Journal of Translational Engineering in Health and Medicine is an open-access publication that bridges the engineering and clinical worlds, focusing on detailed descriptions of advanced technical solutions to a clinical need along with clinical results and healthcare relevance. The journal provides a platform for state-of-the-art technology directions in the interdisciplinary field of biomedical engineering, embracing engineering, life sciences, and medicine. A unique aspect of the journal is its ability to foster collaboration between physicians and engineers in presenting broad and compelling real-world technological and engineering solutions that can be implemented to improve quality of patient care and treatment outcomes, thereby reducing costs and improving efficiency. The journal provides an active forum for clinical research and relevant state-of-the-art technology for members of all IEEE societies with an interest in biomedical engineering, and also reaches out directly to physicians and the medical community through the American Medical Association (AMA) and other clinical societies. The scope of the journal includes, but is not limited to, topics on: Medical devices, healthcare delivery systems, global healthcare initiatives, and ICT-based services; Technological relevance to healthcare cost reduction; Technology affecting healthcare management, decision-making, and policy; Advanced technical work that is applied to solving specific clinical needs.