Title: On Mobile Pose Estimation and Action Recognition Design and Implementation
Authors: M. Aslanyan
Journal: Pattern Recognition and Image Analysis
Publication date: 2024-04-10
Publication type: Journal Article
DOI: https://doi.org/10.1134/s1054661824010036
Impact factor: 0.7 (JCR Q4, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
Human pose estimation (PE, tracking body pose on-the-go) is a computer vision-based technology that identifies and tracks specific points on the human body. These points correspond to joints and other landmarks on the body, and they determine sizes, distances, flexion angles, and the type of motion. Knowing these quantities for a specific exercise is the basis for applications in rehabilitation and physiotherapy, fitness and self-coaching, augmented reality, animation and gaming, robot control, surveillance, and human activity analysis. Such systems may rely on special suits or sensor arrays to achieve the best results, but mass adoption of PE depends on devices that many users already own, namely smartphones, smartwatches, and earbuds. A body pose estimation system starts by capturing the initial data. For motion detection, it is necessary to analyze a sequence of images rather than a single still photo. Different software modules are responsible for tracking 2D key points, creating a body representation, and converting it into 3D space. Action recognition, on the other hand, analyzes the sequence of estimated pose data in order to assign the sequence to one of a set of classes. It is widely used in various fields. One widely known use case is the analysis and detection of potential attacks or illegal actions in video from surveillance cameras. Another use case involves analyzing a sequence of poses in order to create a virtual coaching environment. Specifically, our research will target this challenging problem and aim to create such an environment for mobile devices. We will describe some solutions that are suitable for effective pose estimation and action recognition on mobile devices. We will show how lightweight models based on convolutional neural networks can be used to solve the pose estimation problem efficiently, and how the action recognition problem can be addressed with the dynamic time warping algorithm.
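To illustrate how dynamic time warping can compare pose sequences of different lengths, the following is a minimal sketch and not the author's implementation. It assumes that each frame has already been reduced to a small feature vector (here a single hypothetical knee angle in degrees) and that one reference sequence per action is available; the names `dtw_distance`, `classify`, and the sample data are illustrative assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two sequences
    of equal-dimension feature vectors (O(len(a) * len(b)) time)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_b frame j is matched again
                cost[i][j - 1],      # seq_a frame i is matched again
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the reference sequence closest to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical per-frame knee angles (degrees) for two reference actions.
templates = {
    "squat": [(170,), (120,), (90,), (120,), (170,)],
    "stand": [(170,), (170,), (170,)],
}
# A slower squat with more frames than the reference.
query = [(170,), (150,), (120,), (95,), (90,), (118,), (168,)]
label = classify(query, templates)
```

Because the warping path may match one frame against several, a slower or faster repetition of the same movement still aligns closely with its reference, which is the property that makes the algorithm attractive for a coaching scenario.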
About the journal:
The purpose of the journal is to publish high-quality peer-reviewed scientific and technical materials that present the results of fundamental and applied scientific research in the field of image processing, recognition, analysis and understanding, pattern recognition, artificial intelligence, and related fields of theoretical and applied computer science and applied mathematics. The policy of the journal provides for the rapid publication of original scientific articles, analytical reviews, articles of the world's leading scientists and specialists on the subject of the journal solicited by the editorial board, special thematic issues, proceedings of the world's leading scientific conferences and seminars, as well as short reports containing new results of fundamental and applied research in the field of mathematical theory and methodology of image analysis, mathematical theory and methodology of image recognition, and mathematical foundations and methodology of artificial intelligence. The journal also publishes articles on the use of the apparatus and methods of the mathematical theory of image analysis and the mathematical theory of image recognition for the development of new information technologies and their supporting software and algorithmic complexes and systems for solving complex and particularly important applied problems. The main scientific areas are the mathematical theory of image analysis and the mathematical theory of pattern recognition.
The journal also embraces the problems of analyzing and evaluating poorly formalized, poorly structured, incomplete, contradictory and noisy information, including artificial intelligence, bioinformatics, medical informatics, data mining, big data analysis, machine vision, data representation and modeling, data and knowledge extraction from images, machine learning, forecasting, machine graphics, databases, knowledge bases, medical and technical diagnostics, neural networks, specialized software, specialized computational architectures for information analysis and evaluation, linguistic, psychological, psychophysical, and physiological aspects of image analysis and pattern recognition, applied problems, and related problems. Articles can be submitted either in English or Russian. The English language is preferable. Pattern Recognition and Image Analysis is a hybrid journal that publishes mostly subscription articles that are free of charge for the authors, but also accepts Open Access articles with article processing charges. The journal is one of the top 10 global periodicals on image analysis and pattern recognition and is the only publication on this topic in the Russian Federation, Central and Eastern Europe.