AI-powered real-time analysis of human activity in videos via smartphone.

Rico Thomanek, Benny Platte, M. Baumgart, Christian Roschke, M. Ritter
DOI: 10.54941/ahfe1003972
Journal: Cognitive Computing and Internet of Things

Abstract

A major focus of computer vision research is the recognition of human activity from the visual information in audiovisual data using artificial intelligence. Researchers are currently exploring image-based approaches using 3D CNNs, RNNs, or hybrid models, with the intent of learning multiple levels of representation and abstraction that enable fully automated feature extraction and activity analysis. Unfortunately, these architectures require powerful hardware to approach real-time processing, making them difficult to deploy on smartphones. At the same time, a growing share of video is recorded on smartphones, so classifying and tagging human activities already during recording would be useful for a variety of use cases. In the mobile environment in particular, a wide range of applications is conceivable, such as detecting correct motion sequences in the sports and health sector, or monitoring and automated alerting in security-relevant settings (e.g., demonstrations, festivals). This, however, requires an efficient system architecture that can perform real-time analysis despite limited hardware power. This paper addresses skeleton-based activity recognition on smartphones, in which the motion vectors of detected skeleton points are analyzed for their spatial and temporal characteristics instead of pixel-based information. The 3D bone points of a recognized person are extracted using the AR framework integrated into the operating system, and their motion data are analyzed in real time by a self-trained RNN. This purely numerical approach enables time-efficient real-time processing and activity classification. The system makes it possible to recognize a person in a live video stream recorded with a smartphone and to classify the activity being performed. Successful deployment of the system in several field tests shows both that the described approach works in principle and that it transfers to a resource-constrained mobile environment.
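The pipeline the abstract describes — per-frame 3D skeleton joints from the OS AR framework, frame-to-frame motion vectors as a purely numerical feature, and a recurrent classifier over the resulting sequence — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the joint count, hidden size, class labels, and the plain Elman RNN are all assumptions standing in for the paper's self-trained RNN and the (unspecified) AR framework output.

```python
import numpy as np

N_JOINTS = 17    # assumed number of tracked skeleton joints
HIDDEN = 32      # assumed RNN hidden size
N_CLASSES = 4    # illustrative activity classes (e.g., walk, squat, wave, stand)

def motion_vectors(joints):
    """joints: (T, N_JOINTS, 3) array of 3D joint positions per frame.
    Returns (T-1, N_JOINTS*3) frame-to-frame displacement features --
    the purely numerical representation analyzed instead of pixels."""
    disp = np.diff(joints, axis=0)            # spatial change between frames
    return disp.reshape(disp.shape[0], -1)    # flatten joints per time step

class TinyRNN:
    """Minimal Elman RNN with a softmax head; a stand-in for the
    self-trained RNN mentioned in the abstract (weights untrained here)."""
    def __init__(self, in_dim, hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.Wx = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.Wh = rng.normal(0.0, 0.1, (hidden, hidden))
        self.Wo = rng.normal(0.0, 0.1, (hidden, n_classes))

    def forward(self, x):                     # x: (T, in_dim)
        h = np.zeros(self.Wh.shape[0])
        for xt in x:                          # unroll over the time dimension
            h = np.tanh(xt @ self.Wx + h @ self.Wh)
        logits = h @ self.Wo
        e = np.exp(logits - logits.max())     # numerically stable softmax
        return e / e.sum()                    # class probabilities

# Usage: classify a short window of tracked skeleton frames (random stand-in
# data here; on-device these would come from the AR framework's body tracking).
frames = np.random.default_rng(1).normal(size=(30, N_JOINTS, 3))
feats = motion_vectors(frames)                # shape (29, 51)
probs = TinyRNN(feats.shape[1], HIDDEN, N_CLASSES).forward(feats)
print(probs.shape, round(float(probs.sum()), 6))
```

Because the classifier consumes only a small vector per frame rather than full images, each step is a handful of small matrix multiplications — which is what makes real-time inference plausible on smartphone hardware.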