Advancements in Real-Time Human Activity Recognition via Innovative Fusion of 3DCNN and ConvLSTM Models

Journal of Machine and Computing. Pub Date: 2024-07-05. DOI: 10.53759/7669/jmc202404071

Roopa R, Humera Khanam M


Abstract

Object detection (OD) is a computer vision technique for locating objects in digital images. Our study examines the need for robust OD algorithms in human activity recognition, a domain spanning human-computer interaction, sports analysis, and surveillance. Three-dimensional convolutional neural networks (3DCNNs) are now a standard method for recognizing human activity. Building on recent advances in deep learning (DL), we present a novel framework: a fusion model that enhances conventional methods by integrating 3DCNNs with Convolutional Long Short-Term Memory (ConvLSTM) layers. The proposed model exploits the spatiotemporal features inherent in video streams, an important aspect often missed by existing OD methods. We assess the proposed architecture on the UCF-50 dataset, which is well known for its diverse range of human activities. In addition to designing a novel deep-learning architecture, we used data augmentation techniques to expand the dataset, improve model robustness, reduce overfitting, and enhance performance on imbalanced data. In comprehensive experiments, the proposed model achieved an accuracy of 98.11% in classifying human activity. Furthermore, when benchmarked against state-of-the-art methods, our system provides adequate overall accuracy and class-average accuracy across the 50 activity categories.
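The abstract reports both an overall accuracy and a "class average" without defining them. A minimal sketch of how the two numbers are commonly computed follows, assuming that "class average" means the unweighted mean of per-class accuracies; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all clips whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_average_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (assumed meaning of 'class average')."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    totals = defaultdict(int)  # number of clips per true class
    hits = defaultdict(int)    # correctly classified clips per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    # Each class contributes equally, regardless of how many clips it has.
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, imbalanced toy labels (illustration only, not results from the paper).
y_true = ["Biking"] * 8 + ["Diving"] * 2
y_pred = ["Biking"] * 8 + ["Biking", "Diving"]

print(overall_accuracy(y_true, y_pred))        # 9 of 10 clips correct
print(class_average_accuracy(y_true, y_pred))  # mean of 8/8 and 1/2
```

On imbalanced data the two metrics can diverge, as in the toy labels above, which is one reason to report both when the per-class clip counts differ.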

Source journal

Journal of Machine and Computing

CiteScore

1.80

Self-citation rate

0.00%

Number of articles published