InBRwSANet: Self-attention based parallel inverted residual bottleneck architecture for human action recognition in smart cities.

IF 2.9, Zone 3 (comprehensive journals), JCR Q1 (MULTIDISCIPLINARY SCIENCES)

PLoS ONE, Pub Date: 2025-05-27, eCollection Date: 2025-01-01, DOI: 10.1371/journal.pone.0322555

Yasir Khan Jadoon, Muhammad Attique Khan, Yasir Noman Khalid, Jamel Baili, Nebojsa Bacanin, MinKyung Hong, Yunyoung Nam


Citations: 0


InBRwSANet: Self-attention based parallel inverted residual bottleneck architecture for human action recognition in smart cities.

Human Action Recognition (HAR) has grown significantly because of its many applications, including real-time surveillance and human-computer interaction. Wide variation in routine human actions makes recognition more difficult. In this paper, we propose a novel deep learning architecture, Inverted Bottleneck Residual with Self-Attention (InBRwSA). The architecture consists of two modules. The first module comprises six parallel inverted bottleneck residual blocks, each connected with a skip connection; these blocks aim to learn complex human actions across many convolutional layers. The second module is based on the self-attention mechanism: the learned weights of the first module are passed to self-attention, which extracts the most essential features so that complex human actions can be discriminated more easily. The architecture is trained on the selected datasets, with hyperparameters chosen using the particle swarm optimization (PSO) algorithm. In the testing phase, the trained model is used to extract features from the self-attention layer, and these features are passed to a shallow wide neural network classifier for the final classification. HMDB51 and UCF101 are frequently used as standard action recognition datasets and were chosen to allow meaningful comparison with earlier research. UCF101 covers a wide range of activity classes, and HMDB51 contains varied real-world behaviors; these properties test the generalizability and flexibility of the presented model. Moreover, these datasets define the evaluation scope within a particular domain and guarantee relevance to real-world circumstances. The proposed technique was tested on both datasets and achieved accuracies of 78.80% on HMDB51 and 91.80% on UCF101. The ablation study gave margins of error of 70.1338 ± 3.053 (±4.35%) for HMDB51 and 82.7813 ± 2.852 (±3.45%) for UCF101 at the 95% confidence level (1.960σx̄). The training time for the highest accuracy was 134.09 seconds for HMDB51 and 252.10 seconds for UCF101. The proposed architecture is compared with several pre-trained deep models and existing state-of-the-art (SOTA) techniques. Based on the results, the proposed architecture outperformed existing techniques.
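The margin-of-error figures in the abstract can be checked with a short calculation. The sketch below assumes that the first number in each pair is a mean accuracy over ablation runs and that the ± value equals 1.960 times the standard error of the mean (σx̄); the abstract does not spell this out, so treat that reading as an assumption. The function name `interval_from_margin` is hypothetical and is not taken from the paper.

```python
# Sketch under stated assumptions: the reported "mean ± margin" pairs are
# read as 95% confidence intervals with margin = 1.960 * standard error.
Z_95 = 1.960  # two-sided 95% critical value of the standard normal


def interval_from_margin(mean, margin, z=Z_95):
    """Return (standard_error, lower, upper, relative_margin_percent)."""
    standard_error = margin / z          # invert margin = z * SE
    lower, upper = mean - margin, mean + margin
    relative = 100.0 * margin / mean     # margin as a percentage of the mean
    return standard_error, lower, upper, relative


# Values copied from the abstract: (mean, margin of error).
reported = {"HMDB51": (70.1338, 3.053), "UCF101": (82.7813, 2.852)}

for name, (mean, margin) in reported.items():
    se, lo, hi, rel = interval_from_margin(mean, margin)
    print(f"{name}: SE={se:.3f}, CI=[{lo:.3f}, {hi:.3f}], relative margin={rel:.2f}%")
```

Under this reading, the relative margins come out at about 4.35% and 3.45%, which agrees with the percentages quoted in parentheses in the abstract.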


Source journal

PLoS ONE (Biology)

CiteScore: 6.20

Self-citation rate: 5.40%

Articles published: 14242

Review time: 3.7 months

About the journal: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage