Yasir Khan Jadoon, Muhammad Attique Khan, Yasir Noman Khalid, Jamel Baili, Nebojsa Bacanin, MinKyung Hong, Yunyoung Nam
PLoS ONE 20(5): e0322555. Published 2025-05-27. DOI: 10.1371/journal.pone.0322555 (https://doi.org/10.1371/journal.pone.0322555)
InBRwSANet: Self-attention based parallel inverted residual bottleneck architecture for human action recognition in smart cities.
Human Action Recognition (HAR) has grown significantly because of its many applications, including real-time surveillance and human-computer interaction. The wide variation in routine human actions makes recognition more difficult. In this paper, we propose a novel deep learning architecture named Inverted Bottleneck Residual with Self-Attention (InBRwSA). The architecture consists of two modules. The first module comprises six parallel inverted bottleneck residual blocks, each connected with a skip connection. These blocks aim to learn complex human actions across many convolutional layers. The second module is based on the self-attention mechanism: the learned weights of the first module are passed to the self-attention layer, which extracts the most essential features so that complex human actions can be discriminated more easily. The architecture is trained on the selected datasets, with hyperparameters chosen by the particle swarm optimization (PSO) algorithm. In the testing phase, the trained model is used to extract features from the self-attention layer, and these features are passed to a shallow wide neural network classifier for the final classification. HMDB51 and UCF101 are frequently used as standard action recognition datasets and were chosen to allow meaningful comparison with earlier research. UCF101 covers a wide range of activity classes, and HMDB51 contains varied real-world behaviors; these properties test the generalizability and flexibility of the presented model. Moreover, these datasets define the evaluation scope within a particular domain and ensure relevance to real-world circumstances. The proposed technique was tested on both datasets, achieving accuracies of 78.80% and 91.80% on HMDB51 and UCF101, respectively. The ablation study reported 70.1338 ± 3.053 (±4.35%) for HMDB51 and 82.7813 ± 2.852 (±3.45%) for UCF101, where the margin of error is 1.960σx̄ at the 95% confidence level. The training times for the highest accuracy on HMDB51 and UCF101 were 134.09 and 252.10 seconds, respectively. The proposed architecture was compared with several pre-trained deep models and existing state-of-the-art (SOTA) techniques. Based on the results, the proposed architecture outperformed existing techniques.
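The margin-of-error figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes each value is reported as a mean plus or minus a margin equal to 1.960 times the standard error of the mean, which is how the abstract's "1.960σx̄" reads. The function name `describe_interval` and the dictionary layout are hypothetical and do not come from the paper.

```python
# Illustrative check of the 95% confidence figures quoted in the abstract.
# The means and margins are copied from the abstract; every other value is derived.
# Assumption: margin = Z_95 * (standard error of the mean).

Z_95 = 1.960  # two-sided 95% critical value, as stated in the abstract


def describe_interval(mean, margin, z=Z_95):
    """Derive interval bounds and related quantities from `mean ± margin`."""
    return {
        "standard_error": margin / z,               # invert margin = z * SE
        "relative_margin_pct": 100.0 * margin / mean,
        "lower": mean - margin,
        "upper": mean + margin,
    }


# (mean, margin of error) pairs as reported for each dataset
reported = {
    "HMDB51": (70.1338, 3.053),
    "UCF101": (82.7813, 2.852),
}

for name, (mean, margin) in reported.items():
    d = describe_interval(mean, margin)
    print(
        f"{name}: SE={d['standard_error']:.4f}, "
        f"relative margin=±{d['relative_margin_pct']:.2f}%, "
        f"interval=[{d['lower']:.4f}, {d['upper']:.4f}]"
    )
```

Under that assumption, the relative margins come out to 4.35% and 3.45%, which agrees with the percentages given in parentheses in the abstract.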
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage