Fahad Ul Hassan Asif Mattoo, U. S. Khan, Tahir Nawaz, N. Rashid
Deep Learning-based Feature Fusion for Action Recognition Using Skeleton Information
2023 International Conference on Robotics and Automation in Industry (ICRAI), published 2023-03-03
DOI: 10.1109/ICRAI57502.2023.10089577
Citation count: 0
Abstract
Various action recognition systems have been proposed, but most of them are not feasible for real-time applications. Skeleton-based action recognition has a low computational cost and is unaffected by background changes. As pose estimation models have become faster (almost real-time), a skeleton-based model named DD-net was created with only 1.8M parameters; it uses skeleton information to predict the action. An improved version of this model, TD-net, was recently proposed. TD-net is rich in geometric features but lacks motion features. To address this, we add two motion features to the model: velocity and acceleration. These features are computed using a second-order Taylor approximation within a window around the current frame. Model accuracy is compared with DD-net, TD-net, and state-of-the-art algorithms on three datasets. Compared with TD-net, accuracy increases on all three datasets: by 1.1% on SHREC, 1.7% on FPHAB, and 2% on JHMDB.
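The abstract does not give the exact formulas, window size, or boundary handling for the motion features. As a hedged illustration only: expanding a joint coordinate x(t + dt) and x(t - dt) to second order in a Taylor series and solving the two expansions yields the standard central-difference estimates v_t ≈ (x_{t+1} - x_{t-1}) / (2 dt) and a_t ≈ (x_{t+1} - 2 x_t + x_{t-1}) / dt^2. The sketch below computes both per joint and per coordinate over a three-frame window, assuming a sequence of frames that each hold J joints with D coordinates. The function name `motion_features`, this frame layout, and the edge padding at the sequence boundaries are hypothetical choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# One frame: J joints, each a list of D coordinates (assumed layout).
Frame = List[List[float]]


def motion_features(seq: Sequence[Frame], dt: float = 1.0) -> Tuple[List[Frame], List[Frame]]:
    """Hypothetical sketch: per-frame velocity and acceleration of every joint.

    Uses central differences, which follow from a second-order Taylor
    expansion of x(t +/- dt) around x(t), over a 3-frame window centred on
    the current frame. Boundary frames are handled by repeating the first
    and last frames (edge padding); this is an assumption, not the
    authors' stated choice.
    """
    T = len(seq)
    if T == 0:
        return [], []
    # Edge-pad so every frame has a previous and a next neighbour.
    padded = [seq[0]] + list(seq) + [seq[-1]]
    vel: List[Frame] = []
    acc: List[Frame] = []
    for t in range(1, T + 1):
        prev, cur, nxt = padded[t - 1], padded[t], padded[t + 1]
        v_frame: Frame = []
        a_frame: Frame = []
        for pj, cj, nj in zip(prev, cur, nxt):
            # First derivative: (x_{t+1} - x_{t-1}) / (2 dt)
            v_frame.append([(n - p) / (2 * dt) for p, n in zip(pj, nj)])
            # Second derivative: (x_{t+1} - 2 x_t + x_{t-1}) / dt^2
            a_frame.append([(n - 2 * c + p) / dt ** 2 for p, c, n in zip(pj, cj, nj)])
        vel.append(v_frame)
        acc.append(a_frame)
    return vel, acc


# Usage: a single 1-D joint moving as x = t^2 over 5 frames.
seq = [[[float(t * t)]] for t in range(5)]
vel, acc = motion_features(seq)
print([f[0][0] for f in vel[1:4]])  # interior velocities
print([f[0][0] for f in acc[1:4]])  # interior accelerations
```

For a joint moving as x = t², the interior frames give velocities 2, 4 and 6 and a constant acceleration of 2, matching the analytic derivatives 2t and 2. In a feature-fusion model such as the one described, these per-frame tensors would presumably be concatenated with or processed alongside the geometric features, but the abstract does not specify how.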