Authors: Ping Gong, Xudong Luo
Journal: Knowledge-Based Systems, Volume 320, Article 113594
Publication date: 2025-04-30
DOI: 10.1016/j.knosys.2025.113594
URL: https://www.sciencedirect.com/science/article/pii/S0950705125006409
A Survey of Video Action Recognition Based on Deep Learning
Video Action Recognition (VAR) involves identifying and classifying human actions from video data. Deep Learning (DL) has revolutionised VAR, significantly enhancing its accuracy and efficiency. However, large-scale practical applications of VAR using DL remain limited, underscoring the need for further research and innovation. Thus, this survey provides a comprehensive overview of recent advancements in DL-based VAR. Specifically, we summarise the key DL architectures for VAR, including two-stream networks, 3D-CNNs, RNNs, LSTMs, and Attention Mechanisms, and analyse their strengths, limitations, and benchmark performances. The survey also explores the diverse applications of DL-based VAR, such as surveillance, human–computer interaction, sports analytics, healthcare, and education, while presenting a detailed summary of commonly used datasets and evaluation metrics. Moreover, critical challenges, such as computational demands and the need for robust temporal modelling, are identified, along with potential future directions. This paper is a valuable resource for researchers and practitioners striving to advance VAR using DL techniques by systematically presenting concepts, methodologies, and trends.
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.