{"title":"MV-guided deformable convolution network for compressed video action recognition with P-frames","authors":"Yuting Mou, Ke Xu, Xinghao Jiang, Tanfeng Sun","doi":"10.1016/j.neucom.2025.130770","DOIUrl":null,"url":null,"abstract":"<div><div>Large-scale deep models have driven substantial progress in action recognition, but their heavy computation and the use of full-resolution RGB frames raise latency and privacy concerns. Compressed-domain methods reduce overhead by operating on codec outputs (I-frames, P-frames) but still rely on privacy-sensitive I-frames and incur nontrivial decoding costs. To overcome these limitations, we propose a novel P-frame only framework that (1) employs deformable convolutions to exploit the spatial sparsity of residual maps in P-frames and (2) introduces a Motion Vector–guided Deformable Convolution Network (MV-DCN) that uses motion vectors to predict adaptive sampling offsets. To transfer semantic knowledge from RGB features without decoding I-frames, we further design a Motion-Appearance Mutual Learning (MA-ML) scheme for cross domain distillation. Extensive experiments demonstrate that our model achieves competitive accuracy and speed compared to raw domain and traditional compressed domain approaches, while effectively preserving privacy by utilizing only P-frames.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"649 ","pages":"Article 130770"},"PeriodicalIF":6.5000,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225014420","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Large-scale deep models have driven substantial progress in action recognition, but their heavy computation and the use of full-resolution RGB frames raise latency and privacy concerns. Compressed-domain methods reduce overhead by operating on codec outputs (I-frames, P-frames) but still rely on privacy-sensitive I-frames and incur nontrivial decoding costs. To overcome these limitations, we propose a novel P-frame-only framework that (1) employs deformable convolutions to exploit the spatial sparsity of residual maps in P-frames and (2) introduces a Motion Vector–guided Deformable Convolution Network (MV-DCN) that uses motion vectors to predict adaptive sampling offsets. To transfer semantic knowledge from RGB features without decoding I-frames, we further design a Motion-Appearance Mutual Learning (MA-ML) scheme for cross-domain distillation. Extensive experiments demonstrate that our model achieves competitive accuracy and speed compared to raw-domain and traditional compressed-domain approaches, while effectively preserving privacy by utilizing only P-frames.
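The abstract describes the MV-guided deformable convolution only at a high level and gives no implementation details. As a rough illustration of the core idea, the minimal sketch below assumes a PyTorch/torchvision setup in which deformable-convolution sampling offsets are predicted from a two-channel motion-vector field over the P-frame residual map; the module name MVGuidedDeformBlock, the offset_head layer, and all tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code): predict deformable-conv
# sampling offsets from motion vectors and apply them to the residual map.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class MVGuidedDeformBlock(nn.Module):
    """Hypothetical block: motion vectors -> sampling offsets -> deformable conv."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # DeformConv2d expects 2 offsets (dx, dy) per kernel sampling location.
        offset_channels = 2 * kernel_size * kernel_size
        # Small conv head mapping the 2-channel MV field to per-location offsets
        # (our assumption of what "MV-guided" offset prediction could look like).
        self.offset_head = nn.Conv2d(2, offset_channels, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(
            in_channels, out_channels, kernel_size, padding=kernel_size // 2
        )

    def forward(self, residual: torch.Tensor, mv: torch.Tensor) -> torch.Tensor:
        # residual: (N, C, H, W) P-frame residual map
        # mv:       (N, 2, H, W) motion-vector field (dx, dy per position)
        offsets = self.offset_head(mv)
        return self.deform_conv(residual, offsets)


if __name__ == "__main__":
    block = MVGuidedDeformBlock(in_channels=3, out_channels=64)
    residual = torch.randn(1, 3, 56, 56)
    mv = torch.randn(1, 2, 56, 56)
    print(block(residual, mv).shape)  # torch.Size([1, 64, 56, 56])
```

In this reading, the offsets let the kernel sample around motion-displaced positions rather than a fixed grid, which is one plausible way to exploit the spatial sparsity of P-frame residuals described in the abstract.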
About the journal
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.