一种用于人体动作识别的异构双流网络

IF 1.4 4区计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

AI Communications Pub Date : 2023-08-16 DOI:10.3233/aic-220188

Shengbin Liao, Xiaofeng Wang, Zongkai Yang

{"title":"一种用于人体动作识别的异构双流网络","authors":"Shengbin Liao, Xiaofeng Wang, Zongkai Yang","doi":"10.3233/aic-220188","DOIUrl":null,"url":null,"abstract":"The most widely used two-stream architectures and building blocks for human action recognition in videos generally consist of 2D or 3D convolution neural networks. 3D convolution can abstract motion messages between video frames, which is essential for video classification. 3D convolution neural networks usually obtain good performance compared with 2D cases, however it also increases computational cost. In this paper, we propose a heterogeneous two-stream architecture which incorporates two convolutional networks. One uses a mixed convolution network (MCN), which combines some 3D convolutions in the middle of 2D convolutions to train RGB frames, another one adopts BN-Inception network to train Optical Flow frames. Considering the redundancy of neighborhood video frames, we adopt a sparse sampling strategy to decrease the computational cost. Our architecture is trained and evaluated on the standard video actions benchmarks of HMDB51 and UCF101. Experimental results show our approach obtains the state-of-the-art performance on the datasets of HMDB51 (73.04%) and UCF101 (95.27%).","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"1 1","pages":"219-233"},"PeriodicalIF":1.4000,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A heterogeneous two-stream network for human action recognition\",\"authors\":\"Shengbin Liao, Xiaofeng Wang, Zongkai Yang\",\"doi\":\"10.3233/aic-220188\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The most widely used two-stream architectures and building blocks for human action recognition in videos generally consist of 2D or 3D convolution neural networks. 3D convolution can abstract motion messages between video frames, which is essential for video classification. 3D convolution neural networks usually obtain good performance compared with 2D cases, however it also increases computational cost. In this paper, we propose a heterogeneous two-stream architecture which incorporates two convolutional networks. One uses a mixed convolution network (MCN), which combines some 3D convolutions in the middle of 2D convolutions to train RGB frames, another one adopts BN-Inception network to train Optical Flow frames. Considering the redundancy of neighborhood video frames, we adopt a sparse sampling strategy to decrease the computational cost. Our architecture is trained and evaluated on the standard video actions benchmarks of HMDB51 and UCF101. Experimental results show our approach obtains the state-of-the-art performance on the datasets of HMDB51 (73.04%) and UCF101 (95.27%).\",\"PeriodicalId\":50835,\"journal\":{\"name\":\"AI Communications\",\"volume\":\"1 1\",\"pages\":\"219-233\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2023-08-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AI Communications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.3233/aic-220188\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI Communications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/aic-220188","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

视频中人类动作识别最广泛使用的两流架构和构建块通常由2D或3D卷积神经网络组成。三维卷积可以提取视频帧之间的运动信息，这是视频分类的关键。与二维情况相比，三维卷积神经网络通常可以获得较好的性能，但同时也增加了计算成本。在本文中，我们提出了一个包含两个卷积网络的异构双流架构。一种是使用混合卷积网络(MCN)，在二维卷积的中间组合一些三维卷积来训练RGB帧，另一种是使用BN-Inception网络来训练光流帧。考虑到邻域视频帧的冗余性，我们采用稀疏采样策略来降低计算成本。我们的架构在HMDB51和UCF101的标准视频动作基准上进行了培训和评估。实验结果表明，我们的方法在HMDB51(73.04%)和UCF101(95.27%)数据集上获得了最先进的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A heterogeneous two-stream network for human action recognition

The most widely used two-stream architectures and building blocks for human action recognition in videos generally consist of 2D or 3D convolution neural networks. 3D convolution can abstract motion messages between video frames, which is essential for video classification. 3D convolution neural networks usually obtain good performance compared with 2D cases, however it also increases computational cost. In this paper, we propose a heterogeneous two-stream architecture which incorporates two convolutional networks. One uses a mixed convolution network (MCN), which combines some 3D convolutions in the middle of 2D convolutions to train RGB frames, another one adopts BN-Inception network to train Optical Flow frames. Considering the redundancy of neighborhood video frames, we adopt a sparse sampling strategy to decrease the computational cost. Our architecture is trained and evaluated on the standard video actions benchmarks of HMDB51 and UCF101. Experimental results show our approach obtains the state-of-the-art performance on the datasets of HMDB51 (73.04%) and UCF101 (95.27%).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

AI Communications 工程技术-计算机：人工智能

CiteScore

2.30

自引率

12.50%

发文量

审稿时长

4.5 months

期刊介绍： AI Communications is a journal on artificial intelligence (AI) which has a close relationship to EurAI (European Association for Artificial Intelligence, formerly ECCAI). It covers the whole AI community: Scientific institutions as well as commercial and industrial companies. AI Communications aims to enhance contacts and information exchange between AI researchers and developers, and to provide supranational information to those concerned with AI and advanced information processing. AI Communications publishes refereed articles concerning scientific and technical AI procedures, provided they are of sufficient interest to a large readership of both scientific and practical background. In addition it contains high-level background material, both at the technical level as well as the level of opinions, policies and news.