Spatiotemporal Residual Network for Dynamic Scene Recognition based on ResNeXt

Xianqiang Xiong, Yu Sun
{"title":"基于ResNeXt的动态场景识别时空残差网络","authors":"Xianqiang Xiong, Yu Sun","doi":"10.1109/ECIE52353.2021.00071","DOIUrl":null,"url":null,"abstract":"Dynamic scene recognition is a fundamental task in computer vision, focusing on the method of classifying videos by analyzing dynamic changes of different scenes. Modeling the spatiotemporal information of dynamic scenes in videos is the main challenge of the task. To solve this problem, this paper proposed a spatiotemporal residual network model based on ResNeXt. Our model uses the 2D deep convolutional network for spatial information extraction, and the residual units of ResNeXt are transformed to spacetime to increase the network’s temporal receptive field making it possible to extract temporal features in videos. In addition, to improve the generalization ability and prevent overfitting, we extend the global pooling strategy from spatial to temporal. On both of the static and moving subsets of the YUP++ dataset, the classification accuracies are improved compared with state-of-the-art methods, which indicates that our optimized method could make better use of spatiotemporal information for dynamic scene recognition.","PeriodicalId":219763,"journal":{"name":"2021 International Conference on Electronics, Circuits and Information Engineering (ECIE)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Spatiotemporal Residual Network for Dynamic Scene Recognition based on ResNeXt\",\"authors\":\"Xianqiang Xiong, Yu Sun\",\"doi\":\"10.1109/ECIE52353.2021.00071\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dynamic scene recognition is a fundamental task in computer vision, focusing on the method of classifying videos by analyzing dynamic changes of different scenes. Modeling the spatiotemporal information of dynamic scenes in videos is the main challenge of the task. To solve this problem, this paper proposed a spatiotemporal residual network model based on ResNeXt. Our model uses the 2D deep convolutional network for spatial information extraction, and the residual units of ResNeXt are transformed to spacetime to increase the network’s temporal receptive field making it possible to extract temporal features in videos. In addition, to improve the generalization ability and prevent overfitting, we extend the global pooling strategy from spatial to temporal. 
On both of the static and moving subsets of the YUP++ dataset, the classification accuracies are improved compared with state-of-the-art methods, which indicates that our optimized method could make better use of spatiotemporal information for dynamic scene recognition.\",\"PeriodicalId\":219763,\"journal\":{\"name\":\"2021 International Conference on Electronics, Circuits and Information Engineering (ECIE)\",\"volume\":\"135 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Electronics, Circuits and Information Engineering (ECIE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ECIE52353.2021.00071\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Electronics, Circuits and Information Engineering (ECIE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECIE52353.2021.00071","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Dynamic scene recognition is a fundamental task in computer vision, focused on classifying videos by analyzing the dynamic changes of different scenes. Modeling the spatiotemporal information of dynamic scenes in videos is the main challenge of the task. To solve this problem, this paper proposes a spatiotemporal residual network model based on ResNeXt. Our model uses a 2D deep convolutional network for spatial information extraction, and the residual units of ResNeXt are extended to space-time to enlarge the network's temporal receptive field, making it possible to extract temporal features from videos. In addition, to improve generalization and prevent overfitting, we extend the global pooling strategy from the spatial domain to the temporal domain. On both the static and moving subsets of the YUP++ dataset, classification accuracy is improved over state-of-the-art methods, indicating that our optimized method makes better use of spatiotemporal information for dynamic scene recognition.
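The abstract describes two architectural ideas: extending ResNeXt's grouped-convolution residual units from space to space-time, and pooling globally over time as well as space. The sketch below is not the authors' code; it is a minimal PyTorch illustration of those two ideas, where the 3x3x3 kernel, cardinality of 32, bottleneck width, and clip shape are all assumptions, since the abstract does not specify them.

```python
# Minimal sketch (assumptions labeled) of the abstract's two ideas:
# (1) a ResNeXt bottleneck whose grouped 3x3 spatial convolution is
#     extended to a 3x3x3 space-time convolution, enlarging the
#     temporal receptive field;
# (2) global average pooling taken over time (T) as well as space (H, W).
import torch
import torch.nn as nn


class SpatiotemporalResNeXtBlock(nn.Module):
    """ResNeXt bottleneck with a grouped space-time convolution."""

    def __init__(self, in_channels, out_channels, cardinality=32, stride=1):
        super().__init__()
        mid = out_channels // 2  # bottleneck width: an assumption
        self.body = nn.Sequential(
            nn.Conv3d(in_channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm3d(mid),
            nn.ReLU(inplace=True),
            # grouped conv realizes ResNeXt's cardinality; the 3x3x3
            # kernel adds a temporal extent to the original 3x3 kernel
            nn.Conv3d(mid, mid, kernel_size=3, stride=stride, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm3d(mid),
            nn.ReLU(inplace=True),
            nn.Conv3d(mid, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm3d(out_channels),
        )
        # projection shortcut when shape changes, identity otherwise
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv3d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm3d(out_channels),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (N, C, T, H, W)
        return self.relu(self.body(x) + self.shortcut(x))


if __name__ == "__main__":
    block = SpatiotemporalResNeXtBlock(64, 256)
    clip = torch.randn(2, 64, 8, 56, 56)  # 8-frame feature clips (assumed)
    feat = block(clip)
    # global pooling extended from spatial (H, W) to spatiotemporal (T, H, W)
    pooled = nn.AdaptiveAvgPool3d(1)(feat).flatten(1)  # -> (N, 256)
    print(feat.shape, pooled.shape)
```

Pooling over the temporal axis in addition to the spatial axes, as in the last step above, reduces the number of parameters feeding the classifier regardless of clip length, which is consistent with the abstract's stated goal of improving generalization and preventing overfitting.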