Spatiotemporal Residual Network for Dynamic Scene Recognition Based on ResNeXt

Xianqiang Xiong, Yu Sun
Published in: 2021 International Conference on Electronics, Circuits and Information Engineering (ECIE), 2021
DOI: https://doi.org/10.1109/ECIE52353.2021.00071
Citations: 0

Abstract

Dynamic scene recognition is a fundamental task in computer vision that classifies videos by analyzing the dynamic changes of different scenes. The main challenge of the task is modeling the spatiotemporal information of dynamic scenes in videos. To address this problem, this paper proposes a spatiotemporal residual network model based on ResNeXt. Our model uses a 2D deep convolutional network to extract spatial information, and the residual units of ResNeXt are extended into spacetime to enlarge the network's temporal receptive field, which makes it possible to extract temporal features from videos. In addition, to improve generalization and prevent overfitting, we extend the global pooling strategy from the spatial to the temporal dimension. On both the static and moving subsets of the YUP++ dataset, classification accuracy improves over state-of-the-art methods, which indicates that the optimized method makes better use of spatiotemporal information for dynamic scene recognition.
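
The two mechanisms named in the abstract, a larger temporal receptive field and global pooling extended from space to time, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract gives no layer configuration, so the kernel sizes, strides, function names, and the toy feature map below are all hypothetical and chosen only to show the arithmetic.

```python
from typing import List, Sequence

# One feature channel of a clip, indexed as [T][H][W] (assumed layout).
Channel = List[List[List[float]]]


def temporal_receptive_field(kernel_sizes: Sequence[int],
                             strides: Sequence[int]) -> int:
    """Number of input frames seen by one output unit after a stack of
    temporal convolutions (standard receptive-field recurrence)."""
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the window by (k - 1) steps
        jump *= s             # striding spreads later steps further apart
    return rf


def global_spatial_pool(channel: Channel) -> List[float]:
    """Average over H and W only: one value per frame."""
    return [
        sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))
        for frame in channel
    ]


def global_spatiotemporal_pool(channel: Channel) -> float:
    """Average over T, H and W: one value for the whole clip."""
    per_frame = global_spatial_pool(channel)
    return sum(per_frame) / len(per_frame)


if __name__ == "__main__":
    # Hypothetical stacks: three purely 2D units (temporal kernel 1)
    # versus three units with a 3-frame temporal kernel.
    print(temporal_receptive_field([1, 1, 1], [1, 1, 1]))
    print(temporal_receptive_field([3, 3, 3], [1, 1, 1]))

    # Toy 2-frame, 2x2 feature channel.
    toy = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
    print(global_spatial_pool(toy))
    print(global_spatiotemporal_pool(toy))
```

Under these assumed settings, a stack of purely 2D units sees a single frame per output, while three stacked 3-frame temporal kernels see seven frames. Pooling over time as well as space collapses each channel to a single clip-level value instead of one value per frame.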