Spatiotemporal Residual Network for Dynamic Scene Recognition based on ResNeXt
Xianqiang Xiong, Yu Sun
2021 International Conference on Electronics, Circuits and Information Engineering (ECIE), 2021
DOI: 10.1109/ECIE52353.2021.00071
Dynamic scene recognition is a fundamental task in computer vision that classifies videos by analyzing the dynamic changes of different scenes. Modeling the spatiotemporal information of dynamic scenes in videos is the main challenge of this task. To address it, this paper proposes a spatiotemporal residual network model based on ResNeXt. The model uses a 2D deep convolutional network for spatial feature extraction, and the residual units of ResNeXt are extended to the spacetime domain to enlarge the network's temporal receptive field, making it possible to extract temporal features from videos. In addition, to improve generalization and prevent overfitting, we extend the global pooling strategy from the spatial to the temporal dimension. On both the static and moving subsets of the YUP++ dataset, classification accuracy improves over state-of-the-art methods, indicating that the optimized method makes better use of spatiotemporal information for dynamic scene recognition.
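The pooling extension mentioned in the abstract can be illustrated with a minimal sketch: instead of averaging a video feature map only over its spatial dimensions (H, W), the average is taken over the temporal axis T as well, collapsing a (C, T, H, W) feature map to a single C-dimensional descriptor. The function names and tensor layout below are illustrative assumptions, not the authors' implementation; a real model would use a framework primitive such as a 3D adaptive average pool.

```python
# Hypothetical illustration of extending global average pooling from
# spatial (H, W) to spatiotemporal (T, H, W). Features are plain nested
# lists with shape (C, T, H, W); names and layout are assumptions.

def global_spatial_pool(feat):
    """Spatial-only pooling: (C, T, H, W) -> (C, T), averaging each frame."""
    return [
        [sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))
         for frame in channel]
        for channel in feat
    ]

def global_spatiotemporal_pool(feat):
    """Spatiotemporal pooling: (C, T, H, W) -> length-C vector,
    averaging over the temporal axis as well as both spatial axes."""
    pooled = []
    for channel in feat:
        t, h, w = len(channel), len(channel[0]), len(channel[0][0])
        total = sum(v for frame in channel for row in frame for v in row)
        pooled.append(total / (t * h * w))
    return pooled

# Example: 2 channels, 2 frames, 2x2 spatial grid.
feat = [
    [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]],  # channel 0
    [[[0.0, 0.0], [0.0, 0.0]], [[4.0, 4.0], [4.0, 4.0]]],  # channel 1
]
print(global_spatiotemporal_pool(feat))  # one scalar per channel: [4.5, 2.0]
```

Because the temporal axis is averaged away, the resulting descriptor has no parameters tied to clip length, which is one way such a strategy can curb overfitting on short video datasets.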