Video motions classification based on CNN
Yue Luo, Boyuan Yang
2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE), published 2021-08-20
DOI: 10.1109/CSAIEE54046.2021.9543398 (https://doi.org/10.1109/CSAIEE54046.2021.9543398)

More and more videos have appeared on the internet in recent years, and new ways are needed to recognize and manage them. Since a video is composed of images, this work builds a convolutional neural network (CNN) for video classification. The model is trained on the UCF101 dataset, which contains 101 different categories. First, a simple five-layer CNN is built with PyTorch and trained on UCF101 on a GPU. The results show that this model underfits and that changing its parameters does little to improve its accuracy. However, adding more layers, including dropout and batch normalization layers, greatly improves its accuracy. A C3D method is then applied to improve the accuracy further. The highest accuracy reached is 69 percent. The work thus develops a simple and effective way to recognize actions in short videos, to help people supervise and manage online video resources.
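
The abstract contrasts a frame-based CNN with a C3D method but gives no layer settings, so the following is only an illustrative sketch of how the two differ in the shapes they process. It traces the output size of convolution and pooling layers using the standard formula floor((n + 2p - k) / s) + 1 per axis. The 112x112 frame size, the 16-frame clip length, the 3x3(x3) kernels, and the pooling layout are assumptions borrowed from the commonly cited C3D configuration, not values reported by the authors; the helper names `conv_out` and `trace` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(shape, layers):
    """Push a shape through a list of layers.

    Each layer is a (kernel, stride, padding) triple of per-axis tuples,
    so the same function handles 2D (H, W) and 3D (T, H, W) inputs.
    """
    for kernel, stride, padding in layers:
        shape = tuple(
            conv_out(n, k, s, p)
            for n, k, s, p in zip(shape, kernel, stride, padding)
        )
    return shape


# Assumed 2D case: one 112x112 frame, 3x3 conv (padding 1) then 2x2 max-pool.
layers_2d = [
    ((3, 3), (1, 1), (1, 1)),  # conv keeps the spatial size
    ((2, 2), (2, 2), (0, 0)),  # pool halves height and width
]

# Assumed 3D case: a 16-frame clip of 112x112 frames, C3D-style layout.
layers_3d = [
    ((3, 3, 3), (1, 1, 1), (1, 1, 1)),  # conv over time and space
    ((1, 2, 2), (1, 2, 2), (0, 0, 0)),  # first pool keeps all 16 frames
    ((3, 3, 3), (1, 1, 1), (1, 1, 1)),  # second conv
    ((2, 2, 2), (2, 2, 2), (0, 0, 0)),  # later pools also halve time
]

frame_shape = trace((112, 112), layers_2d)
clip_shape = trace((16, 112, 112), layers_3d)
print("2D frame feature map (H, W):", frame_shape)
print("3D clip feature map (T, H, W):", clip_shape)
```

Under these assumptions the 2D path ends with a 56x56 map for a single frame, while the 3D path ends with an 8x28x28 volume that still has a time axis, which is what lets a 3D network model motion across frames rather than appearance in one image.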