{"title":"关注增强的视频分类深度网络","authors":"Junyong You, J. Korhonen","doi":"10.1109/ICIP40778.2020.9190996","DOIUrl":null,"url":null,"abstract":"Video classification can be performed by summarizing image contents of individual frames into one class by deep neural networks, e.g., CNN and LSTM. Human interpretation of video content is influenced by the attention mechanism. In other words, video class can be more attentively decided by certain information than others. In this paper, we propose to integrate the attention mechanism into deep networks for video classification. The proposed framework employs 2D CNN networks with ImageNet pretrained weights to extract features of video frames that are then fed to a bidirectional LSTM network for video classification. An attention block has been developed that can be added after the LSTM network in the proposed framework. Several different 2D CNN architectures have been tested in the experiments. The results with respect to two publicly available datasets have demonstrated that integrating attention can boost the performance of deep networks in video classification compared to not applying the attention block. We also found out that applying attention to the LSTM outputs on the VGG19 architecture provides the highest classification accuracy in the proposed framework.","PeriodicalId":405734,"journal":{"name":"2020 IEEE International Conference on Image Processing (ICIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Attention Boosted Deep Networks For Video Classification\",\"authors\":\"Junyong You, J. Korhonen\",\"doi\":\"10.1109/ICIP40778.2020.9190996\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Video classification can be performed by summarizing image contents of individual frames into one class by deep neural networks, e.g., CNN and LSTM. Human interpretation of video content is influenced by the attention mechanism. In other words, video class can be more attentively decided by certain information than others. In this paper, we propose to integrate the attention mechanism into deep networks for video classification. The proposed framework employs 2D CNN networks with ImageNet pretrained weights to extract features of video frames that are then fed to a bidirectional LSTM network for video classification. An attention block has been developed that can be added after the LSTM network in the proposed framework. Several different 2D CNN architectures have been tested in the experiments. The results with respect to two publicly available datasets have demonstrated that integrating attention can boost the performance of deep networks in video classification compared to not applying the attention block. We also found out that applying attention to the LSTM outputs on the VGG19 architecture provides the highest classification accuracy in the proposed framework.\",\"PeriodicalId\":405734,\"journal\":{\"name\":\"2020 IEEE International Conference on Image Processing (ICIP)\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Conference on Image Processing (ICIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIP40778.2020.9190996\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Image Processing (ICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIP40778.2020.9190996","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Attention Boosted Deep Networks For Video Classification
Video classification can be performed by summarizing image contents of individual frames into one class by deep neural networks, e.g., CNN and LSTM. Human interpretation of video content is influenced by the attention mechanism. In other words, video class can be more attentively decided by certain information than others. In this paper, we propose to integrate the attention mechanism into deep networks for video classification. The proposed framework employs 2D CNN networks with ImageNet pretrained weights to extract features of video frames that are then fed to a bidirectional LSTM network for video classification. An attention block has been developed that can be added after the LSTM network in the proposed framework. Several different 2D CNN architectures have been tested in the experiments. The results with respect to two publicly available datasets have demonstrated that integrating attention can boost the performance of deep networks in video classification compared to not applying the attention block. We also found out that applying attention to the LSTM outputs on the VGG19 architecture provides the highest classification accuracy in the proposed framework.