New Feature-level Video Classification via Temporal Attention Model

Hongje Seong, Junhyuk Hyun, Suhyeon Lee, Suhan Woo, Hyunbae Chang, Euntai Kim
{"title":"New Feature-level Video Classification via Temporal Attention Model","authors":"Hongje Seong, Junhyuk Hyun, Suhyeon Lee, Suhan Woo, Hyunbae Chang, Euntai Kim","doi":"10.1145/3265987.3265990","DOIUrl":null,"url":null,"abstract":"CoVieW 2018 is a new challenge which aims at simultaneous scene and action recognition for untrimmed video [1]. In the challenge, frame-level video features extracted by pre-trained deep convolutional neural network (CNN) are provided for video-level classification. In this paper, a new approach for the video-level classification method is proposed. The proposed method focuses on the analysis in temporal domain and the temporal attention model is developed. To compensate for the differences in the lengths of various videos, temporal padding method is also developed to unify the lengths of videos. Further, data augmentation is performed to enhance some validation accuracy. Finally, for the train/validation in CoView 2018 dataset we recorded the performance of 95.53% accuracy in the scene and 87.17% accuracy in the action using temporal attention model, nonzero padding and data augmentation. 
The top-1 hamming score is the standard metric in the CoVieW 2018 challenge and 91.35% is obtained by the proposed method.","PeriodicalId":151362,"journal":{"name":"Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Workshop and Challenge on Comprehensive Video Understanding in the Wild","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3265987.3265990","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

CoVieW 2018 is a new challenge that aims at simultaneous scene and action recognition for untrimmed videos [1]. In the challenge, frame-level video features extracted by a pre-trained deep convolutional neural network (CNN) are provided for video-level classification. In this paper, a new approach to video-level classification is proposed. The proposed method focuses on analysis in the temporal domain, and a temporal attention model is developed. To compensate for the differing lengths of the videos, a temporal padding method is also developed to unify them. Further, data augmentation is performed to improve validation accuracy. Finally, on the CoVieW 2018 train/validation split, the combination of the temporal attention model, nonzero padding, and data augmentation achieved 95.53% accuracy on scene recognition and 87.17% accuracy on action recognition. On the top-1 Hamming score, the standard metric of the CoVieW 2018 challenge, the proposed method obtained 91.35%.
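The two mechanisms the abstract names can be illustrated with a minimal sketch (this is not the authors' released code; the feature dimension, target length, and random attention weights are illustrative assumptions): "nonzero" temporal padding unifies clip lengths by repeating real frames rather than appending zeros, and a temporal attention pool weights the frame-level CNN features before video-level classification.

```python
# Minimal sketch of nonzero temporal padding and temporal attention pooling.
# All dimensions and weights are illustrative, not from the paper.
import numpy as np

def nonzero_pad(features: np.ndarray, target_len: int) -> np.ndarray:
    """Tile (t, d) frame features along time until they reach target_len.

    Repeating real frames keeps every time step nonzero, unlike zero
    padding, so the attention model never attends to empty frames.
    """
    t = features.shape[0]
    reps = int(np.ceil(target_len / t))
    return np.tile(features, (reps, 1))[:target_len]

def temporal_attention_pool(features: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Collapse (T, d) frame features into one (d,) video descriptor.

    w is a (d,) attention projection (learned in practice, random here).
    Frame scores are softmax-normalized over time and used as weights.
    """
    scores = features @ w                 # (T,) one relevance score per frame
    scores -= scores.max()                # numerical stability for softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ features               # (d,) attention-weighted sum

rng = np.random.default_rng(0)
frames = rng.normal(size=(37, 16))              # 37 frames, 16-d features
padded = nonzero_pad(frames, target_len=64)     # unified length T = 64
video_vec = temporal_attention_pool(padded, rng.normal(size=16))
print(padded.shape, video_vec.shape)            # (64, 16) (16,)
```

In a full pipeline the pooled video descriptor would feed two classification heads (scene and action); here only the temporal aggregation step is sketched.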