Improved video classification method based on non-parametric attention combined with self-supervision

Xuchao Gong, Zongmin Li
Published in: International Conference on Digital Image Processing, 2022-10-12
DOI: 10.1117/12.2643038 (https://doi.org/10.1117/12.2643038)

Abstract

In video sequence modeling, transformers are currently the strongest recognition architecture. Popular transformer-based video classification methods focus on the importance of the current features within the temporal sequence. However, they characterize simultaneous-order features insufficiently, and simple data augmentation yields unstable classification results. In this paper, we propose a method that combines non-parametric attention with self-supervised feature construction to further improve video classification. A non-parametric attention mechanism is constructed over the simultaneous-order features to fit a distribution with multiple local extrema. In addition, during model learning the input video is randomly masked in both the temporal and spatial domains, and self-supervised information is added so that the model effectively learns both the details and the classification information of the video content. Experiments on the Kinetics-400, Kinetics-600 and Something-Something V2 datasets show that the proposed algorithm improves accuracy over the current best-performing methods.
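
The random masking in the temporal and spatial domains can be illustrated with a minimal sketch. The abstract does not give the masking scheme, ratios, or patch layout, so everything below is an assumption made for illustration: the function names `random_spacetime_mask` and `apply_mask`, the `frame_ratio` and `patch_ratio` parameters, and the choice to hide whole frames (temporal) plus a random subset of patches in the remaining frames (spatial). It is not the authors' implementation.

```python
import random


def random_spacetime_mask(num_frames, grid_h, grid_w,
                          frame_ratio=0.25, patch_ratio=0.5, seed=None):
    """Return mask[t][i][j] == True where a patch should be hidden.

    Temporal masking hides whole frames; spatial masking hides a random
    subset of patches inside each remaining frame. The ratios are
    illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_masked_frames = int(num_frames * frame_ratio)
    n_masked_patches = int(grid_h * grid_w * patch_ratio)

    masked_frames = set(rng.sample(range(num_frames), n_masked_frames))
    mask = []
    for t in range(num_frames):
        if t in masked_frames:
            # Temporal masking: hide every patch of this frame.
            frame = [[True] * grid_w for _ in range(grid_h)]
        else:
            # Spatial masking: hide a random subset of patches.
            hidden = set(rng.sample(range(grid_h * grid_w), n_masked_patches))
            frame = [[(i * grid_w + j) in hidden for j in range(grid_w)]
                     for i in range(grid_h)]
        mask.append(frame)
    return mask


def apply_mask(video, mask, fill=0.0):
    """Replace masked patches of video[t][i][j] with a fill value."""
    return [[[fill if mask[t][i][j] else video[t][i][j]
              for j in range(len(video[t][i]))]
             for i in range(len(video[t]))]
            for t in range(len(video))]
```

In a self-supervised setup of this kind, the model would typically be trained to reconstruct or predict the hidden patches alongside the classification objective. How the paper combines the two signals is not stated in the abstract.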