NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels

Mohit Sharma, Rajkumar Patra, Harshali Desai, Shruti Vyas, Y. Rawat, R. Shah
ACM Multimedia Asia. Published 2021-10-13. DOI: 10.1145/3469877.3490580 (https://doi.org/10.1145/3469877.3490580)
Citations: 2

Abstract

Deep learning has shown remarkable progress in a wide range of problems. However, efficient training of such models requires large-scale datasets, and obtaining annotations for such datasets can be challenging and costly. In this work, we explore freely available user-generated labels from web videos for video understanding. We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information. We use the collected dataset for action classification and demonstrate its usefulness with two existing small-scale annotated datasets, UCF101 and HMDB51. We study different loss functions and two pretraining strategies, simple and self-supervised learning. We also show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets. We present this as a benchmark dataset in noisy learning for video understanding. The dataset, code, and trained models are publicly available for future research. A longer version of our paper is also available.