NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels

ACM Multimedia Asia, published 2021-10-13, DOI: 10.1145/3469877.3490580

Mohit Sharma, Rajkumar Patra, Harshali Desai, Shruti Vyas, Y. Rawat, R. Shah

Citations: 2

Abstract

Deep learning has made remarkable progress on a wide range of problems. However, efficient training of such models requires large-scale datasets, and obtaining annotations for those datasets can be challenging and costly. In this work, we explore freely available, user-generated labels from web videos for video understanding. We create a benchmark dataset of around 2 million videos with their associated user-generated annotations and other metadata. We use the collected dataset for action classification and demonstrate its usefulness on existing small-scale annotated datasets, UCF101 and HMDB51. We study different loss functions and two pretraining strategies: simple pretraining and self-supervised learning. We also show how a network pretrained on the proposed dataset can improve robustness to video corruption and label noise in downstream datasets. We present this as a benchmark dataset for noisy-label learning in video understanding. The dataset, code, and trained models are publicly available for future research, as is a longer version of our paper.
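The abstract does not say which loss functions were compared. Purely as an illustration, the sketch below contrasts standard cross-entropy with generalized cross-entropy (GCE), one widely used loss for training under label noise. It is an assumption, not necessarily one of the losses studied in the paper, and the function names and the default `q = 0.7` are hypothetical choices for this example.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Standard CE: -log p_y. Unbounded as p_y -> 0, so a confidently
    # "wrong" (possibly mislabeled) example can produce a very large loss.
    return -math.log(softmax(logits)[label])


def generalized_cross_entropy(logits: Sequence[float], label: int, q: float = 0.7) -> float:
    # GCE: (1 - p_y**q) / q, with q in (0, 1].
    # Approaches CE as q -> 0 and equals the MAE-style loss (1 - p_y) at q = 1.
    # The loss is bounded above by 1/q, which limits the influence of noisy labels.
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    p_y = softmax(logits)[label]
    return (1.0 - p_y ** q) / q


if __name__ == "__main__":
    # Hypothetical clip: the model strongly favors class 0,
    # but the user-generated tag says class 2 (a likely noisy label).
    logits = [4.0, 0.0, 0.0]
    print(f"CE : {cross_entropy(logits, 2):.2f}")
    print(f"GCE: {generalized_cross_entropy(logits, 2):.2f}")
```

On this hypothetical clip, CE is roughly 4.04 while GCE with `q = 0.7` is roughly 1.34, and GCE can never exceed `1/q`, so the loss on a likely-mislabeled video stays bounded. Smaller `q` trades some of that robustness for behavior closer to ordinary cross-entropy.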
