Exploring Video Frame Redundancies for Efficient Data Sampling and Annotation in Instance Segmentation

Jihun Yoon, Min-Kook Choi
{"title":"探索视频帧冗余在实例分割中有效的数据采样和注释","authors":"Jihun Yoon, Min-Kook Choi","doi":"10.1109/CVPRW59228.2023.00333","DOIUrl":null,"url":null,"abstract":"In recent years, deep neural network architectures and learning algorithms have greatly improved the performance of computer vision tasks. However, acquiring and annotating large-scale datasets for training such models can be expensive. In this work, we explore the potential of reducing dataset sizes by leveraging redundancies in video frames, specifically for instance segmentation. To accomplish this, we investigate two sampling strategies for extracting keyframes, uniform frame sampling with adjusted stride (UFS) and adaptive frame sampling (AFS), which employs visual (Optical flow, SSIM) or semantic (feature representations) dissimilarities measured by learning free methods. In addition, we show that a simple copy-paste augmentation can bridge the big mAP gap caused by frame reduction. We train and evaluate Mask R-CNN with the BDD100K MOTS dataset and verify the potential of reducing training data by extracting keyframes in the video. With only 20% of the data, we achieve similar performance to the full dataset mAP; with only 33% of the data, we surpass it. Lastly, based on our findings, we offer practical solutions for developing effective sampling methods and data annotation strategies for instance segmentation models. Supplementary on https://github.com/jihun-yoon/EVFR.","PeriodicalId":355438,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Exploring Video Frame Redundancies for Efficient Data Sampling and Annotation in Instance Segmentation\",\"authors\":\"Jihun Yoon, Min-Kook Choi\",\"doi\":\"10.1109/CVPRW59228.2023.00333\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, deep neural network architectures and learning algorithms have greatly improved the performance of computer vision tasks. However, acquiring and annotating large-scale datasets for training such models can be expensive. In this work, we explore the potential of reducing dataset sizes by leveraging redundancies in video frames, specifically for instance segmentation. To accomplish this, we investigate two sampling strategies for extracting keyframes, uniform frame sampling with adjusted stride (UFS) and adaptive frame sampling (AFS), which employs visual (Optical flow, SSIM) or semantic (feature representations) dissimilarities measured by learning free methods. In addition, we show that a simple copy-paste augmentation can bridge the big mAP gap caused by frame reduction. We train and evaluate Mask R-CNN with the BDD100K MOTS dataset and verify the potential of reducing training data by extracting keyframes in the video. With only 20% of the data, we achieve similar performance to the full dataset mAP; with only 33% of the data, we surpass it. Lastly, based on our findings, we offer practical solutions for developing effective sampling methods and data annotation strategies for instance segmentation models. 
Supplementary on https://github.com/jihun-yoon/EVFR.\",\"PeriodicalId\":355438,\"journal\":{\"name\":\"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPRW59228.2023.00333\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPRW59228.2023.00333","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In recent years, deep neural network architectures and learning algorithms have greatly improved the performance of computer vision tasks. However, acquiring and annotating large-scale datasets for training such models can be expensive. In this work, we explore the potential of reducing dataset sizes by leveraging redundancies in video frames, specifically for instance segmentation. To accomplish this, we investigate two sampling strategies for extracting keyframes: uniform frame sampling with adjusted stride (UFS) and adaptive frame sampling (AFS), which selects frames based on visual (optical flow, SSIM) or semantic (feature representation) dissimilarities measured with learning-free methods. In addition, we show that a simple copy-paste augmentation can bridge the large mAP gap caused by frame reduction. We train and evaluate Mask R-CNN on the BDD100K MOTS dataset and verify the potential of reducing training data by extracting keyframes from the video. With only 20% of the data, we achieve performance similar to the full-dataset mAP; with only 33% of the data, we surpass it. Lastly, based on our findings, we offer practical solutions for developing effective sampling methods and data annotation strategies for instance segmentation models. Supplementary material is available at https://github.com/jihun-yoon/EVFR.
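
To make the two sampling strategies concrete, the sketch below shows uniform frame sampling with a stride and an SSIM-driven variant of adaptive frame sampling in Python. The function names, the dissimilarity threshold, and the use of 1 − SSIM as the dissimilarity score are assumptions for illustration only; the paper's released code and exact measures (e.g., its optical-flow or feature-based variants) may differ.

```python
# Illustrative sketch of keyframe extraction, assuming uint8 grayscale frames.
# Names (adaptive_frame_sampling, dissimilarity_threshold) are hypothetical,
# not taken from the paper's code.
from typing import List

import numpy as np
from skimage.metrics import structural_similarity as ssim


def uniform_frame_sampling(num_frames: int, stride: int) -> List[int]:
    """Uniform frame sampling (UFS): keep every `stride`-th frame."""
    return list(range(0, num_frames, stride))


def adaptive_frame_sampling(frames: List[np.ndarray],
                            dissimilarity_threshold: float = 0.3) -> List[int]:
    """Adaptive frame sampling (AFS) sketch: keep a frame when its SSIM
    dissimilarity to the last kept frame exceeds a threshold."""
    if not frames:
        return []
    keyframes = [0]              # always keep the first frame
    last_kept = frames[0]
    for idx in range(1, len(frames)):
        # SSIM is a similarity score; 1 - SSIM serves as a dissimilarity.
        dissimilarity = 1.0 - ssim(last_kept, frames[idx], data_range=255)
        if dissimilarity > dissimilarity_threshold:
            keyframes.append(idx)
            last_kept = frames[idx]
    return keyframes
```

A higher threshold keeps fewer frames (more aggressive reduction), while a lower threshold approaches keeping every frame; the appropriate value depends on how quickly scene content changes in the video.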
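The copy-paste augmentation mentioned above can likewise be sketched in a few lines: instances are cut out of a source image with their binary masks, pasted onto a target image, and the occluded parts of the target's existing masks are removed. This is a minimal sketch of the general technique under these assumptions, not the authors' implementation, and all names are illustrative.

```python
# Minimal copy-paste augmentation sketch (illustrative, not the paper's code).
# Images are HxWx3 uint8 arrays; masks are HxW boolean arrays of matching size.
import numpy as np


def copy_paste(src_img: np.ndarray, src_masks: list,
               dst_img: np.ndarray, dst_masks: list):
    """Paste every instance in `src_masks` onto `dst_img` and return the
    augmented image together with its updated list of instance masks."""
    out_img = dst_img.copy()
    pasted = np.zeros(dst_img.shape[:2], dtype=bool)
    for mask in src_masks:
        out_img[mask] = src_img[mask]        # overwrite pixels under the mask
        pasted |= mask
    # Remove the regions of existing instances now covered by pasted objects.
    updated_dst_masks = [m & ~pasted for m in dst_masks]
    out_masks = updated_dst_masks + [m.copy() for m in src_masks]
    return out_img, out_masks
```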