Cascade Evidential Learning for Open-world Weakly-supervised Temporal Action Localization

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2023-06-01 DOI:10.1109/CVPR52729.2023.01416

Mengyuan Chen, Junyu Gao, Changsheng Xu

{"title":"Cascade Evidential Learning for Open-world Weakly-supervised Temporal Action Localization","authors":"Mengyuan Chen, Junyu Gao, Changsheng Xu","doi":"10.1109/CVPR52729.2023.01416","DOIUrl":null,"url":null,"abstract":"Targeting at recognizing and localizing action instances with only video-level labels during training, Weakly-supervised Temporal Action Localization (WTAL) has achieved significant progress in recent years. However, living in the dynamically changing open world where unknown actions constantly spring up, the closed-set assumption of existing WTAL methods is invalid. Compared with traditional open-set recognition tasks, Open-world WTAL (OW-TAL) is challenging since not only are the annotations of unknown samples unavailable, but also the fine-grained annotations of known action instances can only be inferred ambiguously from the video category labels. To address this problem, we propose a Cascade Evidential Learning framework at an evidence level, which targets at OWTAL for the first time. Our method jointly leverages multi-scale temporal contexts and knowledge-guided prototype information to progressively collect cascade and enhanced evidence for known action, unknown action, and background separation. Extensive experiments conducted on THUMOS-14 and ActivityNet-v1.3 verify the effectiveness of our method. Besides the classification metrics adopted by previous open-set recognition methods, we also evaluate our method on localization metrics which are more reasonable for OWTAL.","PeriodicalId":376416,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"238 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR52729.2023.01416","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Targeting at recognizing and localizing action instances with only video-level labels during training, Weakly-supervised Temporal Action Localization (WTAL) has achieved significant progress in recent years. However, living in the dynamically changing open world where unknown actions constantly spring up, the closed-set assumption of existing WTAL methods is invalid. Compared with traditional open-set recognition tasks, Open-world WTAL (OW-TAL) is challenging since not only are the annotations of unknown samples unavailable, but also the fine-grained annotations of known action instances can only be inferred ambiguously from the video category labels. To address this problem, we propose a Cascade Evidential Learning framework at an evidence level, which targets at OWTAL for the first time. Our method jointly leverages multi-scale temporal contexts and knowledge-guided prototype information to progressively collect cascade and enhanced evidence for known action, unknown action, and background separation. Extensive experiments conducted on THUMOS-14 and ActivityNet-v1.3 verify the effectiveness of our method. Besides the classification metrics adopted by previous open-set recognition methods, we also evaluate our method on localization metrics which are more reasonable for OWTAL.

查看原文本刊更多论文

开放世界弱监督时间动作定位的级联证据学习

针对训练过程中仅使用视频级标签识别和定位动作实例的问题，近年来弱监督时态动作定位(WTAL)方法取得了重大进展。然而，在动态变化的开放世界中，未知行为不断涌现，现有WTAL方法的闭集假设是无效的。与传统的开放集识别任务相比，开放世界WTAL (low - tal)具有挑战性，因为它不仅无法获得未知样本的注释，而且已知动作实例的细粒度注释只能从视频类别标签中模糊地推断出来。为了解决这个问题，我们提出了一个证据层面的级联证据学习框架，该框架首次针对OWTAL。我们的方法联合利用多尺度时间背景和知识引导的原型信息，逐步收集已知动作、未知动作和背景分离的级联和增强证据。在THUMOS-14和ActivityNet-v1.3上进行的大量实验验证了我们方法的有效性。除了采用以往的开集识别方法所采用的分类指标外，我们还对我们的方法进行了定位指标的评价。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

自引率

0.00%

发文量