{"title":"Multilevel semantic and adaptive actionness learning for weakly supervised temporal action localization","authors":"Zhilin Li, Zilei Wang, Cerui Dong","doi":"10.1016/j.neunet.2024.106905","DOIUrl":null,"url":null,"abstract":"<div><div>Weakly supervised temporal action localization aims to identify and localize action instances in untrimmed videos with only video-level labels. Typically, most methods are based on a multiple instance learning framework that uses a top-<span><math><mi>K</mi></math></span> strategy to select salient segments to represent the entire video. Therefore fine-grained video information cannot be learned, resulting in poor action classification and localization performance. In this paper, we propose a Multilevel <strong>S</strong>emantic and Adaptive <strong>A</strong>ctionness <strong>L</strong>earning Network (SAL), which is mainly composed of multilevel semantic learning (MSL) branch and adaptive actionness learning (AAL) branch. The MSL branch introduces second-order video semantics, which can capture fine-grained information in videos and improve video-level classification performance. Furthermore, we propagate second-order semantics to action segments to enhance the difference between different actions. The AAL branch uses pseudo labels to learn class-agnostic action information. It introduces a video segments mix-up strategy to enhance foreground generalization ability and adds an adaptive actionness mask to balance the quality and quantity of pseudo labels, thereby improving the stability of training. Extensive experiments show that SAL achieves state-of-the-art results on three benchmarks. Code: <span><span>https://github.com/lizhilin-ustc/SAL</span><svg><path></path></svg></span></div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"182 ","pages":"Article 106905"},"PeriodicalIF":6.0000,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608024008347","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Weakly supervised temporal action localization aims to identify and localize action instances in untrimmed videos with only video-level labels. Typically, most methods are based on a multiple instance learning framework that uses a top- strategy to select salient segments to represent the entire video. Therefore fine-grained video information cannot be learned, resulting in poor action classification and localization performance. In this paper, we propose a Multilevel Semantic and Adaptive Actionness Learning Network (SAL), which is mainly composed of multilevel semantic learning (MSL) branch and adaptive actionness learning (AAL) branch. The MSL branch introduces second-order video semantics, which can capture fine-grained information in videos and improve video-level classification performance. Furthermore, we propagate second-order semantics to action segments to enhance the difference between different actions. The AAL branch uses pseudo labels to learn class-agnostic action information. It introduces a video segments mix-up strategy to enhance foreground generalization ability and adds an adaptive actionness mask to balance the quality and quantity of pseudo labels, thereby improving the stability of training. Extensive experiments show that SAL achieves state-of-the-art results on three benchmarks. Code: https://github.com/lizhilin-ustc/SAL
期刊介绍:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.