Point-supervised temporal action localisation based on multi-branch attention
Authors: Shu Liu, Yang Zhang, Gautam Srivastava
Journal: Enterprise Information Systems (impact factor 4.4; JCR Q1, Computer Science, Information Systems)
Published: 2023-04-13
DOI: 10.1080/17517575.2023.2197318 (https://doi.org/10.1080/17517575.2023.2197318)
Citations: 1
Abstract
Temporal action localisation is a key research direction for video understanding in the field of computer vision. Current attention-based methods divide video frames only into action-instance frames and background frames. As a result, the action context, which should belong to the background, is misclassified as an action instance. In addition, when training with point-supervised frame-level labels, action samples and background samples are imbalanced. The lack of background samples lowers the activation score of the background, so the imbalance hinders the separation of action instances from the background. Both problems reduce the accuracy of action classification and temporal localisation. Therefore, this paper proposes a multi-branch attention network and a pseudo-background label generation method. Experimental results show that the proposed method improves the separation of action instances, background, and action context. Moreover, the proposed model achieves excellent performance on the THUMOS-14 dataset.
About the journal:
Enterprise Information Systems (EIS) focusses on both the technical and applications aspects of EIS technology, and the complex and cross-disciplinary problems of enterprise integration that arise in integrating extended enterprises in a contemporary global supply chain environment. Techniques developed in mathematical science, computer science, manufacturing engineering, and operations management used in the design or operation of EIS will also be considered.